Volume 44 Number 2 June 2020 ISSN 0350-5596
Informatica
An International Journal of Computing and Informatics
Special Issue: SoICT 2019
Guest Editors: Huynh Thi Thanh Binh, Ichiro Ide

Editorial Boards
Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.
Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors.
The necessary coordination is carried out through the Executive Editors, who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Higher Education, Science and Technology.
Each author is guaranteed to receive the reviews of his article. When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article.

Executive Editor - Editor in Chief
Matjaž Gams
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 251 93 85
matjaz.gams@ijs.si
http://dis.ijs.si/mezi/matjaz.html

Editor Emeritus
Anton P. Železnikar
Volaričeva 8, Ljubljana, Slovenia
s51em@lea.hamradio.si
http://lea.hamradio.si/~s51em/

Executive Associate Editor - Deputy Managing Editor
Mitja Luštrek, Jožef Stefan Institute
mitja.lustrek@ijs.si

Executive Associate Editor - Technical Editor
Drago Torkar, Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 251 93 85
drago.torkar@ijs.si

Executive Associate Editor - Deputy Technical Editor
Tine Kolenik, Jožef Stefan Institute
tine.kolenik@ijs.si

Editorial Board
Juan Carlos Augusto (Argentina), Vladimir Batagelj (Slovenia), Francesco Bergadano (Italy), Marco Botta (Italy), Pavel Brazdil (Portugal), Andrej Brodnik (Slovenia), Ivan Bruha (Canada), Wray Buntine (Finland), Zhihua Cui (China), Aleksander Denisiuk (Poland), Hubert L. Dreyfus (USA), Jozo Dujmović (USA), Johann Eder (Austria), George Eleftherakis (Greece), Ling Feng (China), Vladimir A. Fomichov (Russia), Maria Ganzha (Poland), Sumit Goyal (India), Marjan Gušev (Macedonia), N.
Jaisankar (India), Dariusz Jacek Jakóbczak (Poland), Dimitris Kanellopoulos (Greece), Samee Ullah Khan (USA), Hiroaki Kitano (Japan), Igor Kononenko (Slovenia), Miroslav Kubat (USA), Ante Lauc (Croatia), Jadran Lenarčič (Slovenia), Shiguo Lian (China), Suzana Loskovska (Macedonia), Ramon L. de Mantaras (Spain), Natividad Martinez Madrid (Germany), Sanda Martinčić-Ipšić (Croatia), Angelo Montanari (Italy), Pavol Návrat (Slovakia), Jerzy R. Nawrocki (Poland), Nadia Nedjah (Brazil), Franc Novak (Slovenia), Marcin Paprzycki (USA/Poland), Wieslaw Pawlowski (Poland), Ivana Podnar Žarko (Croatia), Karl H. Pribram (USA), Luc De Raedt (Belgium), Shahram Rahimi (USA), Dejan Raković (Serbia), Jean Ramaekers (Belgium), Wilhelm Rossak (Germany), Ivan Rozman (Slovenia), Sugata Sanyal (India), Walter Schempp (Germany), Johannes Schwinn (Germany), Zhongzhi Shi (China), Oliviero Stock (Italy), Robert Trappl (Austria), Terry Winograd (USA), Stefan Wrobel (Germany), Konrad Wrona (France), Xindong Wu (USA), Yudong Zhang (China), Rushan Ziatdinov (Russia & Turkey)

https://doi.org/10.31449/inf.v44i1.3031
Informatica 44 (2020) 113–114

Special issue on "The Tenth International Symposium on Information and Communication Technology – SoICT 2019"

Since 2010, the Symposium on Information and Communication Technology (SoICT) has been organized annually. The symposium series provides an academic forum for researchers to share their latest research findings and to identify future challenges in computer science. The best papers from SoICT 2015, SoICT 2016, and SoICT 2017 were extended and published in the Special Issues "SoICT 2015", "SoICT 2016", and "SoICT 2017" of the Informatica Journal, Vol. 40, No. 2 (2016), Vol. 41, No. 2 (2017), and Vol. 42, No. 3 (2018), respectively. In 2019, SoICT was held in the scenic Ha Long Bay, Vietnam, during December 4-6, commemorating the tenth event of the symposium series. The symposium covered four major areas of research: Artificial Intelligence and Big Data, Information Networks and Communication Systems, Human-Computer Interaction, and Software Engineering and Applied Computing. Among 145 submissions from 28 countries, 63 papers were accepted for presentation at SoICT 2019. Among them, the following two papers were carefully selected, after further extension and additional reviews, for inclusion in this special issue.

The first paper, "Privacy Preserving Visual Log Service with Temporal Interval Query using Interval Tree-based Searchable Symmetric Encryption" by Viet-An Pham, Huy-Hoang Chung-Nguyen, Dinh-Hieu Hoang, Mai-Khiem Tran, and Minh-Triet Tran, develops a smart secure service for visual logs with temporal interval queries. The proposed scheme achieves efficient search and update time while maintaining all important security properties such as forward privacy and backward privacy, and it does not leak information outside the desired temporal range.

The second paper, "Cycle Time Enhancement by Simulated Annealing for a Practical Assembly Line Balancing Problem" by Huong Mai Dinh, Dung Viet Nguyen, Long Van Truong, Thuan Phan Do, Thao Thanh Phan, and Nghia Duc Nguyen, investigates the assembly line balancing problem. For this problem, the authors propose a solution based on the simulated annealing approach, which is shown to be effective and potentially applicable in practice.

We hope that readers interested in Information and Communication Technology will find this Special Issue a useful collection of papers.

Huynh Thi Thanh Binh
Ichiro Ide
https://doi.org/10.31449/inf.v44i1.3031
Informatica 44 (2020) 115–125

Privacy Preserving Visual Log Service with Temporal Interval Query using Interval Tree-based Searchable Symmetric Encryption

Viet-An Pham, Dinh-Hieu Hoang, Huy-Hoang Chung-Nguyen, Mai-Khiem Tran and Minh-Triet Tran
Faculty of Information Technology - Software Engineering Lab - University of Science, VNU-HCM
E-mail: pvan@apcs.vn, hdhieu@apcs.vn, cnhhoang@apcs.vn, tmkhiem@selab.hcmus.edu.vn, tmtriet@fit.hcmus.edu.vn

Keywords: searchable symmetric encryption, temporal interval query, visual concept extraction

Received: March 15, 2020

Visual logs have become widely available via personal cameras, visual sensors in smart environments, and surveillance systems. Storing such data in public services is a common and convenient solution, but it is essential to devise a mechanism that encrypts the data to protect sensitive information while still enabling queries over the visual content even though it is stored in encrypted form at the service. More precisely, we need smart systems whose security and practicality are balanced against each other. As far as we know, in spite of their importance in preserving personal privacy, such reliable systems have not gained sufficient attention from researchers. This motivates our proposal to develop a smart secure service for visual logs with temporal interval queries. In our system, visual log data are analyzed to generate high-level contents, including entities, scenes, and activities happening in the visual data. Our system then allows data owners to query these high-level contents of their visual logs at the server side within a temporal interval while the data remain encrypted. Our searchable symmetric encryption scheme TIQSSE utilizes an interval tree structure, and we prove that it achieves efficient search and update time while maintaining all important security properties such as forward privacy and backward privacy, without leaking information outside the desired temporal range.

Povzetek: Problem uravnoteženja proizvodne poti je predstavljen odprto, brez omejitev npr. števila delavcev, zato je izviren. Avtorji testirajo več algoritmov in predlagajo najboljšega.

1 Introduction

In daily activities, people usually take photos and record video clips to capture moments and events in their lives. Besides, with the booming trend of developing smart interactive environments, such as smart homes, offices, or even cities, visual sensors are densely integrated into our habitats to record and then analyse external contexts, such as users, objects, activities, etc. Consequently, visual lifelogs become increasingly available and are usually uploaded to online storage services. In this paper, we target two challenging problems to better develop an online storage service for private visual data: (i) to search photos or video clips based on their content, and (ii) to protect private data from being leaked at the server side, accidentally or intentionally.

First, we aim to bridge the gap between visual data and their semantics by allowing data owners to search with keywords. Each photo or frame in a video clip is processed to extract high-level concepts, including entities, scene attributes, activities, etc. Different types of high-level concept extractors can be plugged into our framework to meet specific requirements in real applications. Consequently, a photo or video frame can be considered as a document or a set of concepts, which are ready to be retrieved by keywords.
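To make this pipeline concrete, the following is a minimal Python sketch (not the authors' implementation) of how a photo or video frame can be turned into a "document" of concept keywords that a text index can later retrieve; the extractor functions are hypothetical stand-ins for whatever detectors are plugged into the framework.

    # Minimal sketch: turn one photo/frame into a "document" of concept keywords.
    # detect_objects, classify_scene and recognize_activity are hypothetical stubs
    # standing in for real extractors (object detectors, scene classifiers, etc.).

    def extract_concepts(image_path, detectors):
        """Run every plugged-in extractor and merge the results into one keyword set."""
        keywords = set()
        for detector in detectors:
            keywords.update(detector(image_path))  # each detector returns an iterable of labels
        return keywords

    def detect_objects(path):
        return {"person", "laptop"}

    def classify_scene(path):
        return {"office", "indoor"}

    def recognize_activity(path):
        return {"typing"}

    if __name__ == "__main__":
        doc = extract_concepts("frame_0001.jpg",
                               [detect_objects, classify_scene, recognize_activity])
        print(sorted(doc))  # the concept "document" that keyword search operates on

In a real deployment the stubs would wrap pre-trained models or a cloud vision API, but the output contract is the same: one set of keywords per photo or frame.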
We also demonstrate a prototype smart edge camera which can be re-configured remotely to generate visual data with associated extracted concepts.

Second, a typical solution to protect data secrecy is to encrypt data before uploading them to an online storage server. However, after encryption, the data can no longer be searched in the usual way. Symmetric Searchable Encryption (SSE), first proposed by Song et al. [23], is a promising solution to privately store data while maintaining the ability to search in a collection of encrypted records. We adopt the approach of SSE in our proposed solution and carefully design it to ensure the properties of a dynamic SSE [13], i.e. to add, update, and delete data efficiently without re-encrypting the whole database. Besides, we also consider the forward and backward privacy criteria for SSE. Informally, the former means that an update query does not leak information if a newly added document contains keywords that were searched in the past, while the latter is to make sure that it is impossible to retrieve data from deleted files. Forward privacy has been receiving a lot of attention, while backward privacy has only been studied in recent years. Most of the existing schemes suffer from key-size overgrowth after deletion queries [2, 4], which limits their practicality.

Moreover, in particular cases, there are new security properties that must be satisfied: search only in a temporal interval, and do not leak any information outside of the requested range. For example, the police may want to check a company's private security camera over a range of time for a criminal event. The company wants to provide exactly the information from the requested range and not leak any information from other temporal intervals. A similar problem arises when we want to search for some disease in a medical database within a temporal interval: it is best to prevent leaking information about patients at other times. This motivates us to define Temporal Interval Query SSE (TIQSSE), a new SSE problem to search by keywords for documents in a particular temporal interval.

This work is an extended version of our previous TIQSSE work [20], with more in-depth explanations and analysis. This paper is also a significantly enhanced version of [7]. Our previous work only guarantees a one-sided access pattern. For more clarity, the one-sided access pattern means that it can only preclude adversaries from extracting information about the documents that were added after the queried interval, while still leaking information about documents that were added before the requested range. In this paper, there is a great improvement on security since our SSE scheme now guarantees a two-sided access pattern, which means it also prevents adversaries from gaining information about documents added before the queried interval.

Our newly defined problem is different from the existing range query SSE schemes [1, 14]. In a range query SSE scheme, a server returns every document whose key/identifier is in a queried range. In our temporal interval SSE problem, the server only examines documents whose identifiers are inside the temporal interval to select the documents containing a query keyword w. Our secure SSE scheme does not suffer from key-size overgrowth after many deletion queries, unlike previous schemes. Our idea is based on Σoφoς (Sophos) by Raphael Bost [2] from 2016, which we modify to match our problem.
Although many improved constructions appeared later [4, 24], these ideas are not suitable for our problem: the use cases we target require efficient deletion operations that (1) have an acceptable time complexity and (2) do not increase server-side storage usage.

Our main contributions in this paper are as follows.

- We propose a solution for a public visual data storage service to assist data owners in searching their photos and video clips with keywords, i.e. concepts extracted from visual content, and to preserve data privacy in query and data manipulation (insert, update, delete). We also develop a prototype smart edge camera to handle concept extraction for recorded photos or video clips.

- We also define TIQSSE as a new SSE problem to search encrypted documents in a temporal interval while preventing data leakage outside the requested range. We then propose an efficient solution to search for a keyword in documents within a determined time range that achieves both forward and backward privacy.

In Section 2, we briefly review approaches and methods related to the two main aspects of our work: visual retrieval with concepts, and searchable symmetric encryption. We propose a smart secure framework for a visual data storage service and a smart edge camera in Section 3. In Section 4, we review the necessary preliminaries of cryptography and then define the novel TIQSSE problem. Our scheme which tackles this problem is introduced in Section 5. The security analysis of our proposed scheme is presented in Section 6. In Section 7, we draw our conclusion and discuss some fascinating directions for future work.

2 Related work

2.1 Visual retrieval with semantic concepts

Visual log retrieval is one of the important problems in analysing and understanding visual content. Different approaches have been proposed to provide users with various modalities to input queries and get retrieved results [17, 18, 26]. Visual semantic concepts from images are usually used as tags or keywords for interactive retrieval systems [26, 25]. The concepts can be detected using available APIs, such as the Google Cloud Vision API, or pre-trained object detectors, such as YOLO [21], Faster R-CNN [22], etc. Besides, scene attributes and categories [30] can be extracted from images to augment the environmental information of visual data [25]. Some works also utilize captioning [27] or activity recognition to capture the dynamic nature of an image or video clip [16, 15]. In this work, we propose to integrate different concept extractors to create the associated metadata for each photo or video clip stored in the smart visual service. We also develop a prototype smart edge camera that can locally extract concepts for certain tasks before uploading visual data to the online storage service (see Section 3.1).

2.2 Searchable symmetric encryption

Song et al. [23] first proposed a solution to Searchable Symmetric Encryption in 2000. Although the first SSE scheme was not efficient, it provided a solid foundation for the problem. Many works were proposed [10, 6] to improve search time and security. However, leakage problems in SSE were not formally defined. Curtmola et al. [8] were the first to explicitly define the generally acceptable leakage criteria for SSE problems, including the search pattern and access pattern that have been frequently used since. Although the previous schemes were optimal in search time, there was no way to update a database without re-encrypting the whole database.
To remove this limitation, in 2012, dynamic SSE was proposed by Kamara et al. [13]. Their scheme can efficiently add or remove files, with the trade-off of leaking some information when those queries are executed. In particular, forward privacy and backward privacy are not fully satisfied. The SSE problem has been continuously studied and improved. Raphael Bost achieved forward privacy in 2016 [2], and also achieved backward privacy one year later [4]. In 2018, Sun et al. [24] proposed Puncturable Symmetric Encryption to construct and improve backward security. Unfortunately, all schemes mentioned above not only suffer from key-size overgrowth after many deletion queries, but also lack the range-query property that we need.

Other than proposing new SSE constructions, many efforts were made to attack the proposed security models. Some notable works are inference attacks on deterministic encryption (DTE) and order-preserving encryption (OPE) [19, 11], leakage-abuse attacks [5, 3, 11, 12] and file-injection attacks [29, 12].

There are many prior works on range queries; however, they all differ from ours. Their solutions are used for indexing in relational databases and return the entities whose queried attributes lie within some range, while in our scheme we need to return all the files containing the searched keyword within a period.

3 Smart secure framework for visual data storage service

In this section, we present our proposal for a smart secure framework for a visual data storage service. We are inspired by the idea of edge computing to shift the concept extraction task toward the smart camera. There has been an ongoing interest in this shift, particularly from privacy-aware users, due to recent breaches in data centers, where sensitive user data is processed and may be used for malicious purposes. If the processing happens on the users' premises, they will have more control over the data that is generated.

3.1 Smart edge camera with concept extraction

Figure 1 illustrates the process of concept extraction from photos/clips in a smart edge camera before uploading visual data with their associated metadata to the secure visual service. Different modules for various concept types can be deployed in the smart edge camera, such as object detection, person recognition, action recognition, scene attribute extraction, category classification, and image captioning.

Figure 1: Concept extraction from photos/clips in a smart edge camera.

In our prototype, we utilize NVIDIA Jetson Nano embedded computers with 128 dedicated Maxwell CUDA cores to handle various machine learning tasks. Our smart edge camera prototype can be specialized for various specific tasks with different models to be deployed and updated (see Figure 2). In our model repository, there are not only existing pre-trained models, such as ResNet-50, MobileNet-v2, SSD ResNet-18, SSD MobileNet-v2, and Tiny YOLO v3, but we also prepare our own custom models for other tasks, such as custom object detectors for contexts originating from Vietnam or image captioning with concept augmentation [27]. Future custom models can also be created and further optimized with various techniques such as quantization, fusion, and scheduling available in the NVIDIA TensorRT SDK, then deployed to the smart camera.
Due to its cloud nature, the devices' software can be remotely updated, and additional machine learning models can be added in a secure manner.

Figure 2: Model update for an edge camera.

3.2 Components in a smart secure visual system

We propose a scenario in which a system collects, processes, and synchronizes the data from various cameras, including the proposed smart edge ones, to a visual data server that utilizes our proposed secure scheme for SSE. Figure 3 illustrates the three key components of the system: a storage and query processing server, camera nodes, and query nodes.

Figure 3: Main components in the smart secure visual system.

In our system, the storage and query processing server supports multiple users, and the server owner can be different from the data owners. The owners of the server can fully examine the stored data, but are expected not to understand or to exploit useful information from the stored data. Thus, to ensure this crucial property of our visual system, i.e. preserving data privacy for data owners, we define a new problem of Temporal Interval Query SSE (in Section 4) and propose an efficient solution for this problem (in Section 5).

A user, after signing up, is provided a means to submit and retrieve data over commonly utilized protocols, such as HTTP over SSL, SMB, or SFTP. Querying is done over an API with a common contract protocol implemented in gRPC, an RPC framework based on protocol buffers that uses HTTP/2 over an SSL channel. With gRPC's wide adoption across numerous languages and libraries, the implementation is relatively easy and open for everyone. Connections to the server are secured with the server's certificate by default. We assume this certificate is self-signed and pre-installed on every query node via personal trusted channels beforehand. A user usually plays both roles: a generator party at upload time from a camera node, and a querying party at retrieval time from a query node, which can be his or her mobile device. Thanks to the loosely coupled architecture, our proposed system allows new users to dynamically join without any interruption on the server side, using a streamlined user interface.
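As a rough illustration of this division of labour between data owners and the untrusted server, the sketch below keeps all keys on the owner's nodes and gives the server only opaque search tokens and encrypted blobs. It is emphatically not the TIQSSE construction (defined in Section 4 and built later in the paper): a static inverted index with deterministic HMAC tokens leaks the search pattern and offers neither forward nor backward privacy. All key names, helper functions and the use of the third-party cryptography package are assumptions made only for illustration.

    import hmac, hashlib
    from cryptography.fernet import Fernet  # third-party package "cryptography"

    index_key = b"owner-only-index-key"     # never leaves the data owner's devices
    blob_key = Fernet.generate_key()
    fernet = Fernet(blob_key)

    server_index, server_blobs = {}, {}     # everything the untrusted server stores

    def upload(doc_id, keywords, payload):  # runs on a camera node
        for kw in keywords:
            token = hmac.new(index_key, kw.encode(), hashlib.sha256).hexdigest()
            server_index.setdefault(token, []).append(doc_id)
        server_blobs[doc_id] = fernet.encrypt(payload)

    def query(keyword):                     # runs on a query node
        token = hmac.new(index_key, keyword.encode(), hashlib.sha256).hexdigest()
        return [fernet.decrypt(server_blobs[i]) for i in server_index.get(token, [])]

    upload(1, {"person", "office"}, b"frame 1 metadata")
    print(query("office"))                  # [b'frame 1 metadata']

The point of the sketch is only where encryption and querying happen: concept keywords and content are transformed on the owner's side, so the server can answer queries without ever seeing plaintext keywords or frames.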
4 Temporal interval query searchable symmetric encryption

In this section, we first provide background knowledge that includes several cryptographic primitives and the dynamic SSE problem. Then we introduce the definition and security properties of TIQSSE.

4.1 Preliminaries

For consistency in presentation, we denote:

- x ←$ {0,1}^n as randomly sampling n bits and storing the result in x.
- n as the number of added files, F_n as the n-th file, and EF_n as the encrypted file corresponding to F_n.
- ⊥ as null or empty.
- λ as the security parameter. The security parameter means that, unless specified explicitly, the keys used in the SSE scheme are λ bits in length, and the probability for an adversary to break the scheme is 2^(−λ).

We use several cryptographic primitives from Dan Boneh and Victor Shoup [9], which include: negligible function, pseudo-random generator (PRG), pseudo-random function (PRF), simulator, and symmetric encryption. For the symmetric encryption, we denote the encryption of plaintext m with secret key sk as SE.enc(sk, m), and the decryption of ciphertext c with secret key sk as SE.dec(sk, c). We also inherit the idea of the trapdoor permutation from Bost et al. [2] and denote the function as π. Formally: one can compute π of p1 with the secret key Ks: p2 ← π(Ks, p1). Given p2 in the range of π, one can derive the original p1 with the public key: p1 ← π⁻¹(Kp, p2). Finally, for all p we have p = π(Ks, π⁻¹(Kp, p)) = π⁻¹(Kp, π(Ks, p)).

4.2 Dynamic symmetric searchable encryption

In SSE, we view the database as an array of files F = (f1, f2, ..., fn), where fi consists of multiple words (w1, w2, ..., w_mi). Later, when the client requests a search on keyword w, the client obfuscates or encrypts w into a trapdoor T and gives it to the server. The server, on receiving T, must return a list of result identifiers R = (id1, id2, ..., idr) such that, when returned to the client, for every i we have w ∈ F_idi. It is notable that the act of obfuscating w into T is essential because it hides the original keyword from the server; in this paper we call this the trapdoor generation procedure. In other words, dynamic SSE consists of one algorithm, Setup, and two protocols, Search and Update.

- In the Setup phase, the client creates some keys and key-pairs that will be used in the other two protocols.
- The Search protocol consists of multiple interactions between the client and the server when the client requests a search. For each client request, the server receives the trapdoor T and returns a list of files as described above.
- The Update protocol comprises two types of updates: adding a new file and deleting an existing file. Depending on the type of update, the encrypted database on the untrusted server is modified according to the SSE scheme.

Correctness. An SSE scheme is correct if the probability that the Search protocol returns false results to the client is negligible.

Security. The SSE scheme Σ is said to be adaptively secure if, for any adversary A who issues a polynomial number of queries q(λ), there exists a polynomial-time simulator S such that:

|Pr[SSEReal_Σ^A(λ, q) = 1] − Pr[SSEIdeal_{Σ,S,L}^A(λ, q) = 1]| ≤ negl(λ)

Informally, the simulator S can be thought of as an efficient probabilistic algorithm whose output distribution is identical to the real scheme's output distribution. The definition above can then be understood semantically as follows: if we can prove that there exists a simulator S for the SSE scheme Σ, then it is very hard for the adversary to distinguish the real case from the simulated case. Hence, we achieve adaptive security for the SSE scheme.

4.3 Definition of temporal interval query SSE

Temporal Interval Query SSE continues to use the model of the original dynamic Symmetric Searchable Encryption but modifies the Search protocol. When the client issues a search request, he first chooses a range of interest [L, R], then chooses a keyword w he wants to search for, then generates the trapdoor vector T that represents the keyword w for that range [L, R], and finally gives (T, L, R) to the server. The server, on receiving (T, L, R), must return a list of result identifiers R_{w,L,R} = (id1, id2, ..., idr) such that, when returned to the client, for every i we have w ∈ F_idi and L ≤ idi ≤ R (see Figure 4).

Figure 4: An example of a temporal interval query: search for keyword w in documents only from time instant L to time instant R.
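The TIQSSE construction itself is given later in the paper; as a hedged illustration of why an interval tree is a natural tool here, the Python sketch below shows the standard canonical decomposition of a query range [L, R] into O(log n) tree nodes, together with a hypothetical per-node token derivation. With one token per canonical node, a server can be restricted to exactly the requested range; the HMAC-based derivation is an assumption for illustration, not the paper's scheme.

    import hmac, hashlib

    def canonical_nodes(lo, hi, L, R):
        """Return the O(log n) segment-tree nodes whose union is exactly [L, R]."""
        if R < lo or hi < L:
            return []
        if L <= lo and hi <= R:
            return [(lo, hi)]
        mid = (lo + hi) // 2
        return canonical_nodes(lo, mid, L, R) + canonical_nodes(mid + 1, hi, L, R)

    def trapdoor_vector(key, keyword, lo, hi, L, R):
        """One token per canonical node (hypothetical derivation, for illustration only)."""
        return [hmac.new(key, f"{keyword}|{a}|{b}".encode(), hashlib.sha256).hexdigest()
                for (a, b) in canonical_nodes(lo, hi, L, R)]

    key = b"client-secret-key"
    print(canonical_nodes(0, 15, 3, 12))        # [(3, 3), (4, 7), (8, 11), (12, 12)]
    print(trapdoor_vector(key, "dog", 0, 15, 3, 12))

Because the tokens correspond only to nodes covering [L, R], nothing about keyword occurrences outside the interval needs to be touched when answering the query, which is the intuition behind the range-restriction property discussed next.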
4.4 Security

Forward privacy: Informally, an SSE scheme achieves forward privacy if its Update query does not leak any information about a newly added file, even if the file contains previously searched keywords. For example, suppose the client searched for a keyword w; later, when the client adds a file F that contains w, the server should not learn that w exists inside F. Much research [5] has shown that if a scheme does not attain forward privacy, the client's queries, or even the plaintext, can be revealed even with small leakage. There also exist attacks [29] that can effectively exploit the vulnerability of such schemes to break query privacy. In addition, forward privacy can also improve time and space performance [2].

Backward privacy: To have backward privacy in dynamic SSE, we must prevent the adversary from gaining knowledge of deleted files from new queries. For example, if there exists a deleted file F that contains a word w and has never been queried, then when the client later searches for the word w, it is expected that the server is prevented from learning that w ∈ F.

In order to have the searchable property over encrypted data, there must be some leakage of information throughout the process. We follow many previous works [2, 4, 24] and call this the leakage function L = (L^Stp, L^Srch, L^Updt). The leakage function L is used to express the information learned by the untrusted server from the three protocols Setup, Search and Update.

Setup leakage: In the Setup algorithm, the client generates some keys and key-pairs for later usage in the Search and Update protocols. Because of that, the leakage of the Setup phase is the public keys (if there are any) that the client wants to share with the server: L^Stp = PK.

Search leakage: First, let Q be the search requests of the client, where Qi = (Ti, Li, Ri); R the results returned by the untrusted server; Ri the result of Qi, whose content is (id_{i,1}, id_{i,2}, ..., id_{i,ri}); and H the history of previous searches from the client, H = (Q, R). We define the allowed leakage of the Search protocol to be comprised of the search pattern

    lowR ← minimum valid R
    upR ← maximum effective R
    while upR − lowR > ε do
        R ← (upR + lowR) / 2
        (N, H, wstSet) ← GALBP1(Tasks, R, Δ)
        if N > N̄ then
            lowR ← R
        else
            upR ← R
    (N, H, wstSet) ← GALBP1(Tasks, upR, Δ)
    if N > N̄ then                ▷ no solution
        return (∞, ∞, 0%, NULL)
    else
        return (upR, N, H, wstSet)

3.2 Binary search correctness

Recall that in Section 3.1 we stated that if there is a solution which consumes no more than N̄ workers when R = x, then the minimum value of R is certainly not greater than x, and if such a solution does not exist, then R must be greater than x. The correctness of this argument is proven in Lemma 3.1 below.

Lemma 3.1. Let Tasks be any set of tasks and Δ ∈ {5%, 10%, 15%}. Let R1, R2 be such that a/(3(1+Δ)) ≤ R1 < R2, where a is the maximum processing time of a task in Tasks. Assume that procedure GALBP1 always produces an accurate result. If we set (N1, H1, wstSet1) = GALBP1(Tasks, R1, Δ) and (N2, H2, wstSet2) = GALBP1(Tasks, R2, Δ), then N1 ≥ N2.

Proof. First we need to show that for all R ≥ a/(3(1+Δ)), a valid solution for GALBP1 always exists. Indeed, a solution where each workstation contains exactly one task fits all the constraints mentioned in the problem statement. We then consider an interesting observation: wstSet1 is also a solution when R = R2, since all mentioned constraints are still satisfied. Moreover, when R = R2, solution wstSet1 consumes no more workers than it does when R = R1, because of the way we calculate the number of workers in each workstation. Assume that when R = R2, wstSet1 consumes N workers; then we have N2 ≤ N ≤ N1, which is what we want to prove. □
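Assuming a black-box solver galbp1(tasks, r, delta) that returns the number of workers, the balance efficiency and the workstation set for a fixed cycle time (in the paper this role is played by the SA procedures of Section 4), the binary-search wrapper justified by Lemma 3.1 can be sketched in Python as follows. The task attribute name, the tolerance eps, and the choice of the total processing time as the upper end of the search interval are assumptions made for illustration; the paper initialises upR to a "maximum effective R".

    def galbp2(tasks, n_max, delta, galbp1, eps=1e-3):
        """Binary search over the cycle time R; delta is a fraction, e.g. 0.05 for 5%."""
        a = max(t.time for t in tasks)           # longest single task
        low_r = a / (3 * (1 + delta))            # below this no valid solution exists
        up_r = sum(t.time for t in tasks)        # a safe upper end of the search interval
        while up_r - low_r > eps:
            r = (up_r + low_r) / 2
            n, h, wst = galbp1(tasks, r, delta)
            if n > n_max:                        # too many workers: cycle time too tight
                low_r = r
            else:
                up_r = r
        n, h, wst = galbp1(tasks, up_r, delta)
        if n > n_max:                            # even the largest cycle time fails
            return None
        return up_r, n, h, wst

Each probe of the binary search is one call to the GALBP-1 solver, so the overall cost is a logarithmic number of solver runs.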
Actually, when R < a/(3(1+Δ)) there will be no solution, because there would exist at least one workstation i with T_i > 3(R + ΔR), contradicting the problem statement. Therefore, setting lowR = a/(3(1+Δ)) at the beginning of procedure GALBP2 is indeed appropriate. Moreover, when R ≥ â, the minimum number of workers stops decreasing further, so initializing upR = â is suitable too.

4 Methods to estimate the procedure GALBP1

With the application of the GALBP2 procedure, our original GALBP-2 is turned into solving another problem, GALBP-1, in procedure GALBP1. GALBP-1 is very similar to the original problem GALBP-2, with all the constraints remaining the same except that the number of workers is no longer bounded. Since the GALBP-1 in procedure GALBP1 is an NP-hard problem, it cannot be fully solved in polynomial time. Therefore, we apply exhaustive search (brute-force search) along with different meta-heuristic methods, such as simulated annealing (SA for short) and SA with greedy, to produce answers as close as possible to the optimal ones. For a similar version of this GALBP-1 where H must not be less than 80%, we have already proposed an efficient SA algorithm [8] which performs excellently in terms of accuracy and speed. Therefore, with some reasonable modifications, we can expect the same methods to work well for this GALBP-1. Throughout Section 4, we introduce our approaches to this GALBP-1 in detail. Section 5 then contains a full evaluation of all methods when applied to solve our original GALBP-2, based on experimental results on real data from the garment industry.

4.1 Exhaustive search

The exhaustive search finds the optimal result by considering all possible solutions. We design a simple exhaustive search algorithm for this GALBP-1 in Procedure 2. In this procedure, wst is the currently built workstation, which consists of tasks, curSol is the current solution, which is a set of workstations, and bestSol is the current best solution. By initializing bestSol as a random valid solution and calling exhaustive(1, 1, ∅, ∅), we will have bestSol as our optimal solution when exhaustive terminates. For our Polo-Shirt example, when the number of tasks is not too large, the exhaustive search can still return an optimal solution within a reasonable time. Nonetheless, when the input is big enough, it takes hours to run until termination, which is infeasible in an industrial environment. Therefore, better approaches should be applied to deal with this problem.

Procedure 2 Exhaustive search for GALBP-1
Require: i: first task in the current workstation, j: last added task in the current workstation, wst: current workstation, curSol: current solution.
1:  procedure exhaustive(i, j, wst, curSol)
2:    if i > M then
3:      if curSol is better than bestSol then
4:        bestSol ← curSol
5:    else if wst = ∅ then
6:      if task_i is marked then
7:        exhaustive(i + 1, i + 1, ∅, curSol)
8:      else
9:        Push task_i into wst
10:       exhaustive(i, i, wst, curSol)
11:       Pop task_i out of wst
12:   else
13:     if wst is valid then
14:       Mark all tasks in wst
15:       Push wst into curSol
16:       exhaustive(i + 1, i + 1, ∅, curSol)
17:       Pop wst out of curSol
18:       Unmark all tasks in wst
19:     if |wst| < 3 then
20:       for k ← j + 1 to M do
21:         if task_k is not marked then
22:           Push task_k into wst
23:           exhaustive(i, k, wst, curSol)
24:           Pop task_k out of wst
4.2 Simulated annealing

The SA algorithm has been widely applied due to its feasibility for NP-hard problem classes through a randomized, controlled process with reasonable calculation time. Therefore, the SA algorithm is a good tool for an ALBP with a lot of constraints.

4.2.1 Motivation and idea

Simulated annealing (SA for short) was first applied to optimization problems by S. Kirkpatrick et al. [18] and V. Cerny [5]. In the book "Metaheuristics: From Design to Implementation" by El-Ghazali Talbi [28], the author describes almost every aspect of SA in detail. It is a meta-heuristic to approximate the optimal solution in a large search space for an optimization problem. The idea of the SA algorithm is derived from physical metallurgy: the metal is heated to high temperatures and cooled slowly so that it crystallizes in a low-energy configuration. SA is chosen to solve this ALBP because of its simplicity and efficiency. It allows for a more extensive search for the global optimal solution, and can even find a global optimal solution if it runs for a long enough time. We represent our SA approach in Procedure 3. This procedure is a close adaptation of the general SA algorithm from Talbi's book [28].

Procedure 3 Simulated Annealing
Require: s0: initial solution, Tmax: starting temperature, L: neighbor generation loop limit, Tdec: temperature drop after each step, P: probability to accept a worse solution.
procedure SA(s0, Tmax, L, Tdec, P)
  s ← s0
  T ← Tmax
  while T > 0 do
    for i ← 1 to L do
      Generate a random neighbor s'
      if s' is better than s then
        s ← s'
      else
        Assign s ← s' with probability P(T)
    T ← T − Tdec
  return Best solution found

There are five parameters that we need to decide for SA: s0 as the initial solution; the starting temperature Tmax, L and Tdec for the cooling schedule; and P as the acceptance probability of moving to a worse solution. We also need to design a procedure to generate a random neighbor s' from a current solution s. All these factors affect the quality of our algorithm.

4.2.2 Initial solution

In theory, the initial solution s0 can be any valid solution and it does not affect the quality of SA. However, when the solution search space is too large, a good initial solution can be a suitable approximation for the global optimum in a short amount of time. In Section 4.2 we set a random solution as the initial solution for SA, and in Section 4.3 we will assign a solution obtained from a greedy method to s0. The result comparison between these two approaches shows a remarkable difference in efficiency.

4.2.3 Neighbor generation

A neighbor of a solution s is generated simply by moving a task from one workstation to another workstation (including creating a new workstation consisting of only that task) or by swapping two tasks in two different workstations. There are at most M² valid neighbors of a solution. Among all valid neighbors of s, we consider only its x best neighbors and randomly choose one of them. The reason why we do not choose among all valid neighbors is to save computation cost without worsening the algorithm's efficiency too much. x is set high at the beginning and decreases as the temperature decreases, so that when the temperature is high a wide range of neighbors is considered and at the end only better solutions are chosen.

4.2.4 Move acceptance

Usually, the probability P that a worse solution is accepted depends on the current temperature T, the current solution s and the new solution s'.
One of the most basic forms of P [28] can be written as:

P(T, s, s') = e^(−ΔE/T) = e^(−(f(s') − f(s))/T)

in which ΔE = f(s') − f(s) is the difference in quality between the new and the current solution. However, in our SA algorithm, P depends only on T, by a simple formula:

P(T) = T / Tmax

ΔE is not used in our case since the quality of s and that of its chosen neighbor s' are not too different; they are in fact very close. This is because s' is generated from s by just moving one task from one workstation to another or swapping two tasks in two workstations, and s' is moreover chosen among the x best neighbors of s. Therefore, ΔE tends to be very small and negligible. Also, it is very hard to find an ideal formula for calculating the quality of a solution: any tuned formula for a solution's quality just overfits some set of tests and performs badly on other tests. Computational results show that P(T) works well compared to any tuned version of P(T, s, s') that we designed. Moreover, in our case the P(T) formula is much simpler and more reasonable.

4.2.5 Cooling schedule

In theory, the higher Tmax and L are, the higher the chance that the optimal solution is discovered. Similarly, the lower Tdec is, the better our final solution will be. However, to save computation effort, these three parameters should be carefully tuned.

4.2.6 Multiple execution

Since the solution search space for this GALBP-1 is very large, it is not guaranteed that repeated runs of SA on the same input produce the same output. Therefore, given an input, the SA algorithm is repeated multiple times to provide multiple answers, and the best answer among them becomes the solution of the GALBP1 procedure. By experimenting on actual data, we found that 10 repetitions are enough to stabilize our SA algorithm without taking too much time.

4.3 Simulated annealing with greedy

A good initial solution provided by a greedy approach can always be a suitable approximation for the optimal result in a short amount of time. Also, when the solution search space is too large, it can help SA find a better final solution by focusing the process on a critical region only. For our GALBP-1, the initial solution s0 for SA is constructed by the 5-step algorithm described below:

* Step 1: Choose a task u such that there is no remaining task v ≠ u where v must be done before u is processed.
* Step 2: Create a workstation X which contains u and some of the remaining tasks so that X is valid and the following Ws_X value is maximized (Ws here stands for "worker saved"):

  Ws_X = n'_X − n_X

  where n'_X is the total number of workers needed to complete all the tasks in workstation X if we divide these tasks into separate one-task-only workstations, and n_X is the number of workers needed by workstation X itself. If there are many workstations X with the same value Ws_X, choose any workstation which is balanced.
* Step 3: Add X to s0.
* Step 4: Remove all tasks belonging to X.
* Step 5: If there is some task remaining, go back to Step 1.

At step 2 of this algorithm, a greedy strategy is utilized: the best workstation which contains task u is added to the solution. Such a strategy efficiently exploits a signature property of an assembly line: its precedence graph is almost identical to a tree with only a small number of branches. Therefore, a workstation tends to consist of connected tasks on the precedence graph, and removing them does not affect our future decisions much.
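For concreteness, here is a compact Python sketch of Procedure 3 with the simplified acceptance rule P(T) = T/Tmax described in Section 4.2.4. The functions random_neighbor and better are placeholders for the move/swap neighbourhood and the solution comparison, which depend on the problem encoding and are therefore only assumed here.

    import random

    def simulated_annealing(s0, t_max, loop, t_dec, random_neighbor, better):
        """Sketch of Procedure 3 with acceptance probability P(T) = T / Tmax."""
        s, best = s0, s0
        t = t_max
        while t > 0:
            for _ in range(loop):
                s_new = random_neighbor(s, t)          # move one task or swap two tasks
                if better(s_new, s):
                    s = s_new
                elif random.random() < t / t_max:      # accept a worse neighbour with prob. T/Tmax
                    s = s_new
                if better(s, best):
                    best = s
            t -= t_dec
        return best

With the paper's settings this would be called with t_max = 100, loop = 20 and t_dec = 5, repeated about 10 times per input, keeping the best run (Section 4.2.6).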
Indeed, the experimental results which will be discussed in Section 5 show that the efficiency of the SA with greedy solution is usually better than that of exhaustive search and traditional SA, in terms of both accuracy and running time.

5 Computational results

If the exhaustive search procedure were allowed to run fully, it would take several hours or even days until termination, which is infeasible in an industrial environment. Therefore, for each test, we forced it to terminate when it had been called more than 6 × 10^6 times recursively, and its best produced result was collected. Besides that, for all versions of SA, we set Tmax = 100, L = 20 and Tdec = 5 to guarantee solution quality without consuming too much time. All algorithms are implemented in C++ and run on a computer with a 2.60 GHz i7-8850H CPU (12 CPUs), an NVIDIA Quadro P1000, 16 GB RAM and a 512 GB SSD.

Our algorithms were tested on a real data set related to the production of Polo-Shirt products at Dong Van Garment Factory, Hanoi Textile & Garment Joint Stock Corporation, Vietnam. There are 12 cases, and 6 tests are created from each of these cases by modifying Δ and N̄. The values of Δ and N̄ for each test are the combinations of three values of Δ (5%, 10% and 15%) and two different values of N̄, where the greater, N̄_high, is about twice the smaller, N̄_low, and N̄_low < 1.5M. N̄_high and N̄_low differ among the cases. Hence there are 72 tests in total. The number of tasks M spreads evenly among the tests, from 15 to 60. The performance of each algorithm on all tests in terms of the cycle time R, number of workers N, balance efficiency H and running time in seconds is documented in the diagrams of Figure 3.

The top six diagrams of Figure 3 show the cycle time R obtained from the exhaustive search, SA and SA with greedy algorithms, divided by a number R0 which is calculated as:

R0 = (total processing time of all M tasks) / N̄    (12)

R0 is used as an approximation of the lower bound of R, since if Δ = 0% then R0 is exactly the lower bound of R, and Δ is actually quite small (Δ ≤ 15%), which means the real lower bound of R is not so different from R0. Therefore R0 is used to normalize R. Among the top six diagrams, the upper three consist of tests having N̄ = N̄_low and the lower three consist of tests having N̄ = N̄_high. Each column contains a pair of diagrams sharing a particular Δ value (5%, 10% or 15%). The same order applies to the diagrams of the balance efficiency H and the running time. For example, Table 3 shows the results of the 12 tests having Δ = 5% and N̄ = N̄_low.

Table 3: Results for tests having Δ = 5% and N̄ = N̄_low

M   N̄   R-Ex    R-SA    R-SA-gr  N-Ex  N-SA  N-SA-gr  H-Ex(%)  H-SA(%)  H-SA-gr(%)  T-Ex(s)  T-SA(s)  T-SA-gr(s)
15  18  31.429  31.429  31.429   18    18    18       54.545   54.545   54.545      0.055    10       10
20  18  30.952  31.157  30.953   18    18    18       55.556   55.556   50          37       28       28
25  24  41.857  41.857  41.857   24    24    24       29.412   29.412   29.412      11       57       55
30  28  41.81   41.81   41.81    28    28    28       23.81    23.81    19.048      110      89       90
32  26  84.762  72.621  69.02    26    26    26       23.81    38.889   29.412      843      125      128
33  36  93.016  92.245  91.429   36    36    36       33.333   25       37.5        952      125      127
35  32  42.857  37.871  37.66    32    32    32       28       52.381   44.444      911      155      162
47  33  No      35.05   32.273   No    32    33       No       34.783   54.545      1240     439      432
48  40  87.619  61.903  60.272   40    40    40       14.286   32       40.741      1368     343      347
50  40  47.143  34.822  33.225   40    40    40       11.111   33.333   46.154      1353     439      447
52  52  66.952  53.401  48.857   49    51    51       18.421   32       52          1552     531      531
60  40  No      71.011  59.831   No    38    40       No       32.143   58.333      1625     895      897
Here "Ex" is exhaustive search, "SA" is simulated annealing and "SA-gr" is simulated annealing with greedy. These results are used to build the top-left diagram in each set of six diagrams in Figure 3.

Since R0 is an approximation of the lower bound of R, a value of R is a good answer if it is not far from R0. When N̄ = N̄_low, based on Figure 3, we can see that both SA and SA with greedy results are as good as the results of exhaustive search in small tests but much better than exhaustive search in medium and large tests. In some cases, due to early termination, exhaustive search does not even provide any valid solution, as opposed to the SA algorithms, which still produce quality answers for all tests. In the case of N̄ = N̄_high, the results for R may not be close to R0, since N̄_high ≈ 2·N̄_low can be a bit too high, which makes R0 much lower than the real lower bound of R. Nevertheless, the SA algorithms still show that they are never worse than exhaustive search. In addition, SA with greedy is usually slightly better than traditional SA in terms of R, which reveals the effectiveness of the greedy initial solution.

For the balance efficiency H, the SA algorithms can be slightly worse than brute force when the number of tasks M is small. However, as M grows larger, the SA algorithms clearly become superior to the exhaustive one. Moreover, H is usually higher than 40% and often fluctuates between 60% and 80% when SA is utilized, which are quite satisfying outcomes. A point worth noting is that SA with greedy is remarkably better than exhaustive search and traditional SA in almost all test cases.

In terms of running time, the SA algorithms completely outperform exhaustive search, as expected, since they are polynomial-time algorithms while exhaustive search theoretically runs in exponential time. Also, results are produced by SA in less than 20 minutes even for the largest test cases. With its fast processing speed, SA is perfectly suitable for a real industrial environment.

With all the evaluation above, we can conclude that SA is an efficient meta-heuristic for our GALBP-2. In addition, the SA with greedy version is clearly the best, compared to both exhaustive search and traditional SA.

Figure 3: Diagrams of cycle time (R), balance efficiency (H) and running time of exhaustive search, SA and SA with greedy on 72 tests from Dong Van Garment Factory, Hanoi Textile & Garment Joint Stock Corporation, Vietnam.

6 Conclusion

In this paper, we presented a Simulated Annealing algorithm to solve a generalized assembly line balancing problem in the garment industry. Our GALBP-2 has the primary goal of minimizing the cycle time given an upper bound on the number of workers. The secondary goal is minimizing the total number of workers on the assembly line. The last goal is then determining the maximum balance efficiency. We efficiently utilized binary search to turn the original problem into a simpler problem, GALBP-1, where the primary objective is minimizing the total number of workers and the secondary goal is maximizing the balance efficiency, given the cycle time. We then introduced three methods to solve this GALBP-1: exhaustive search, SA and SA with greedy. All of them have their particular advantages in terms of accuracy and running time, depending on the test size. These algorithms are good supporting tools for garment factory managers to make plans before decisions.
In other real assembly line balancing cases, the methods presented here should also be considered as promising directions.

Acknowledgments

The authors would like to thank the Dong Van Garment Factory, Hanoi Textile & Garment Joint Stock Corporation, Vietnam for supporting the surveys and experiments needed to complete this study.

References

[1] Ilker Baybars. A survey of exact algorithms for the simple assembly line balancing problem. Management Science, 32(8):909-932, 1986. https://doi.org/10.1287/mnsc.32.8.909.
[2] Nils Boysen, Malte Fliedner, and Armin Scholl. A classification of assembly line balancing problems. European Journal of Operational Research, 183(2):674-693, 2007. https://doi.org/10.1016/j.ejor.2006.10.010.
[3] Benjamin Bryton. Balancing of a continuous production line. PhD thesis, Northwestern University, 1954.
[4] GM Buxey. Assembly line balancing with multiple stations. Management Science, 20(6):1010-1021, 1974. https://doi.org/10.1287/mnsc.20.6.1010.
[5] Vladimir Cerny. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1):41-51, 1985. https://doi.org/10.1007/bf00940812.
[6] James C Chen, Chun-Chieh Chen, Yi-Jhen Lin, CJ Lin, and TY Chen. Assembly line balancing problem of sewing lines in garment industry. In Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management, Bali, Indonesia, pages 7-9, 2014. https://doi.org/10.1109/icmlc.2009.5212600.
[7] Wen-Chyuan Chiang. The application of a tabu search metaheuristic to the assembly line balancing problem. Annals of Operations Research, 77:209-227, 1998.
[8] Mai Huong Dinh, Viet Dung Nguyen, Van Long Truong, Phan Thuan Do, Thanh Thao Phan, and Duc Nghia Nguyen. Simulated annealing for the assembly line balancing problem in the garment industry. In Proceedings of the Tenth International Symposium on Information and Communication Technology, pages 36-42, 2019. https://doi.org/10.1145/3368926.3369698.
[9] Selin Hanife Eryürük. Clothing assembly line design using simulation and heuristic line balancing techniques. Journal of Textile & Apparel / Tekstil ve Konfeksiyon, 22(4), 2012.
[10] SH Eryuruk, F Kalaoglu, and M Baskak. Assembly line balancing in a clothing company. Fibres & Textiles in Eastern Europe, 66(1):93-98, 2008.
[11] Rasul Esmaeilbeigi, Bahman Naderi, and Parisa Charkhgard. The type E simple assembly line balancing problem: A mixed integer linear programming formulation. Computers & Operations Research, 64:168-177, 2015. https://doi.org/10.1016/j.cor.2015.05.017.
[12] Waldemar Grzechca. Assembly line balancing problem with reduced number of workstations. IFAC Proceedings Volumes, 47(3):6180-6185, 2014. https://doi.org/10.3182/20140824-6-za-1003.02530.
[13] Allan L Gutjahr and George L Nemhauser. An algorithm for the line balancing problem. Management Science, 11(2):308-315, 1964. https://doi.org/10.1287/mnsc.11.2.308.
[14] WB Helgeson and Dunbar P Birnie. Assembly line balancing using the ranked positional weight technique. Journal of Industrial Engineering, 12(6):394-398, 1961.
[15] Thomas R Hoffmann. Assembly line balancing with a precedence matrix. Management Science, 9(4):551-562, 1963. https://doi.org/10.1287/mnsc.9.4.551.
[16] Mahmut Kayar and Ö C Akyalgin. Applying different heuristic assembly line balancing methods in the apparel industry and their comparison. Fibres & Textiles in Eastern Europe, 2014. https://doi.org/10.5604/12303666.1191438.
[17] Maurice D Kilbridge and Leon Wester. A heuristic method of assembly line balancing. Journal of Industrial Engineering, 12(4):292-298, 1961.
[18] Scott Kirkpatrick, C Daniel Gelatt, and Mario P Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983. https://doi.org/10.1126/science.220.4598.671.
[19] N Kriengkorakot and N Pianthong. The assembly line balancing problem: Review problem. J. Ind. Eng, 6(3):18-25, 1955.
[20] Sophie D Lapierre, Angel Ruiz, and Patrick Soriano. Balancing assembly lines with tabu search. European Journal of Operational Research, 168(3):826-837, 2006. https://doi.org/10.1016/j.ejor.2004.07.031.
[21] Yuchen Li, Honggang Wang, and Zaoli Yang. Type II assembly line balancing problem with multi-operators. Neural Computing and Applications, 31(1):347-357, 2019. https://doi.org/10.1007/s00521-018-3834-1.
[22] Patrick R McMullen and GV Frazier. Using simulated annealing to solve a multiobjective assembly line balancing problem with parallel workstations. International Journal of Production Research, 36(10):2717-2741, 1998. https://doi.org/10.1080/002075498192454.
[23] SG Ponnambalam, P Aravindan, and G Mogileeswar Naidu. A multi-objective genetic algorithm for solving assembly line balancing problem. The International Journal of Advanced Manufacturing Technology, 16(5):341-352, 2000. https://doi.org/10.1007/s001700050166.
[24] M. E. Salveson. The assembly line balancing problem. The Journal of Industrial Engineering, 6(3):18-25, 1955.
[25] Bhaba R Sarker and JG Shanthikumar. A generalized approach for serial or parallel line balancing. The International Journal of Production Research, 21(1):109-133, 1983. https://doi.org/10.1080/00207548308942341.
[26] Armin Scholl and Christian Becker. State-of-the-art exact and heuristic solution procedures for simple assembly line balancing. European Journal of Operational Research, 168(3):666-693, 2006. https://doi.org/10.1016/j.ejor.2004.07.022.
[27] Yuri N Sotskov, Alexandre Dolgui, Tsung-Chyan Lai, and Aksana Zatsiupa. Enumerations and stability analysis of feasible and optimal line balances for simple assembly lines. Computers & Industrial Engineering, 90:241-258, 2015. https://doi.org/10.1016/j.cie.2015.08.018.
[28] El-Ghazali Talbi. Metaheuristics: From Design to Implementation, volume 74. John Wiley & Sons, 2009. https://doi.org/10.1002/9780470496916.
[29] Pedro M Vilarinho and Ana Sofia Simaria. A two-stage heuristic method for balancing mixed-model assembly lines with parallel workstations. International Journal of Production Research, 40(6):1405-1420, 2002. https://doi.org/10.1080/00207540110116273.

https://doi.org/10.31449/inf.v44i2.3083
Informatica 44 (2020) 139–145

A Novel Method for Determining Research Groups from Co-Authorship Network and Scientific Fields of Authors

Jan Pisanski
University of Ljubljana, Faculty of Arts
E-mail: Jan.Pisanski@ff.uni-lj.si, http://oddelki.ff.uni-lj.si/biblio/oddelek/osebje/pisanski.html
ORCID 0000-0002-3060-111X

Mark Pisanski

Tomaž Pisanski (corresponding author)
University of Primorska, FAMNIT
E-mail: Tomaz.Pisanski@upr.si, https://en.wikipedia.org/wiki/Tomaz_Pisanski, ORCID 0000-0002-1257-5376

Keywords: co-authorship network, scientific field, maximal spanning tree, induced subnetwork, pruning of networks, pathfinder network, MST, line-cut

Received: March 9, 2020

Large networks not only have a large number of vertices but also have a large number of edges.
Although such networks are generally sparse, they are usually difficult to visualise, even locally. This paper considers the case where large weights on edges represent similarity between the corresponding end-vertices. We follow two main ideas in this paper. The first one is network pruning, that is, removal of edges, which makes the resulting network more manageable while keeping the main characteristic of the original network. The other idea is to partition the network vertex set in such a way that the induced connected components represent groups of network elements that fit together. Furthermore, we assume that the vertices of the network are labeled by types. Here we apply our approach to the co-authorship network of researchers in Slovenia in order to identify research groups, find group leaders and determine the degree of interdisciplinarity of each group. For the network pruning phase we use an MST-pathfinder network, and for the vertex partition, appropriate line-cuts. Each cluster is assigned a distribution of types. In this case, the types correspond to scientific fields, also known as research interests of authors. A measure of interdisciplinarity of a research group is derived from such a distribution.

Povzetek: Velika omrežja nimajo le mnogo vozlišč, ampak imajo tudi mnogo povezav. Čeprav so običajno taka omrežja redka, so nepregledna in jih je težko prikazati na sliki, tudi lokalno. Ta prispevek obravnava omrežja, pri katerih velike vrednosti uteži na povezavah pomenijo podobnost pripadajočih krajišč. V prispevku sledimo dvema glavnima idejama. Prva je kleščenje omrežja, to je odstranitev manj pomembnih povezav, zaradi česar je nastalo omrežje bolj obvladljivo, hkrati pa se ohrani glavna značilnost prvotnega omrežja. Druga ideja je razdeliti vozlišča omrežja tako, da inducirane povezane komponente predstavljajo skupine omrežnih elementov, ki se med seboj prilegajo. Poleg tega predpostavljamo, da so vozlišča omrežja označena s tipi. V tem prispevku uporabljamo naš pristop k omrežju soavtorstev raziskovalcev v Sloveniji z namenom identifikacije raziskovalnih skupin, iskanja voditeljev skupin in stopnje interdisciplinarnosti skupine. Za fazo kleščenja omrežja uporabljamo usmerjevalno omrežje (MST-pathfinder network), za vozliščno razbitje pa ustrezne reze povezav. Vsaki skupini je dodeljena porazdelitev tipov. Mero interdisciplinarnosti raziskovalne skupine izpeljemo iz takšne porazdelitve. V tem primeru tipi predstavljajo znanstvena področja oz. raziskovalne interese avtorjev.

1 Introduction

In the contemporary research community, scientists collaborate within formal or informal research groups. Identifying such groups from data available in various bibliometric networks is an interesting challenge. In this note we propose a method that uses the co-authorship network on the one hand and the declared scientific fields of authors, which can be extracted from some bibliographic databases, on the other. We propose a theoretical model that uses a network, i.e. a graph with weights on its edges and labels, called types, on its vertices. We may view the labels as scientific fields or sub-fields. Our approach is quite general and can be applied to any weighted network with types. In this paper we apply it to co-authorship networks. Note that scientific fields are sometimes called research interests.

The method consists of two steps. In the first step the original co-authorship network is pruned in order to reduce
the number of edges and increase the number of components, in our case producing research groups. In this step line-cuts are determined. In the second step a collection of induced monotype subnetworks is pruned by applying the MST-pathfinder algorithm to further reduce the number of edges while keeping the same connectivity. Our original contribution is the combination of both methods and the use of a symmetric predicate in the first step; see Algorithm 3. Note that the idea of using MST, pathfinder and MST-pathfinder networks has been used extensively in the past in a variety of contexts of bibliographic and other research [6, 8, 11, 22, 23]. This rough general approach may be refined in several different ways. We present in detail only one such refinement and discuss some others in the conclusion. In general, bibliographic networks are very large and allow for a variety of methods for data mining [15]; however, in this pilot study we focus our attention on a relatively small data set. The data is restricted to Slovenian researchers and is taken from the Slovenian bibliographic system SICRIS. Moreover, only researchers that are co-authors of mathematicians are considered.

2 Pruning of co-authorship network

2.1 Co-authorship network

For basics in graph theory, the reader is referred to [4]; for network theory, see for instance [3]. Let V be a list of authors from some bibliographic database. We say that u, v ∈ V are adjacent, u ~ v, if u and v are co-authors of a common work from the corresponding database. Sometimes we restrict our attention to certain types of works or certain types of co-authorships. Usually only scientific works are considered and the co-authorship graph is computed from a two-mode network WA composed of pairs (w, a), works and authors, for each co-author a of work w. Since the binary relation ~ is irreflexive¹ and symmetric, it defines a simple graph G = (V, ~) that we call the co-authorship graph. Let E = {uv ∈ V² | u ~ v} denote the set of unordered pairs of adjacent vertices of G. Instead of G = (V, ~) we may use the notation G = (V, E) to denote the same graph. The graph may be weighted, where the weights w on the edges represent the number of joint papers between the two authors. In this way a network N = (V, E, w) is obtained. Let w(u, v) denote this weight. Sometimes we may weigh a co-authorship differently for a different number n(w) of co-authors of work w. Let W(u, v) denote the collection of works co-authored by u and v. For any work w let n(w) denote the number of authors of w. Then

w(u, v) = |W(u, v)|.

In a fractional approach [2] the weight f(u, v) is defined as

f(u, v) = Σ_{w ∈ W(u,v)} 2 / n(w)²,

while in the case of Newman's normalization the weight is

f(u, v) = Σ_{w ∈ W(u,v)} 2 / (n(w) · (n(w) − 1)).

A network N is a weighted graph N = (V, E, w), where w : E → R is the weight function. In our case it is positive and the value 0 means there is no edge between u and v. A graph G = (V, ~) is transformed into the network N = (V, E, a), where a(u, v) = 1 for all adjacent pairs of vertices u ~ v.

¹ Sometimes one may also use loops at each vertex. The weight associated with a loop may depend on the method by which the co-authorship graph is constructed. If it is obtained by multiplication of two-mode networks [4], it represents the number of works of a given author. In the fractional approach it may represent the total contribution of an author. Loops are removed if we follow Newman's approach.
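To make the weightings concrete, the following minimal Python sketch (our own illustration, not code from the paper; the function name and toy data are made up) builds the binary, standard and fractional weights from a list of works given as author lists. The fractional term follows the formula as reconstructed above, with Newman's variant noted in a comment.

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_weights(works):
    """Build the edge weightings of Section 2.1 from a list of works,
    where each work is given as the list of its authors.

    Returns three dicts keyed by unordered author pairs:
      a - binary adjacency (1 if the pair co-authored at least one work),
      w - standard count |W(u, v)|,
      f - fractional weight, here the sum of 2 / n(w)^2 over joint works.
    """
    a, w, f = {}, defaultdict(int), defaultdict(float)
    for authors in works:
        n = len(set(authors))
        for u, v in combinations(sorted(set(authors)), 2):
            a[(u, v)] = 1
            w[(u, v)] += 1
            f[(u, v)] += 2.0 / n**2   # Newman's variant: 2.0 / (n * (n - 1))
    return a, dict(w), dict(f)

# Toy usage: three works with overlapping author sets.
works = [["ana", "bor"], ["ana", "bor", "cene"], ["bor", "cene"]]
a, w, f = coauthorship_weights(works)
print(w[("ana", "bor")])            # 2 joint works
print(round(f[("ana", "bor")], 3))  # fractional weight of the same pair
```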
The same bibliographic database can produce at least three types of networks for the weight functions a, w, f, defined above: 1. (V, E, a), the binary case, 2. (V, E, w) the standard case, and 3. (V, E, f ),thefractional case. Algorithm 4 Prune the network N = (V, E, w), n = |V |,m = |E | 1: Partition the edge set E into subsets Ej with equal weights: Ej = {e e E|w(e) = w4}. 2: Order the parts in descending order of weights wi > w2 > • • • > wfc 3: for u e V do 4: Su = {u} 5: F = 0 6: for i = 1, 2, ...,k do 7: Fj = 0 8: for e = uv e Ej do 9: Let Su, Sv be the corresponding sets. 10: if Su = Sv then 11: Append e to Fj. 12: for e = uv e Fj do 13: if Su = Sv then 14: Su = Sv = Su u Sv 15: Extend F by Fj 16: return subnetwork Pr(N) = (V, F, w). There is another aspect that we have not considered in this paper. Namely, the weight of an edge e = uv between two authors u and v may depend also on the total number of papers authored by each of the two authors. In this case we may modify the network to allow loops and define w* (u, v) = w(u, v) for u = v and let w* (u) = w* (u, u) denote the total number of papers having u as an author. Note that in general w* cannot be computed directly from w since we have no information about the single-authored papers. In this case the best way to compute w* is to multiply WAT by WA, where WA represents a two-mode network work-author. The theory of two mode networks and A Novel Method for Determining Research Groups Informatica 44 (2020) 139-145 141 their applications to bibliographic data can be found, for instance in [3]. 2.2 Pruning networks In the analysis of large networks, dense networks present a challenge. Usually one tends to partition the set of vertices and investigate the induced networks on such parts. In [3] one may find a variety of concepts that are useful in such analysis, e.g. cuts, islands, etc. Nevertheless, such subnetworks may be dense again and the role of particular vertices is not clearly visible. For this reason we prune the original network N = (V, E, w) by appropriately selecting a subset of important edges E' c E. If w' denotes the restriction of w on E', the pruned subnetwork N' = (V, E', w') is obtained. In case of co-authorship networks large weights indicate close collaboration between authors. When considering research groups one may assume strong collaboration within each group. Hence, in such a case a natural approach to pruning would be to remove all edges of lesser weights, while keeping the same connected components. A possible solution is given by the well-known maximum cost spanning tree. More precisely, in case of a disconnected network the resulting graph is a maximum cost spanning forest. However, the problem with a maximum cost spanning forest is that, in case when several edges have the same weights, the forest may not be unique. We use a Kruskal-like algorithm that produces a unique pruned network. Algorithm 1 is almost identical to the MST-pathfinder algorithm of [14] and produces the pathfinder network Pn(<, n — 1); for discussion and various aspects see also [19, 5, 20]. It is not hard to see, that the following is true: Theorem 1. The network N'(V, F,w) is uniquely determined from the original network. If all weights are positive, the connected components are the same as in the original network. Theorem 2. If all weights in N (V, E, w) are distinct, Algorithm 1 produces the (unique) maximum cost spanning forest. On the contrary, if all weights are the same no edge is discarded. 
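A compact Python sketch of this pruning step may be helpful. It follows the description of Algorithm 1 above (process weight classes in descending order, first mark all edges of a class that join distinct components, then merge), under the assumption that weights are stored in a dictionary keyed by edge; it is our own illustration, not the authors' implementation.

```python
def prune_mst_pathfinder(vertices, edges, w):
    """Kruskal-like pruning of Section 2.2: ties within a weight class are
    all kept, so the result is unique and preserves connected components.

    vertices: iterable of vertex ids
    edges:    iterable of (u, v) pairs
    w:        dict mapping (u, v) -> positive weight
    Returns the list of kept edges F.
    """
    parent = {v: v for v in vertices}

    def find(x):                     # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Group edges into classes of equal weight, processed in descending order.
    classes = {}
    for e in edges:
        classes.setdefault(w[e], []).append(e)

    F = []
    for weight in sorted(classes, reverse=True):
        # Keep every edge of the class whose endpoints are (still) in
        # different components, before any merging within this class.
        F_i = [e for e in classes[weight] if find(e[0]) != find(e[1])]
        for u, v in F_i:             # merge only after the whole class is scanned
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
        F.extend(F_i)
    return F
```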
Moreover, we can easily compute the running time of Algorithm 1.

Theorem 3. The time complexity of the above algorithm is O(m log m).

In fact, the time complexity is the same as for Kruskal's algorithm [10]. The sorting and partitioning take O(m log m) steps. There are two loops, each with O(m) steps, and the time complexity of the UNION-FIND structure is of lesser order. By applying this pruning method, strong ties among the nodes remain visible.

3 Line-cuts

For further refining the network N(V, E, w) one may choose a parameter t > 0, the threshold or cut parameter, and prune the edges with weights less than t. In this way the network Nt(V, Et, w) is obtained, where Et = {e ∈ E | w(e) > t}. The choice of the parameter t depends on our aims. There are several obvious goals. For instance:

1. We may choose the maximal value of t that guarantees at least a prescribed number of connected components, say k.

2. An alternative is to insist that all components have at most a prescribed number of vertices, say v.

We present the basic pruning algorithm; see Algorithm 2. It produces essentially a line-cut, see for instance [3]. The only difference is that we keep isolated vertices.

Algorithm 5 Prune the network N = (V, E, w), given threshold parameter t. Connected components of the resulting network are called line-cuts.
1: F = ∅
2: for e ∈ E do
3:   if w(e) > t then
4:     Append e to F.
5: return subnetwork Pr(N, t) = (V, F, w).

In Python, Algorithm 2 can be reduced to a single statement:

F = [e for e in E if w(e) > t]

4 Pruning networks with vertex types

Let us assume we are given a finite number of types, or colors, T, a network N(V, E, w) and a mapping c : V → T. The structure N(V, E, w, T, c) will be called a weighted network with vertex types. When pruning a network with vertex types, a connected component consisting of vertices of a single type will be called monotype. Additionally, we will refer to the number of types used in a connected component as its type number. The maximum of the type numbers of the network components is called the type number of the network. In particular, we are interested in networks of low type number, preferably with monotype components. Parameters of pruning may be adjusted in such a way that a monotype network is obtained. For networks with vertex types, in addition to the two goals described in Section 3, a third goal may be considered:

- One may insist that all connected components are monotype, or more generally that each component has at most S types (colors).

The following basic algorithm (Algorithm 3) for a given network with types removes all edges that have endpoints of different types or, more generally, all edges whose endpoint types do not satisfy a symmetric predicate P.

Algorithm 6 Prune the network with vertex types N = (V, E, w, T, c) with given threshold parameter t and (symmetric) predicate P : T² → {⊤, ⊥}. Connected components of the resulting network are called monotype line-cuts.
1: F = ∅
2: for e = uv ∈ E do
3:   if P(c(u), c(v)) and w(e) > t then
4:     Append e to F.
5: return subnetwork Pr(N, t, P) = (V, F, w, T, c).

As we mentioned above, the predicate P is usually true if both endpoints are of the same type. However, other options are possible. Namely, we may have a similarity imposed on the types, and P then signifies that two types are sufficiently similar. We need an algorithm to analyse the network with vertex types; see Algorithm 4. Using the numbers it returns we may select different parameters and re-run the pruning to reduce the size of the maximal component or, alternatively, to limit the number of different components. We may also insist that all components be composed of a single type.
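The threshold- and predicate-based pruning of Algorithms 2 and 3, together with the component statistics of Algorithm 4, can be sketched in a few lines of Python. This illustration assumes the networkx library and uses our own function name; the predicate defaults to equality of types (monotype line-cuts).

```python
import networkx as nx

def prune_with_types(N, types, t=0, P=lambda a, b: a == b):
    """Keep an edge only if its weight exceeds t and the symmetric predicate P
    holds for the types of its endpoints, then report component statistics.

    N     : networkx.Graph with a 'weight' attribute on every edge
    types : dict vertex -> type (e.g. research-field code)
    Returns (pruned graph, number of components k, order of the largest
             component d, maximal number of distinct types y in a component).
    """
    pruned = nx.Graph()
    pruned.add_nodes_from(N.nodes())           # isolated vertices are kept
    for u, v, data in N.edges(data=True):
        if data.get("weight", 0) > t and P(types[u], types[v]):
            pruned.add_edge(u, v, **data)
    components = list(nx.connected_components(pruned))
    k = len(components)
    d = max(len(c) for c in components)
    y = max(len({types[v] for v in c}) for c in components)
    return pruned, k, d, y
```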
5 Interdisciplinarity of research groups and leaders of research groups

For a given network with vertex types one may perform basic statistics on it. Namely, one may compute absolute frequencies of types on the vertex set,

f(x) = |{v ∈ V | c(v) = x}|,   f_i(x) = |{v ∈ V_i | c(v) = x}|,

or relative frequencies

φ(x) = f(x)/n,   φ_i(x) = f_i(x)/n_i,

where n = |V| and n_i = |V_i|. We consider two measures,

r(N) = max{φ(x) | x ∈ T},   s(N) = |supp φ| = |{x ∈ T | φ(x) > 0}|,

and, for each component,

r(V_i) = max{φ_i(x) | x ∈ T},   s(V_i) = |supp φ_i| = |{x ∈ T | φ_i(x) > 0}|.

Both measure the diversity of research interests in a research group. If r(V_i) < 0.5 there is no dominant discipline. If r(V_i) = 1, the group is totally homogeneous.

Algorithm 7 Analyse the network with vertex types N = (V, E, w, T, c).
1: Partition V into connected components V_1, V_2, ..., V_k.
2: Let d = max{|V_j|; j = 1, 2, ..., k}.
3: for j = 1, 2, ..., k do
4:   b(j) = number of different types in V_j.
5: Let y = max{b(j); j = 1, 2, ..., k}.
6: return the number of connected components k, the order of the maximal connected component d, and the maximal number of types y in any component.

One way to define a leader of a research group is to determine the vertex of maximal degree in the corresponding network or, even better, the vertex with the maximal sum of weights of edges to the neighbouring vertices. There are two parameters that we are interested in. Let m be the number of edges of the network N and let d be the maximal degree, attained at vertex x. Let d' be the second largest degree. Then x can be defined as a leader of the research group, while dominance is the quotient d/m and absolutism is defined by the expression 1 − d'/d. Note that it would also be interesting to explore the diversity index [21] in this context. However, we will address all of these in future work.

6 Example

The data used in our experiments was taken from COBISS/SICRIS [18] for the works indexed in Scopus [17]. Only papers where at least one author was a mathematician were considered. Co-authors that were not registered as researchers in Slovenia were not included. Scientific fields, alias research interests, used in Slovenia have three levels. On Level 1 we have:

1 Science
2 Engineering
3 Medicine
4 Biotechnology
5 Social Sciences
6 Humanities
7 Interdisciplinary

The next table shows the division of Science on Level 2.

1.01 Mathematics
1.02 Physics
1.03 Biology
1.04 Chemistry
1.05 Biochemistry and Molecular Biology
1.06 Geology
1.07 Comp. Intensive Methods and Appl.
1.08 Environment Protection
1.09 Pharmacy

Figure 1: Line-cuts for threshold values t = 0, 1, 2, ... for four different predicates, each depending on the level i = 0, 1, 2, 3. Each predicate depends on the interpretation of equality between two research types. Red - i = 0, blue - i = 1, green - i = 2, and yellow - i = 3.

Figure 2: One of several research groups determined by the line-cut for t = 8.

Figure 3: The research group of Figure 2, pruned by the MST-pathfinder network. Red - Graph Theory, Yellow - Algebra, Blue - Numerical Mathematics, Green - Mathematics.
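To make the measures of Section 5 concrete, the following Python sketch (our own illustration, with hypothetical argument names) computes r, s, the leader, dominance and absolutism for a single research group. Ranking leaders by the sum of incident edge weights, as suggested above, would only change how the degrees are accumulated.

```python
from collections import Counter

def group_statistics(component, types, edges):
    """Compute r, s, leader, dominance and absolutism for one connected
    component of a network with vertex types.

    component : set of vertices forming the group
    types     : dict vertex -> type (research-field code)
    edges     : list of (u, v) edges inside the component
    """
    if not edges:
        raise ValueError("the group must contain at least one edge")
    n_i = len(component)
    freq = Counter(types[v] for v in component)
    r = max(freq.values()) / n_i          # r(V_i): share of the dominant type
    s = len(freq)                         # s(V_i): number of types present
    deg = Counter()                       # vertex degrees inside the group
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    ranked = deg.most_common()            # (vertex, degree), descending
    leader, d = ranked[0]                 # leader: vertex of maximal degree
    d2 = ranked[1][1] if len(ranked) > 1 else 0
    m = len(edges)
    return {"r": r, "s": s, "leader": leader,
            "dominance": d / m, "absolutism": 1 - d2 / d}
```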
Finally, the division of Mathematics in the Level 3 is indicated here: 1.01.01 Analysis 1.01.02 Topology 1.01.03 Numerical and Computer Mathematics 1.01.04 Algebra 1.01.05 Graph Theory 1.01.06 Probability and Statistics Level i may be interpreted as the length of the research interest code that is used to test equality: for i = 0, the string is not used at all, for i = 1 only the first characters are compared, for i = 2, the first four characters are compared, while for i = 3 all seven characters are compared. Different levels can be associated with the suitable choice of predicate P in Algorithm 3. Let Pi denote the predicate applicable to level i. For instance, for u = 1.01.01 and v = 1.01.04 P2(u,v) = T while P3(u,v) = ±. Here we give an example of a pruned research group network. We intend to perform a thorough analysis on more complete data set elsewhere. Figures 2 and 3 depict the same research group. The network in Figure 3 is tree-like and is obtained from the network of Figure 2 by MST-pathfinder method. Vertex colors denote vertex types: red: 1.01.05, yellow: 1.01.04, blue: 1.01.03 and green 1.01.00. In the database some researchers were assigned research interest at level 2, e.g 1.01 (Mathematics). For consistency, we expanded that to level three as 1.01.00. Note that the research group in Figure 3 is composed of two subgroups, one predominantly interested in graph theory and the other in algebra. There is a central triangle connecting the two subgroups. 7 Conclusion Co-authorship graphs and networks are important in the study of research structure and dynamics; see for instance [7, 9, 12, 1]. Their practical value has first been recognised by specialised systems, such as MathSciNet and Zb-Math; see [16, 24]. Including them in more general bibliographic systems such as SICRIS [18] would be beneficial for most users. Potential applications are plenty. In this paper we presented only one aspect of such applications. In a recent paper [13] a completely different application is sought, namely, organising talks at a conference in such a way that speakers with similar topics are scheduled at different times. The data that was available to us has also authors with UNKOWN research interest. In this preliminary study we considered it as a separate research interest. It would be interesting to repeat the study with some flexibility and con- 144 Informática 44 (2020) 139-145 J. Pisanski et al. sider the function: c : V ^ T U {UNKNOWN}. Clearly line-cuts refine the vertex partition and apply only within a component. Note that in general one could take different thresholds in different components. In case we intend to have components with given maximal size v, then indeed different threshold values may be used. In or future more comprehensive work we intend to address some further extensions and applications of the MST-pathfinder method as well as some of the parameters that we have introduced. Acknowledgement We would like to thank Vladimir Batagelj for useful advice and fruitful discussion. The work of both referees improved the presentation considerably and is gratefully acknowledged. Work of Jan Pisanski is supported in part by the ARRS grants P5-0361 and J5-8247, while work of Tomaž Pisanski is supported in part by the ARRS grants P1-0294,J1-7051, N1-0032, and J1-9187. References [1] T. Bartol, G. Budimir, P. Južnic, K. Stopar. Mapping and classification of agriculture in Web of Science: other subject categories and research fields may benefit. Scientometrics, vol. 109 (2016) no. 2, pp. 
979-996. https://doi.org/10.1007/s11192-016-2071-6 [2] V. Batagelj. On Fractional Approach to Analysis of Linked Networks, Scientometrics (2020). https://doi.org/10.1007/s11192-020-03383-y [3] V. Batagelj, P. Doreian, A. Ferligoj, N. Kejžar. Understanding large temporal networks and spatial networks: Exploration, pattern searching, visualization and network evolution, (2014) John Wiley & Sons. [4] J.A. Bondy and U.S.R. Murty. Graph theory, (2008) Graduate Texts in Mathematics, 244. Springer, New York. [5] C. Chen. Science mapping: a systematic review of the literature. Journal of Data and Information Science, vol. 2 (2017) no.2, pp1-40. https://doi.org/10.1515/jdis-2017-0006 [6] C. Chen, S. Morris. Visualizing evolving networks: Minimum spanning trees versus pathfinder networks. In IEEE Symposium on Information Visualization 2003 (2003), pp. 67-74. https://doi.org/10.1109/INFVIS.2003.1249010 [7] A. Ferligoj et al. Scientific collaboration dynamics in a national scientific system. Scientometrics, vol. 104, (2015), no. 3, pp. 985-1012. https://doi.org/10.1007/s11192-015-1585-7 [8] M. Gallivan, M. Ahuja. . Co-authorship, ho-mophily, and scholarly influence in information systems research. Journal of the Association for Information Systems, vol. 16 (2015) no. 12, 2. https://doi.org/10.17705/1jais.00416 [9] L. Kronegger, F. Mali, A. Ferligoj, and P. Doreian. Collaboration structures in Slovenian scientific communities. Scientometrics, vol. 90 (2012), no.2, pp. 631-647. https://doi.org/10.1007/s11192-011-0493-8 [10] JB. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society. vol.7 (1956) no.1, pp. 48-50. https://doi.org/10.1090/S0002-9939-1956-0078686-7 [11] T. Jacobsen, RL. Punzalan, ML. Hedstrom. Invoking "collective memory": Mapping the emergence of a concept in archival science. Archival Science, vol. 13(2013)no. 2-3, pp. 217-251. [12] S. Peclin, P. Juznic, R. Blagus, MC. Sajko, J. Stare. Effects of international collaboration and status of journal on impact of papers. Scientometrics, vol. 93 (2012) no. 3, pp. 937-948. https://doi.org/10.1007/s11192-012-0768-8 [13] J. Pisanski, T. Pisanski. The use of collaboration distance in scheduling conference talks. Informatica : an international journal of computing and informatics, vol. 43 (2019) no. 4, pp. 461-466, https://doi.org/10.31449/inf.v43i4.2832. [14] A. Quirin O. Cordón, V. P. Guerrero-Bote, B. Vargas-Quesada, F. Moya-Anegón. A quick MST-based algorithm to obtain Pathfinder networks — 1), Journal of the American Society for Information Science and Technology vol. 59 (2008) no. 12, pp.19121924. https://doi.org/10.1002/asi.20904 [15] J. Leskovec, A. Rajaraman, and J. Ullman. Mining of Massive Datasets (2014), Cambridge University Press. https://doi.org/10.1017/CB09781139924801 [16] MathSciNet: https://mathscinet.ams.org/mathscinet/index.html [17] Scopus: https://www.scopus.com/home.uri [18] SICRIS: https://www.sicris.si/public/jqm/cris.aspx?lang=eng [19] A. Vavpetic, V. Batagelj, V. Podpecan. An implementation of the Pathfinder algorithm for sparse networks and its application on text network, In M. Bohanec (ed.), 12th International Multiconference Information Society, vol. A (2009) pp. 236-239. A Novel Method for Determining Research Groups Informatica 44 (2020) 139-145 145 [20] HD. White. Pathfinder networks and author cocitation analysis: A remapping of paradigmatic information scientists. 
Journal of the American Society for Information Science and Technology, vol. 54 (2003) no. 5, pp. 423-434. https://doi.org/10.1002/asi.10228 [21] Diversity index, Wikipedia, https://en.wikipedia.org/wiki/Diversity_index [22] H. Yang, HJ. Lee. Research trend visualization by MeSH terms from Pubmed. International journal of environmental research and public health, vol. 15 (2018) no. 6, 1113. https://doi.org/10.3390/ijerph15061113 [23] SY. Yu. Detecting collaboration patterns among iSchools by linking scholarly communication to social networking at the macro and micro levels. LIBRES: Library and Information Science Research Electronic Journal vol. 23 (2013) no. 2, pp. 1-13. [24] zbMATH: https://zbmath.org/ 146 Informatica 44 (2020) 139-145 J. Pisanski et al. https://doi.org/10.31449/inf.v44i2.3083 Informatica 44 (2020) 147-138 127 Dialogue Act-Based Expressive Speech Synthesis in Limited Domain for the Czech Language Martin Gruber, Jindrich Matousek, Zdenek Hanzlicek and Daniel Tihelka University of West Bohemia Faculty of Applied Sciences NTIS - New Technologies for the Information Society, Department of Cybernetics Univerzitni 8, Pilsen, Czech Republic E-mail: gruber@ntis.zcu.cz Keywords: speech synthesis, unit selection, HMM, expressivity, dialogue act, limited domain Received: October 30, 2018 This paper deals with expressive speech synthesis in a dialogue. Dialogue acts - discrete expressive categories - are used for expressivity description. The aim of the work is to create a procedure for development of expressive speech synthesis for a dialogue system in a limited domain. The domain is here limited to dialogues between a human and a computer on a given topic of reminiscing about personal photographs. To incorporate expressivity into synthetic speech, modifications of current algorithms used for neutral speech synthesis are made. An expressive speech corpus is recorded, annotated using a predefined set of dialogue acts, and its acoustic analysis is performed. Unit selection and HMM-based methods are used to synthesize expressive speech, and an evaluation using listening tests is presented. The listeners asses two basic aspects of synthetic expressive speech for isolated utterances: speech quality and expressivity perception. The evaluation is also performed for utterances in a dialogue to asses appropriateness of synthetic expressive speech. It can be concluded that synthetic expressive speech is rated positively even though it is of worse quality when comparing with the neutral speech synthesis. However, synthetic expressive speech is able to transmit expressivity to listeners and to improve the naturalness of the synthetic speech. Povzetek: Razvitaje metoda za izrazno govorno sintezo v cescini. 1 Introduction Nowadays, speech synthesis techniques produce high quality and intelligible speech. However, to use synthetic speech in dialogue systems (ticket booking [1], information on restaurants or hotels [2], flights [3], trains [4] or weather [5]) or in any other human-computer interactive systems (virtual computer companions, computer games), the voice interface should be more friendly to make the user to feel more involved in the interaction or communication. Synthetic speech cannot sound completely natural until it expresses a speaker's attitude. Thus, expressive (or emotional) speech synthesis is a frequently discussed topic and has become a concern of many scientists. Even though some results have already been presented, this task has not been satisfactorily solved yet. 
Some papers which deal with this problem include, but are not limited to [6, 7, 8,9, 10, 11, 12]. To reduce the complexity of the general expressive speech synthesis, the task is usually somehow limited (as well as limited domain speech synthesis systems are) and focused on a specific domain, e.g. expressive football announcements [13], sport commentaries [14] or dialogue system in a tourism domain [15]. In this work, we limited the domain to conversations between seniors and a com- puter. Personal photographs were chosen as the topic for these discussions since the work started as a part of a major project aiming at developing a virtual senior companion with an audiovisual interface [16]. Once the specific limited domain is defined, the task of expressive speech synthesis becomes more easily solvable. However, this work tries to propose a general methodology for designing an expressive speech synthesizer in a limited domain. Thus, it should be possible to create a synthesizer for various limited domains following the procedure described herein. In the first phase of our research, becoming acquainted with the defined domain was the main goal. Thus, an extensive audiovisual database containing 65 natural dialogues between humans (seniors) and a computer (represented by a 3D virtual avatar) was created using the Wizard-of-Oz method which was proposed in [17] and used e.g. in [18, 19]. Afterwards, the dialogues were manually transcribed so that the text could be used later. The process of the database recording is described in Section 2. Next, on the basis of these dialogues (the texts and the audio recordings), an expressive speech corpus was designed and recorded. The recording of the expressive corpus was performed in the form of a dialogue between a professional female voice talent and a computer. The di- 148 Informática 44 (2020) 147-165 M. Grûber et al. alogues were designed on the basis of the natural dialogues recorded in the previous phase. Thus, the voice talent (acting as the virtual avatar now) was recording predefined sentences as responses to the seniors' speech that the voice talent was listening to. The expressive speech corpus recording process is in more details described in Section 3. To synthesize expressive speech, an expressivity description has to be defined. Many approaches have been suggested in the past. Continuous descriptions using multidimensional space with several axes to determinate "expressivity position" were described e.g. in [20, 21]. Another option is a discrete division into various groups, for emotions e.g. happiness, sadness, anger, joy, etc. [22]. The discrete description is the most commonly used method and various sets of expressive categories are used, e.g. dialogue acts [23, 15], emotion categories [24, 7, 25] or categories like good news and bad news [8, 26]. Thus, a set of expressive categories was defined and used to annotate the expressive speech corpus. The expressive categories used in our work are presented in Section 4 and annotation of the expressive speech corpus is described in Section 5. There are various methods to produce synthetic speech, the mostly used are unit selection [27], HMM-based methods [28], DNN-based methods [29] or other methods based on neural networks [30, 31]. These methods can be certainly used also for the expressive speech synthesis. In addition, a method for voice conversion [32] can be taken into consideration. 
Although this method is primarily used for a conversion of source voice to a target voice in the process of the speech synthesis, it can be also used to convert one speaking style to another [33]. DNN-based approaches then allows e.g. adaptation of an expressive model to a new speaker [34]. To incorporate expressivity into speech using unit selection method, the baseline algorithm used e.g. in [35, 36] was slightly modified. The main modification consists in a different target cost calculation. A prosodic feature representing an expressive category is considered in addition to the current set of features used for the cost calculation. To get specific penalties for speech units labelled with an expressive category different from the requested one, enumerated differences between various expressive categories are used. To compute the penalties, a penalty matrix based on perception and acoustic differences is used. The complex acoustic analysis of the expressive speech corpus along with the unit selection method modifications is described in Section 6. Even though this work is mainly focused on using the unit selection method for expressive speech synthesis, a brief description of preliminary experiments with HMM-based method is also presented. The HMM-based TTS system settings is described in Section 7. The results and evaluation are presented in Section 8. The expressivity perception ratio is investigated for natural speech and for synthetic speech generated by both the unit selection based TTS system and the HMM-based TTS system. The synthetic speech quality is also discussed in that section. As the results of this work are to be used in a dialogue system, the suitability of produced expressive synthetic speech is evaluated also directly in dialogues. 2 Natural dialogues To become acquainted with the limited domain, an extensive audiovisual database1 of natural dialogues was created using the Wizard-of-Oz method. This means that each dialogue was recorded as a dialogue between a human (senior) and a computer (avatar) which was allegedly controlled only by the human voice. However, the computer was covertly controlled by human operators from another room. Thus, the operators were controlling the course of the dialogue whereas the recorded human subjects thought they are interacting with an independent system based on artificial intelligence. The avatar was using neutral TTS system ARTIC [35] to speak to the human subjects. The recording procedure is described in [37] in more details. 2.1 Recording setup A soundproof recording room has been established for the recording purposes (the scheme is shown in Figure 1). In the recording room, the human subject faces an LCD screen and two speakers. The speech is recorded by two wireless high-quality head microphones (one for the human subject and one for the computer avatar), and the video is captured by three miniDV cameras. A surveillance web-camera was placed in the room to monitor the situation, especially the senior's state. The only contact between a user and the computer was through speech, there was no keyboard nor mouse on the table. speaker Oc speaker LCD O Wireless microphones r MiniDV cameras Surveillance Web camera Figure 1: Recording room setup. A snapshot captured by the miniDV cameras during a recording session is presented in Figure 2. The cameras 1The video recordings are not used for the purposes of the expressive TTS system design. 
They were just archived and are intended for future use in audiovisual speech recognition, emotion detection, gesture recognition, etc. Dialogue Act-Based Expressive Speech Synthesis in. Informatica 44 (2020) 147-165 149 were positioned to be able to capture the subject from three different views to provide data usable in various ways. Figure 2: Screenshot captured by the miniDV cameras during a recording session. 2.2 Recording application description A snapshot of the screen presented to human subjects is shown in Figure 3 ("Presenter" interface). On the left upper part of the LCD screen, there is visualized 3D model of a talking head. This model is used as the avatar, the impersonate companion that should play a role of the partner in the dialogue. Additionally, on the right upper part, there is shown a photograph which is currently being discussed. On the lower half of the screen, there is a place used for displaying subtitles (just in case the synthesized speech is not intelligible sufficiently). The subtitles were displayed only during the first few sessions and then they were switched off as the generated speech turned out to be understandable enough. by clicking on them. Each sentence of the scenario was given a number related to the picture displayed on the left. This enabled the orientation in large pre-prepared scenarios. Under the picture there is a button for displaying the picture on the "Presenter" screen. Once a sentence is selected by clicking on the list, it appears in the bottom edit box just above the buttons "SPEAK" and "clear". The displayed sentence can be modified before pressing "SPEAK" button and also an arbitrary text can be typed into the edit box. The right part of the screen is intended for displaying buttons bearing non-speech acts (smile, laughter, assentation, hesitation) and quick phrases (Yes. No. It's nice. Alright. Doesn't matter. Go on; etc.). Figure 4: Snapshot of the WoZ system interface - the operator's side. Figure 3: Snapshot of the WoZ system interface - the user's side. In Figure 4, a screen of the operator's part of the recording application is shown ("Wizard" interface). The interface provides the human operators with possibilities of dialogue flow controlling. The middle part of the screen serves to display the pre-prepared scenario for a dialogue. Note that the wizards could select the sentences from the scenario, the assumption on how the dialogue could develop, 2.3 Audiovisual database statistics Almost all audio recordings are stored using 22kHz sample rate and 16-bit resolution. The first six dialogues were recorded using 48kHz sample rate, later it was reduced to the current level according to requirements of the cooperating team dealing with ASR (automatic speech recognition). The total number of recorded dialogues is 65. Based on gender, the set of speakers can be divided into 37 females and 28 males. Mean age of the speakers is 69.3 years; this number is almost the same for both male and female speakers. The oldest person was a female, 86 years old. The youngest one was also a female, 54 years old. All the recorded subjects were native Czech speakers; two of them (1 male and 1 female) spoke a regional Moravian dialect. This dialect differs from regular Czech language in pronunciation and also a little in vocabulary. Approximately one half of the subjects stated in the after recording form that they have a computer at home. Nevertheless, most of them do not use it very often. Almost all the dialogues were rated as friendly and smooth. 
And even more, the users were really enjoying reminiscing on their photos, no matter that the partner in the dialogue was an avatar. Duration of each dialogue was limited to 1 hour, as this was the capacity of 150 Informática 44 (2020) 147-165 M. Grûber et al. tapes used in miniDV cameras, resulting in average duration 56 minutes per dialogue. During the conversation, 8 photographs were discussed in average (maximum was 12, minimum 3). 3 Expressive corpus recording 3.1 Texts preparation For developing a high-quality expressive speech synthesis system, an expressive speech corpus has to be created. Such a corpus can be then merged or just enhanced by a neutral one to create a robust corpus containing neutral speech as well as expressivity while keeping a maximum speech units coverage (phonetic balance). The process of designing texts for the expressive corpus recording is very important. The real natural dialogues and their transcriptions were taken as a basis for such a design. Thus, almost all the texts (more than 7000 sentences) uttered by the computer avatar during the natural dialogues were used. Texts containing unfinished phrases due to e.g. speakers overlapping were omitted. These texts form a set of sentences to be recorded. 3.2 Recording process For the expressive corpus recording, a method using so-called scenarios was applied. A scenario in our case can be viewed as a natural dialogue whose course is prepared in advance, just with missing audio of one of the participants (the avatar). This means that the parts of the dialogues to be uttered by a voice talent represent the computer avatar responses and order of these parts is fixed. The parts also follow the natural dialogues and are accompanied with the other participant's original speech to provide the voice talent with information about the context. Actually, the recording was a simulation of the natural dialogues where the voice talent was standing for the computer avatar and was pronouncing its sentences. This should stimulate the voice talent to became naturally expressive while recording. As the voice talent, a female professional stage-player experienced in speech corpora recording was chosen. The voice talent had already recorded the neutral speech corpus for our neutral TTS system. This corresponds with the intention suggested in Section 3.1 that the expressive corpus should be enhanced by the neutral one to keep the speech units coverage. To improve the performance of tools processing the recorded speech corpora, glottal signal was captured along with the speech signal during the recording. 3.3 Recording application description To record the expressive corpus using the above described method, a special recording application was developed. The application interface is depicted in Figure 5. Figure 5: Interface of the application for expressive corpus recording. On the upper part of the application window, the text to be recorded is displayed. However, the voice talent was allowed to change the exact sentence wording if unclear2 while keeping the same meaning. On the middle part, there are, among other things, control buttons for recording and listening. On the bottom, the waveform of the just recorded sentence is shown. The application can be also controlled via keyboard short-cuts to make it more comfortable for the voice talent. 4 Expressivity description To incorporate expressivity in synthetic speech, some kind of its description is necessary. 
A general description of expressivity is a very complex task that has not been satisfactorily solved yet even thoûgh there are some stûdies (e.g. [38]) dealing with this topic. For various research fields and their tasks, there are various possibilities of expressivity description. In our work, a description making use of so-called dialogue acts was used. It is a categorical description based on a classification of expressivity into pre-defined classes (used also in [39, 23, 15]). Although there are several schemas describing expressivity using dialogue acts (including DAMSL [40, 41], SWBD-DAMSL [42], VERBMOBIL [43, 44] or AT&T schema [39]), a new schema was employed to describe expressivity in our limited domain in question. The set of proposed dialogue acts is shown in Table 1 along with a few examples. The definition of the dialogue acts was based on the audiovisual database of the natural dialogues (described in Section 2) and on the expressive speech corpus (described 2 Since the texts for the recording were prepared automatically and were not manually checked due to their high number, they could contain some typos, unintelligibilities or unclarities. Dialogue Act-Based Expressive Speech Synthesis in. Informatica 44 (2020) 147-165 151 dialogue act example directive Tell me that. Talk. request Let's get back to that later. wait Wait a minute. Just a moment. apology I'm sorry. Excuse me. greeting Hello. Good morning. goodbye Goodbye. See you later. thanks Thank you. Thanks. surprise Do you really have 10 siblings? sad empathy I'm sorry to hear that. It's really terrible. happy empathy It's nice. Great. It had to be wonderful. showing interest Can you tell me more about it? confirmation Yes. Yeah. I see. Well. Hmm. disconfirmation No. I don't understand. encouragement Well. For example? And what about you? not specified Do you hear me well? My name is Paul. Table 1: Set of dialogue acts. in Section 3). These dialogue acts are than used for expressive corpus annotation (Section 5) and also in the process of the expressive speech synthesis (Section 6). The need for a new dialogue act schema was driven by a definition of our specific limited domain. Most of the dialogue acts are intended to encourage the (human) partner in a dialogue to talk more about a topic while keeping the computer dialogue system to behave more like a patient listener. Even though the dialogue acts schemas are generally supposed to describe various phases of dialogues, we assume that in various dialogues' phases a speaker can present his state of mind, mood or personal attitude in a specific way. We believe that the proposed set of dialogue acts can be used not only for description of various dialogue phases but that it also represents the speaker's attitude and affective state expressed by expressive speech. Using these dialogue acts in this limited domain, the synthetic speech is supposed to become more natural for the listeners (seniors in this case). 5 Expressive corpus annotation The expressive speech corpus was annotated by dialogue acts using a listening test. The test was aimed to determine objective annotation on the basis of several subjective annotations as the perception of expressivity is always subjective and may vary depending on a particular listener. Preparation works, listening test framework, evaluation of listening test result and a measure of inter-rater agreement analysis is presented in the following paragraphs. 
5.1 Listening test background The listening test was organized on the client-server basis using a specially developed web application. This way, listeners were able to work on the test from their homes without any contact with the test organizers. The listeners were required to have only an internet connection, any browser installed on their computers and some device for audio playback. Various measures were undertaken to detect possible cheating, carelessness or misunderstandings. Potential test participants were addressed mostly among university students from all faculties and the finished listening test was financially rewarded (to increase motivation for the listeners). The participants were instructed to listen to the recordings very carefully and subsequently mark dialogue acts that are expressed within the sentence. The number of possibly marked dialogue acts for one utterance was just upon the listeners, they were not limited anyhow. Few sample sentences labelled with dialogue acts were provided and available to the listeners on view at every turn. If any listener marked one utterance with more than one dialogue act, he was also required to specify whether the functions occur in that sentence consecutively or concurrently. If the dialogue acts are marked as consecutive in a particular utterance, this utterance is omitted from further research for now. These sentences should be manually reviewed later and either divided into more shorter sentences or omitted completely. Finally, 12 listeners successfully finished the listening test. However, this way we obtained subjective annotations that vary across the listeners. To objectively annotate the expressive recordings, a proper combination of the subjective annotations was needed. Therefore an evaluation of the listening test was made. 5.2 Objective annotation We utilized two ways to deduce the objective annotation. The first way is a simple majority method. Using this easy and intuitive approach, each sentence is assigned a dialogue act that was marked by the majority of the listeners. In case of less then 50% of all listeners marked any dialogue act, the classification of this sentence is considered as untrustworthy. The second approach is based on maximum likelihood method. Maximum likelihood estimation is a statistical method used for fitting a statistical model to data and providing estimates for the model's parameters. Under certain conditions, the maximum likelihood estimator is consistent. The consistency means that having a sufficiently large number of observations (annotations in our case), it is possible to find the value of statistical model parameters with arbitrary precision. The parameter calculation is implemented using the EM algorithm [45]. Knowing the model parameters, it is possible to deduce true observation which is called objective annotation. Precision of the estimate is one of the outputs of this model. Using the precision, any untrustworthy assignment of a sentence with 152 Informática 44 (2020) 147-165 M. Grûber et al. a dialogue act can be eliminated. Comparing these two approaches, 35 out of 7287 classifications were marked as untrustworthy using maximum likelihood method and 571 using simple majority method. The average ratio of listeners who marked the same dialogue act for particular sentence using simple majority approach was 81%, when untrustworthy classifications were excluded. 
Similar measure for maximum likelihood approach cannot be easily computed as the model parameters and the estimate precision depend on number of iteration in the EM algorithm. We decided to use the objective annotation obtained by maximum likelihood method. It is an asymptotically consistent, asymptotically normal and asymptotically efficient estimate. This approach was also successfully used in other works regarding speech synthesis research, see [46]. Further, we need to confirm that the listeners marked the sentences with dialogue acts consistently and achieved some measure of agreement. Otherwise the subjective annotations could be considered as accidental or the dialogue acts inappropriately defined and thus the acquired objective annotation would be false. For this purpose, we make use of two statistical measures for assessing the reliability of agreement among listeners. One of the measures used for such evaluation is Fleiss' kappa [47, 48]. It is a statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items. We calculated this measure among all listeners separately for each dialogue act. Computation of overall Fleiss' kappa is impossible because the listeners were allowed to mark more than one dialogue act for each sentence. However, the overall value can be evaluated as the mean of Fleiss' kappas of all dialogue acts. Another measure used here is Cohen's kappa [49, 48]. It is a statistical measure of inter-rater agreement for categorical items and takes into account the agreement occurring by chance as well as Fleiss' kappa. However, Cohen's kappa measures the agreement only between two listeners. We decided to measure the agreement between each listener and the objective annotation obtained by maximum likelihood method. Again, calculation of Cohen's kappa was made for each dialogue act separately. Thus, we can find out whether particular listener was in agreement with the objective annotation for certain dialogue act. Finally, the mean of Cohen's kappas of all dialogue acts can be calculated. Results of agreement measures are presented in Table 2. Values of Fleiss' and Cohen's kappas vary between 0 and 1, the higher value the better agreement. More detailed interpretation of measure of agreement is e.g in [50]. The Fleiss' kappa mean value of 0.5434 means that the measure of inter-listeners agreement is moderate. As it is obvious from Table 2, dialogue acts OTHER and NOT-SPECIFIED should be considered as poorly recognizable. It is understandable when taking into consideration their definitions. After eliminating values of these dialogue acts the mean value of 0.6191 is achieved, which means substantial agreement among the listeners. The Cohen's kappa mean value of 0.6632 means that the measure of agreement between listeners and objective annotation is substantial. Moreover, we can again eliminate dialogue acts OTHER and NOT-SPECIFIED as they were poorly recognizable also according to Cohen's kappa. Thus, mean value of 0.7316 is achieved. However, it is still classified as a substantial agreement. As it is shown in Table 2, agreement among listeners regarding classification of consecutive dialogue act was measured too. The listeners agreed on this label moderately among each other and substantially with the objective annotation. 
Table 2 also shows the occurrence ratios of the particular dialogue acts when the maximum likelihood method was used to obtain the objective annotation. It is obvious that the dialogue acts SHOW-INTEREST and ENCOURAGE are the most frequent.

dialogue act     Fleiss' kappa  Measure of agreement  Cohen's kappa  Cohen's kappa SD  Measure of agreement  Occurr. probab.
DIRECTIVE        0.7282         Substantial           0.8457         0.1308            Almost perfect        0.0236
REQUEST          0.5719         Moderate              0.7280         0.1638            Substantial           0.0436
WAIT             0.5304         Moderate              0.7015         0.4190            Substantial           0.0073
APOLOGY          0.6047         Substantial           0.7128         0.2321            Substantial           0.0059
GREETING         0.7835         Substantial           0.8675         0.1287            Almost perfect        0.0137
GOODBYE          0.7408         Substantial           0.7254         0.1365            Substantial           0.0164
THANKS           0.8285         Almost perfect        0.8941         0.1352            Almost perfect        0.0073
SURPRISE         0.2477         Fair                  0.4064         0.1518            Moderate              0.0419
SAD-EMPATHY      0.6746         Substantial           0.7663         0.0590            Substantial           0.0344
HAPPY-EMPATHY    0.6525         Substantial           0.7416         0.1637            Substantial           0.0862
SHOW-INTEREST    0.4485         Moderate              0.6315         0.3656            Substantial           0.3488
CONFIRM          0.8444         Almost perfect        0.9148         0.0969            Almost perfect        0.1319
DISCONFIRM       0.4928         Moderate              0.7153         0.1660            Substantial           0.0023
ENCOURAGE        0.3739         Fair                  0.5914         0.3670            Moderate              0.2936
NOT-SPECIFIED    0.1495         Slight                0.3295         0.2292            Fair                  0.0736
OTHER            0.0220         Slight                0.0391         0.0595            Slight                0.0001
mean             0.5434         Moderate              0.6632         -                 Substantial           -
consecutive DA   0.5138         Moderate              0.6570         0.2443            Substantial           0.0374

Table 2: Fleiss' and Cohen's kappa and occurrence ratio for various dialogue acts and for the "consecutive DAs" label. For Cohen's kappa, the mean value and standard deviation are presented, since Cohen's kappa is measured between the annotation of each listener and the reference annotation.

6 Unit selection

6.1 General unit selection approach

In general, a unit selection algorithm (for our system described e.g. in [51]) is used to form the resulting synthetic speech from speech units that are selected from a list of corresponding candidate units. These candidates are stored in a unit inventory which is built up on the basis of a speech corpus. The unit selection process usually respects two groups of candidate features.

6.1.1 Concatenation cost

Features in one group are used for the concatenation cost computation. This cost reflects continuity distortion, i.e. how smoothly each candidate for unit u_{i-1} will join with each candidate for unit u_i in the sequence. The lower the cost, the less noticeable the unit boundaries are. This group of features usually includes mostly ordinal values (acoustic and spectral parameters of the speech signal), e.g. some acoustic coefficients, energy values, F0 values, their differences, etc. The concatenation cost for a candidate u_i is then calculated as follows:

C_i = ( Σ_{j=1}^{n} w_j · d_j ) / ( Σ_{j=1}^{n} w_j ),   (1)

where C_i is the concatenation cost of a candidate for unit u_i, n is the number of features under consideration, w_j is the weight of the j-th feature and d_j is an enumerated difference between the corresponding features of two potentially adjacent candidates for units u_{i-1} and u_i — for unit u_i, the features from the end of the originally preceding (adjacent in the original corpus) unit are compared with the same features from the end of unit u_{i-1}.

6.1.2 Target cost

Features in the other group are used for the target cost computation.
This cost reflects the level of approximation of a target unit by any of the candidates; in other words, how well a candidate from the unit inventory fits the corresponding target unit — a theoretical unit whose features are specified on the basis of the sentence to be synthesized. This group usually includes mostly nominal features, e.g. phonetic context, prosodic context, position in word, position in sentence, position in syllable, etc. The target cost for a candidate u_i is then calculated as follows:

T_i = ( Σ_{j=1}^{n} w_j · d_j ) / ( Σ_{j=1}^{n} w_j ),   (2)

where T_i is the target cost of a candidate for unit u_i, n is the number of features under consideration, w_j is the weight of the j-th feature and d_j is an enumerated difference between the j-th feature of a candidate for unit u_i and the target unit t_i. The differences of particular features (d_j) can also be referred to as penalties. For our ARTIC TTS system, the features that are considered when calculating the target cost are shown in Table 3.

feature                        weight
position in a prosodic word    7.0
left phoneme context           3.0
right phoneme context          3.0
prosodeme type                 14.0
voicing - at the beginning     8.5
voicing - at the end           8.5

Table 3: Prosodic features along with their weights used for target cost calculation in the ARTIC TTS system.

6.2 Basic target cost for expressive speech synthesis

When using the expressive speech corpus, the set of features used for the target cost computation is extended with one more feature. Regarding the aforementioned expressivity description, it is called dialogue act. The penalty d_da between a candidate u_i and a target unit t_i can in the easiest way be calculated as follows:

d_da = 0 if da_t = da_c, 1 otherwise,   (3)

where d_da is the difference (penalty), da_t is the dialogue act of the target unit t_i and da_c is the dialogue act of the candidate u_i. Finally, a weight for this penalty needs to be set, since the target cost is calculated as a weighted sum of particular penalties.
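A small Python sketch may clarify how the dialogue-act feature enters the target cost: the cost is the weighted average of eq. (2), with the 0/1 mismatch penalty of eq. (3) used for the dialogue act (and, for brevity, for the other nominal features as well). The feature names and weights in the example are illustrative, not the exact inventory of Table 3.

```python
def target_cost(candidate, target, weights):
    """Weighted-average target cost of eq. (2), extended with the
    dialogue-act feature of Section 6.2 (0/1 mismatch penalty, eq. (3)).

    candidate, target : dicts of nominal features
    weights           : dict feature -> weight w_j
    """
    num = sum(w * (0.0 if candidate[f] == target[f] else 1.0)
              for f, w in weights.items())
    return num / sum(weights.values())

# Toy usage: a candidate matching everything except the requested dialogue act.
weights = {"prosodeme": 14.0, "left_context": 3.0, "dialogue_act": 12.0}
cand = {"prosodeme": "null", "left_context": "a", "dialogue_act": "NEUTRAL"}
targ = {"prosodeme": "null", "left_context": "a", "dialogue_act": "SAD-EMPATHY"}
print(round(target_cost(cand, targ, weights), 3))  # only the DA penalty fires
```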
6.3 Advanced target cost for expressive speech synthesis

The target cost calculation presented in equation (3) is very simple and assumes that penalties for different expressive categories (represented by the dialogue acts) are all the same. However, this is not true in most cases. For instance, the difference between SAD-EMPATHY and HAPPY-EMPATHY should probably be greater than the difference between SAD-EMPATHY and NEUTRAL — this means that when synthesizing a sentence in the SAD-EMPATHY manner and there is no available or suitable candidate labelled with this dialogue act, it is probably better to consider a candidate labelled with the NEUTRAL dialogue act than a candidate labelled as HAPPY-EMPATHY. Therefore, it is necessary to enumerate the differences between various dialogue acts and use them for the target cost calculation. The basics of the procedure are described in [52]; a somewhat enhanced version is presented here.

6.3.1 General penalty matrix

The differences are assumed to be coded in a penalty matrix M, where the coefficient m_ij represents a difference (a penalty) between a dialogue act i and a dialogue act j. To determine the coefficients of the matrix, i.e. the differences between dialogue acts, two aspects should be considered: human perception of the speech and acoustic measures calculated from the signal. Thus, two separate matrices are created and then combined. The coefficients of the first matrix P are calculated on the basis of the listening test that was performed to annotate the dialogue acts in the expressive speech corpus [37] (see Section 6.3.2). The second matrix A is then based on the results of an acoustic analysis of expressive speech [53] (see Section 6.3.3). The combined final penalty matrix M represents the overall differences (penalties) between the various dialogue acts.

6.3.2 Listening test based differences

Given the annotations of the expressive recordings presented in Section 5, a penalty matrix P was created. Its coefficients p_ij were calculated according to the following equation:

p_ij = abs(log(num_ij / max_i)),   (4)

where num_ij represents how many times recordings with dialogue act i (according to the objective annotation as presented in Section 5.2) were labelled with dialogue act j (calculated over all listeners and all recordings), max_i represents the maximum value of num_ij for fixed i, and K is a constant defined as K > K_min, where

K_min = max_{i,j} abs(log(num_ij / max_i)),   (5)

the maximum being taken over all i, j for which the log is defined. For situations where the log is not defined, p_ij was set to p_ij = K. In our experiments, K = 5 ≈ 2 × K_min. The log was used to emphasize differences between the calculated ratios and we also assumed that human perception is logarithmic (as suggested e.g. by the Weber-Fechner law).

6.3.3 Acoustic analysis based differences

An extensive acoustic analysis of the expressive corpus was performed in [53]. On the basis of this analysis, a penalty matrix A was created. Its coefficients a_ij were calculated as the Euclidean distance between numeric vectors representing the dialogue acts i and j in a 12-dimensional space. The components of the vectors consist of normalized values of 4 statistical characteristics (mean value, standard deviation, skewness, kurtosis) for 3 acoustic parameters (F0 value, RMS energy and unit duration). The acoustic analysis proved that these features can be used as acoustic distance measures for this purpose. It is likely that there are other features, not considered in this work, which may affect the measure and whose influence should be explored in the future.

6.3.4 Final penalty matrix

The final penalty matrix containing numeric differences between the various dialogue acts is an appropriate combination of the two separate penalty matrices (matrix P based on the annotations and matrix A based on the acoustic analysis). The coefficients m_ij of matrix M are calculated as follows:

m_ij = (w_p · p_ij + w_a · a_ij) / (w_p + w_a),   (6)

where p_ij and a_ij represent coefficients from matrices P and A, and w_p and w_a are the corresponding weights. After several experiments, the values w_p = 3 and w_a = 1 were used as the weights. Using this setting, the best results were achieved when subjectively comparing the resulting synthetic speech. We also believe that the perceptual part should be emphasized. The final penalty matrix is depicted in Table 4.

6.3.5 Weight tuning for dialogue act feature

Proper setting of a weight for any of the features is not an easy task. Some techniques for automatic setting have also been developed [54, 55]. However, in our system the settings shown in Table 3 are used, as they have proved to be appropriate in applications of our TTS. To set the weight for the dialogue act feature, sets of synthetic utterances were generated for various settings.
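As an illustration of Sections 6.3.2 and 6.3.4, the sketch below derives a perceptual penalty matrix from confusion counts and combines it with an acoustic matrix according to eq. (6). It follows the reconstruction of eq. (4) given above, so the exact normalization may differ from the published system; all names are our own.

```python
import math

def listening_test_penalties(num, K=5.0):
    """Sketch of eq. (4): p_ij = |log(num_ij / max_i)|, with p_ij = K when
    the ratio is undefined (num_ij = 0). 'num' is a dict-of-dicts of
    confusion counts between dialogue acts, aggregated over all listeners
    and recordings."""
    P = {}
    for i, row in num.items():
        m = max(row.values())
        P[i] = {j: (K if c == 0 else abs(math.log(c / m)))
                for j, c in row.items()}
    return P

def combine_penalties(P, A, wp=3.0, wa=1.0):
    """Eq. (6): element-wise weighted combination of the perceptual matrix P
    and the acoustic matrix A into the final penalty matrix M; both matrices
    are assumed to be dict-of-dicts over the same dialogue-act labels."""
    return {i: {j: (wp * P[i][j] + wa * A[i][j]) / (wp + wa) for j in P[i]}
            for i in P}
```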
7 HMM algorithm modification/training

Along with the concatenative unit selection method, statistical parametric speech synthesis based on hidden Markov models (abbreviated as HMM-based speech synthesis) is one of the most researched synthesis methods [28]. Several experiments on using this synthesis method for generating expressive speech are described in [14]. In the HMM approach, statistical models (an extended type of HMMs) are trained from a natural speech database. Spectral parameters, fundamental frequency and possibly some excitation parameters are modelled simultaneously by the corresponding multi-stream HMMs. The variability of speech is modelled by using models with a large context description, i.e. individual models are defined for various phonetic, prosodic and linguistic contexts that are described by so-called contextual factors. The contextual factors employed in our experiments are listed in Table 5. For more details, see e.g. [56]. To increase the robustness of the estimated model parameters, models of acoustically similar units are clustered by a decision-tree-based context-clustering algorithm. As a result, similar units share one common model.

Within HMM-based speech synthesis, various methods for modelling expressivity or speaking styles have been introduced. The simplest one uses so-called style dependent models [59], i.e. an independent set of HMMs is trained for each expression. An obvious drawback of this approach is the large amount of training data required for each expression. A better solution is so-called style mixed models [59], where one set of HMMs is trained for all expressions together and the particular expressions are distinguished by introducing an additional contextual factor. Models of units that are acoustically similar across expressions are then clustered; independent models are trained only when there is a significant difference between particular expressions. Another option for modelling expressions is methods based on model adaptation [60, 61]; they are usually preferred because they make it possible to control the speech style or expression more precisely and require less training data. In this work, however, the style mixed model utilizing an additional contextual factor for the dialogue act was used.

8 Evaluation & results

This section deals with an evaluation of the procedure described in this paper, to verify that it fulfils the goals specified at the beginning. In particular, it should be verified that listeners perceive the synthetic speech produced by the developed system as expressive (Section 8.2.1) and how the quality of the synthetic speech changed in comparison with the baseline system (Section 8.2.2). Since the proposed TTS system is intended for use in a specific dialogue system, the suitability of the expressive speech synthesis in such a dialogue system is also evaluated (Section 8.4).
Table 4: The final penalty matrix M with the differences (penalties) m_ij between all pairs of dialogue acts (APOLOGY, CONFIRM, DIRECTIVE, DISCONFIRM, ENCOURAGE, GOODBYE, GREETING, HAPPY-EMPATHY, NOT-SPECIFIED, OTHER, REQUEST, SAD-EMPATHY, SHOW-INTEREST, SURPRISE, THANKS, WAIT, NEUTRAL); the diagonal elements are 0.00 and the off-diagonal values range roughly from 0.05 to 1.00.

Contextual factor                                          Possible values
Left and right phonetic context                            Czech phonetic alphabet [57]
Phone position in prosodic word (forward and backward)     1, 2, 3, 4, 5, ...
Prosodic word position in clause (forward and backward)    1, 2, 3, 4, 5, ...
Prosodeme                                                   terminating satisfactorily, terminating unsatisfactorily, non-terminating, null
Dialogue act                                                see Section 4

Table 5: A list of contextual factors and their values. Prosodic words, clauses and prosodemes are thoroughly described in [58].

During the design of the expressive TTS system, it turned out that some of the dialogue acts (further referred to as DAs) appear much more frequently than others, while some are very rare. Thus, only the most frequent DAs were used to evaluate the system, and they were divided into two groups:

Expressive dialogue acts:
- SHOW-INTEREST - relative frequency 34.9 %;
- ENCOURAGE - relative frequency 29.4 %;
- CONFIRM - relative frequency 13.2 %;
- HAPPY-EMPATHY - relative frequency 8.6 %;
- SAD-EMPATHY - added because it is considered the opposite of the HAPPY-EMPATHY dialogue act; relative frequency 3.4 %;

Neutral dialogue acts:
- NOT-SPECIFIED - besides being one of the most frequently occurring DAs, it should also represent neutral synthetic speech; relative frequency 7.4 %;
- NEUTRAL - this is not a DA per se; it is defined here to represent the neutral speech produced by the current baseline TTS system for the purposes of the evaluation.
All the listening tests described below were performed using the same system as was used for the expressive corpus annotation (described in Section 5.1). Of course, the questions and options were different in this evaluation, but the core of the system is the same. The majority of the listening test participants were experts in speech or language processing; some of them were university students. The texts of the synthesized utterances were not part of the corpora; new texts were created for this purpose. The content of the texts corresponds to the dialogue act to be synthesized (for expressive synthesis) or is neutral (for neutral synthesis).

8.1 Expressivity perception in natural speech

Before assessing the synthetic expressive speech, a listening test focused on expressivity perception in natural speech was performed. This gives a brief overview of how well listeners are able to perceive expressivity, and it later allows a comparison between expressivity perception in natural and synthetic speech. The listeners assessed randomly selected utterances from the natural corpora (neutral and expressive); their task was to mark whether they perceived any kind of expressivity, did not perceive it, or could not decide. 14 listeners participated in this test; each listener was presented with 34 utterances - 4 for each expressive dialogue act being evaluated and 7 for each dialogue act considered neutral (i.e. NOT-SPECIFIED and NEUTRAL). The results are depicted in Figure 6 and also shown in Table 6.

Figure 6: Expressivity perception in natural speech.

dialogue act      expressivity perception ratio    cannot decide
CONFIRM           38 %                             3 %
ENCOURAGE         61 %                             7 %
HAPPY-EMPATHY     77 %                             4 %
SAD-EMPATHY       73 %                             6 %
SHOW-INTEREST     18 %                             11 %
mean              53 %                             6 %
NOT-SPECIFIED     42 %                             13 %
NEUTRAL           36 %                             3 %
mean              39 %                             7 %

Table 6: Expressivity perception in natural speech.

The results are quite surprising, especially for neutral speech. In 39 % of the neutral natural utterances (on average, including NOT-SPECIFIED), the listeners perceived expressivity. It seems that some kind of expressivity is present even in the neutral corpus, and the listeners are sensitive enough to perceive it. This may be related to the content of the speech, since, as described in [62], the content as such can also influence the listeners' expressivity perception. The results for the expressive DAs depend on the particular DA. For instance, utterances marked as HAPPY-EMPATHY and SAD-EMPATHY are mostly recognized as expressive, whereas utterances marked as SHOW-INTEREST are not. These results give us a baseline for the evaluation of expressive synthetic speech. Since for some DAs the listeners do not perceive expressivity even in natural speech, it is unlikely that they will perceive it in synthetic speech.

8.2 Evaluation of the unit selection based expressive speech synthesis

During the evaluation of expressive synthetic speech, two main factors were investigated - expressivity perception and speech quality. The quality of synthetic speech is expected to be affected by the expressivity integration, as expressive speech is much more dynamic and thus more artifacts may occur. This section deals with the evaluation of expressive synthetic speech produced by the unit selection TTS system; the evaluation of the HMM-based TTS system is presented in Section 8.3.
In the listening tests regarding the evaluation of expressive synthetic speech, 13 listeners assessed 30 utterances - 4 for each DA in question and 2 for natural neutral speech (so that a comparison of speech quality can be performed).

8.2.1 Expressivity perception in synthetic speech

The results for expressivity perception in synthetic expressive speech are depicted in Figure 7 and presented in Table 7.

dialogue act              expressivity perception ratio    cannot decide
CONFIRM                   69 %                             4 %
ENCOURAGE                 42 %                             8 %
HAPPY-EMPATHY             50 %                             10 %
SAD-EMPATHY               63 %                             4 %
SHOW-INTEREST             46 %                             4 %
mean                      54 %                             6 %
NOT-SPECIFIED             10 %                             0 %
NEUTRAL                   15 %                             0 %
mean                      13 %                             0 %
natural speech (neutral)  42 %                             4 %

Table 7: Expressivity perception in synthetic speech (unit selection).

Figure 7: Expressivity perception in synthetic speech (unit selection).

Again, a surprising result can be observed for natural neutral speech, as expressivity was perceived at a quite high ratio (42 %). However, this is consistent with the previous results presented in Table 6 (39 %). For synthetic speech generated as NOT-SPECIFIED and for baseline neutral synthetic speech (marked as NEUTRAL), almost no expressivity was perceived. On the other hand, for expressive DAs, the expressivity perception ratio was quite high (mean value 54 %), and it was even slightly higher than for expressive natural speech (mean value 53 %, see Table 6).

To verify that the achieved results are not random, a statistical measure of listener agreement (Fleiss' kappa) was calculated. Its value lies in the range ⟨-1, 1⟩ and a positive value indicates agreement above the chance level. In our experiment, Fleiss' kappa was k_f = 0.37, which means moderate agreement. In addition, other measures can be used to verify the results, for instance precision, recall, F1 and accuracy, which are mostly used for the evaluation of classifiers in classification tasks. The presented listening test can also be viewed as a classification task in which the listeners, as classifiers, classify utterances into two distinct classes: perceive and do not perceive expressivity (the "cannot decide" answers were not considered in this verification). The measures are defined as follows:

P = \frac{tp}{pp}, \quad R = \frac{tp}{ap}, \quad F1 = \frac{2 \cdot P \cdot R}{P + R}, \quad A = \frac{tp + tn}{ap + an}

where P is precision, the ability of a listener not to perceive a neutral sentence as expressive; R is recall (also sensitivity), the ability of a listener not to perceive expressive sentences as neutral; A is accuracy, the ability of a listener to perceive expressivity in expressive sentences and not to perceive it in neutral sentences; F1 is the harmonic mean of precision and recall; tp means "true positives" (the number of expressive sentences correctly perceived as expressive); tn means "true negatives" (the number of neutral sentences correctly perceived as neutral); pp stands for "predicted positives" (the number of all sentences perceived as expressive); ap stands for "actual positives" (the number of all actually expressive sentences); and an means "actual negatives" (the number of all actually neutral sentences). The calculated values of these measures are presented in Table 8 below, together with the values that would be achieved if expressivity perception were assessed completely at random.
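As a concrete illustration of how these measures are obtained from the raw listening-test decisions, the sketch below computes them from hypothetical counts (the counts are made up and are not the actual test data).

```python
# Sketch: precision, recall, F1 and accuracy of the "expressivity perceived"
# decision, computed from hypothetical listening-test counts.

def perception_measures(tp, tn, fp, fn):
    """tp/tn: expressive/neutral sentences judged correctly;
    fp: neutral sentences perceived as expressive; fn: expressive perceived as neutral."""
    pp = tp + fp            # predicted positives: all sentences perceived as expressive
    ap = tp + fn            # actual positives: all actually expressive sentences
    an = tn + fp            # actual negatives: all actually neutral sentences
    precision = tp / pp
    recall = tp / ap
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (ap + an)
    return precision, recall, f1, accuracy

# Hypothetical counts aggregated over all listeners and utterances:
print([round(x, 2) for x in perception_measures(tp=150, tn=80, fp=13, fn=110)])
```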
measure real listeners random assessment precision 0.92 0.72 recall 0.58 0.50 F1 measure 0.71 0.59 accuracy 0.66 0.50 Table 8: Statistical measures for expressivity perception listening test and comparison with completely random assessment. As the verification indicates, the expressivity perception ratio in synthetic speech is not a result of a random process. It's necessary to note that there are two main facts which affect the expressivity perception. The first one is the TTS system and the synthetic speech whose evaluation is the main goal. The second fact is the listeners - each of them might perceive (assess) various intensity of various expressivity categories differently. However, the main task here is not to evaluate the listeners and if they are or they are not able to perceive an expressivity (which is basically impossible). The listeners are just believed to and the only thing that can be done is to perform some kind of agreement measure calculation. In synthetic expressive speech generated with a particular DA in mind, the relative ratio between units originally coming from utterances labelled with this DA and units coming from other utterances can be measured. The ratio might vary depending on setting of the weight for the dialogue act feature. The calculated ratios for the current weight settings (as designed in Section 6.3.5) are shown in Figure 8. It's worth noting that the measure is very low for NOT-SPECIFIED DA. However, after further investigation, it turned out that when synthesizing utterances for this DA, units coming from the neutral corpus (NEUTRAL) were mostly selected. It supports the assumption that the NOT-SPECIFIED DA represents neutral speech (although in the final penalty matrix M the distance between NOT-SPECIFIED and NEUTRAL was calculated as 0.46 which is quite high). It also seems that there is no strong relation between this measure and the expressivity perception results presented in Table 7. Figure 8: Relative ratio of units coming from utterances labelled with the DA which was intended to be synthesized. 8.2.2 Quality evaluation To investigate whether the synthetic speech quality deteriorated by adding the expressivity, a MOS test evaluation was performed. In the MOS test, the listeners assess the speech quality using a 5-point scale where, in theory, the natural speech should be evaluated as 5 (100 %) and a very unnatural speech as 1 (0 %). The test was running along with the expressivity perception test, i.e. the test conditions, test utterances and the listeners were the same as for the evaluation that is presented in Section 8.2.1. The results of this MOS test are shown in Figure 9 and also in Table 9 altogether with a relative comparison with the natural speech (whose result is evaluated as 100 %). Figure 9: Evaluation of speech quality using a MOS test (unit selection). The results suggest that the quality of expressive synthetic speech is worse than the quality of neutral synthetic speech by 0.49 of the MOS score (13 %) in average. It is almost the same difference as between natural speech and neutral synthetic speech (0.65 of the MOS score). This deterioration is probably caused by greater variability of the acoustic signal of expressive speech. Thus, the artifacts might occur more often than in neutral synthetic speech. An auxiliary measure called smooth joints can be also calculated. A smooth joint is a concatenation point of two 160 Informática 44 (2020) 147-165 M. Grûber et al. 
dialogue act MOS score comparison with natural speech mean std CONFIRM 3.87 1.11 79% ENCOURAGE 3.48 0.97 68% HAPPY-EMPATHY 3.10 1.00 58% SAD-EMPATHY 3.87 0.94 79% SHOW-INTEREST 3.25 0.92 62% mean 3.51 0.99 69 % NOT-SPECIFIED 3.92 0.78 81% NEUTRAL 4.08 0.78 83% mean 4.00 0.78 82 % natural speech 4.65 0.48 100% HMM approach is also briefly presented in Section 7. The aim is to evaluate the capability of the HMM-based TTS system to produce expressive speech (shown in Table 10) and to evaluate its quality (Table 11). The presented results are summarized and various DAs are not differentiated. There were 12 listeners participating in these listening tests. dialogue act expressivity perception ratio cannot decide expressive 15% 5% NOT-SPECIFIED 8% 3% Table 9: Evaluation of speech quality using a MOS test (unit selection). speech units that were originally adjacent in the speech corpus and thus their concatenation is natural. The smooth joints measure indicates the relative ratio of such joints with respect to the number of all concatenation points. The calculated values are presented in Figure 10. It is assumed that the less smooth joints in synthetic speech, the more artifacts can occur, causing the synthetic speech quality to be worse. Figure 10: Relative ratio of smooth joints. It is obvious that the relative ratio of smooth joints is almost the same regardless of the DA (mean 79 %) and also in comparison with neutral synthetic speech (mean 82 %). Also, this measure seems to be unrelated to the expressivity perception measure or the MOS score. 8.3 Evaluation of the HMM-based expressive speech synthesis Even though this work deals mostly with the unit selection speech synthesis, the results of an experiment with the HMM-based expressive speech synthesis are to be briefly discussed in this section. The used method is based on the HTS system [63] and adapted to the Czech language [56]. The experiment is described in more details in [62] and the Table 10: Expressivity perception in synthetic speech (HMM). dialogue act MOS score comparison with mean natural speech expressive DAs + NOT-SPECIFIED 2.71 50% natural speech 4.44 100 % Table 11: Evaluation of speech quality using a MOS test (HMM). The expressivity perception ratio in synthetic speech produced by the HMM-based expressive TTS system is at a very low level (15 %) in comparison with the unit selection TTS system (54 %). Also the quality of synthetic speech is much worse, 2.7 of the MOS score (50 % of natural speech) for the HMM-based system and 3.5 (69 %) for the unit selection system. Generally, the HMM-based speech synthesis for the Czech language is not yet at such a high level as the unit selection approach is. Moreover, by adding expressive speech into this process, the trained HMM models may in fact mix natural and expressive acoustic signal depending on how the decision trees were created. Thus, in such synthetic speech of a lower quality, it is probably hard to identify any kind of expressivity. 8.4 Evaluation of the expressivity in dialogues Since the unit selection expressive speech synthesis is going to be used in a specific dialogue system (conversations between seniors and a computer; see Section 1 and 2), it is necessary to evaluate it also with respect to this purpose. A preference listening test was used to perform this kind of evaluation. The test stimuli were prepared as follows: - 6 appropriate parts of the natural dialogues (see Section 2), each approximately 1 minute in length, were randomly selected. 
The appropriateness were determined on the basis of sufficiency of the avatar's interactions within the dialogues. These parts will be further referred to as minidialogues. Dialogue Act-Based Expressive Speech Synthesis in. Informatica 44 (2020) 147-165 161 - The acoustic signal of each minidialogue was split-ted into parts where the person is speaking and parts where the avatar responses are expressed by the neutral speech synthesis. - The text contents of the avatar's responses were slightly modified so that the newly generated responses are really to be synthesized and not only played back. The sense of the utterances was of course kept the same so that the dialogue flow is not disrupted. - The new texts (avatar's responses) were synthesized using both the baseline neutral TTS system and the newly developed expressive TTS system - before the expressive speech synthesis, the texts were labelled by presumably appropriate DAs. - In some parts of the minidialogues where the person is originally speaking, little modifications were done so that the length of the person's speech was shortened -for instance, the parts where the person was speaking for a long time or where a long silence was detected were removed. Again, the natural dialogue flow was not disrupted. - the parts of the minidialogues were joint together so that two versions of each minidialogue were created -the first one with the avatar's responses with neutral synthetic speech and the second one with the avatar's responses with expressive synthetic speech. Each of the 6 minidialoges contains 4 avatar's responses in average expressing various DAs, mostly SHOW-INTEREST or ENCOURAGE. However, each evaluated DA was included at least once in the responses. The minidialogues were then presented to the listeners within a listening test, both minidialogue's variants in a single test query. The task for the listeners was to decide which variant is more natural, more pleasant and which one would they prefer when being in place of the human minidialogue participant. The results of this evaluation are presented in Table 12; there were 11 listeners participating in this listening test. synthesis variant preference neutral 8% expressive 83% cannot decide 9% Table 12: Evaluation of neutral vs. expressive speech synthesis in dialogues. It's obvious that the listeners preferred the expressive speech synthesis to the neutral one (83 % preference ratio). This is one of the most important results indicating that the developed system increases the user experience with the TTS system for this limited domain task. To verify that the avatar's responses were indeed synthesized and not only played back, the measure of smooth joints can be used. The mean value of this measure for the expressive avatar's responses is 86 % which is slightly higher than it was measured in Figure 10 of Section 8.2.1 (mean 82 % for neutral speech and 79 % for expressive speech). However, it still means that the responses were really synthesized. 9 Conclusion It is necessary to incorporate some kind of expressivity into synthetic speech as it improves the user experience with systems using speech synthesis technology. Expressive speech sounds more naturally in dialogues between humans and computers. There are several ways to make the synthetic speech sound expressively. In this work, expressivity described by dialogue acts was employed and the algorithms of the TTS system were modified to use that information when producing synthetic speech. 
The results presented in Section 8 suggest that in speech produced by the expressive TTS system the listeners perceived some kind of expressivity. More importantly, it was also confirmed that in the dialogues within the discussed limited domain, expressive speech is more suitable and preferred than the pure neutral speech produced by the baseline TTS system even though its quality is little bit worse. Although the development of the expressive TTS system was done within a limited domain of conversations about personal photos between humans and a computer, the whole procedure - data collecting, data annotation, expressive corpus preparation and recording, expressivity description and TTS system modification - can be used within any other limited domain if appropriate expressivity definition is used. Thus, an expressivity can be incorporated to any other dialogue system with a similar structure. Acknowledgement This research was supported by the Czech Science Foundation (GA CR), project No. GA19-19324S and by the Ministry of Education, Youth and Sports of the Czech Republic project No. L01506. References [1] J. D. Williams, S. Young, Partially observable Markov decision processes for spoken dialog systems, Computer Speech and Language 21 (2) (2007) 393-422. https://doi.org/10.1016/j.csl.2006. 06.008 [2] O. Lemon, K. Georgila, J. Henderson, M. Stuttle, An ISU dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling 162 Informática 44 (2020) 147-165 M. Grûber et al. in the TALK in-car system, in: Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations, EACL '06, Association for Computational Linguistics, Stroudsburg, PA, USA, 2006, pp. 119-122. https://doi.org/10.3115/160 8 974. 1608986 [3] X. Wu, M. Xu, W. Wu, Preparing for evaluation of a flight spoken dialogue system, in: Proceedings of ISCSLP, 2002, paper 50. [4] J. Svec, L. Smfdl, Prototype of Czech spoken dialog system with mixed initiative for railway information service, in: P. Sojka, A. Horik, I. Kopecek, K. Pala (Eds.), Text, Speech and Dialogue, Vol. 6231 of Lecture Notes in Computer Science, Springer, Berlin-Heidelberg, Germany, 2010, pp. 568-575. https://doi.org/10.10 07/ 978-3-642-15760-8\_72 [5] A. Mestrovic, L. Bernic, M. Pobar, S. Martincic-Ipsic, I. Ipsic, Overview of a croatian weather domain spoken dialog system prototype, in: 32nd International Conference on Information Technology Interfaces (ITI), Cavtat, Dubrovnik, 2010, pp. 103-108. [6] A. W. Black, Unit selection and emotional speech, in: Proceedings of Eurospeech, Geneva, Switzerland, 2003, pp. 1649-1652. [7] M. Bulut, S. S. Narayanan, A. K. Syrdal, Expressive speech synthesis using a concatenative synthesiser, in: Proceedings of the 7th International Conference on Spoken Language Processing - ICSLP, Denver, CO, USA, 2002, pp. 1265-1268. [8] W. Hamza, R. Bakis, E. M. Eide, M. A. Picheny, J. F. Pitrelli, The IBM expressive speech synthesis system, in: Proceedings of the 8th International Conference on Spoken Language Processing - ISCLP, Jeju, Korea, 2004, pp. 2577-2580. https://doi.org/10.1109/tasl.2006. 876123 [9] I. Steiner, M. Schroder, M. Charfuelan, A. Klepp, Symbolic vs. acoustics-based style control for expressive unit selection, in: Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, Kyoto, Japan, 2010, pp. 114-119. [10] J. Lorenzo-Trueba, G. E. Henter, S. Takaki, J. Yam- agishi, Y. Morino, Y. 
Ochiai, Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis, Speech Communication 99 (2018) 135-143. https://doi.org/10.1016/j.specom. 2018.03.002 [11] S. An, Z. Ling, L. Dai, Emotional statistical parametric speech synthesis using LSTM-RNNs, in: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), IEEE, 2017, pp. 1613-1616. https://doi.org/10.110 9/apsipa. 2017.8282282 [12] H. Li, Y. Kang, Z. Wang, EMPHASIS: An emotional phoneme-based acoustic model for speech synthesis system, in: Proceedings of Interspeech, 2018. https://doi.org/10.214 37/ interspeech.2018-1511 [13] S. Krstulovic, A. Hunecke, M. Schroder, An HMM-based speech synthesis system applied to German and its adaptation to a limited set of expressive football announcements, in: Proceedings of Interspeech, Antwerp, Belgium, 2007, pp. 1897-1900. [14] B. Picart, R. Brognaux, , T. Drugman, HMM-based speech synthesis of live sports commentaries: Integration of a two-layer prosody annotation, in: 8th ISCA Speech Synthesis Workshop, Barcelona, Spain, 2013. [15] H. Yang, H. Meng, L. Cai, Modeling the acoustic correlates of dialog act for expressive Chinese TTS synthesis, IET Conference Publications 2008 (CP544) (2008) 49-53. https://doi.org/10.104 9/cp:20080758 [16] P. Ircing, J. Romportl, Z. Loose, Audiovisual interface for Czech spoken dialogue system, in: IEEE 10th International Conference on Signal Processing Proceedings, Institute of Electrical and Electronics Engineers, Inc., Beijing, China, 2010, pp. 526-529. https://doi.org/10.110 9/icosp.2 010. 5656088 [17] J. F. Kelley, An iterative design methodology for user-friendly natural language office information applications, ACM Transactions on Information Systems 2 (1) (1984) 26-41. https://doi.org/10.114 5/357417. 357420 [18] S. Whittaker, M. Walker, J. Moore, Fish or fowl: A Wizard of Oz evaluation of dialogue strategies in the restaurant domain., in: Language Resources and Evaluation Conference, Gran Canaria, Spain, 2002. [19] M. Hajdinjak, F. Mihelic, The Wizard of Oz system for weather information retrieval, in: V. Matousek, P. Mautner (Eds.), Text, Speech and Dialogue, proceedings of the 6th International Conference TSD, Vol. 2807 of Lecture Notes in Computer Science, Springer, Berlin-Heidelberg, Germany, 2003, pp. 400-405. https://doi.org/10.10 07/ 978-3-54 0-39398-6\_57 Dialogue Act-Based Expressive Speech Synthesis in. Informatica 44 (2020) 147-165 163 [20] J. A. Russell, A circumplex model of affect, Journal of Personality and Social Psychology 39 (1980) 1161-1178. [21] A. Mehrabian, Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament, Current Psychology 14 (1996) 261-292. https://doi.org/10.1007/BF02686918 [22] R. R. Cornelius, The science of emotion: Research and tradition in the psychology of emotions, Prentice-Hall, Englewood Cliffs, NJ, USA, 1996. [23] A. K. Syrdal, A. Conkie, Y.-J. Kim, M. Beutnagel, Speech acts and dialog TTS, in: Proceedings of the 7th ISCA Speech Synthesis Workshop - SSW7, Kyoto, Japan, 2010, pp. 179-183. [24] E. Zovato, A. Pacchiotti, S. Quazza, S. Sandri, Towards emotional speech synthesis: A rule based approach, in: Proceedings of the 5th ISCA Speech Synthesis Workshop - SSW5, Pittsburgh, PA, USA, 2004, pp. 219-220. [25] J. M. Montero, J. Guti6rrez-Ariola, S. Palazuelos, E. Enriquez, S. Aguilera, J. M. 
Pardo, Emotional speech synthesis: From speech database to TTS, in: Proceedings of the 5th International Conference on Spoken Language Processing - ICSLP, Vol. 3, Sydney, Australia, 1998, pp. 923-926. [26] J. F. Pitrelli, R. Bakis, E. M. Eide, R. Fernandez, W. Hamza, M. A. Picheny, The IBM expressive text-to-speech synthesis system for American English, IEEE Transactions on Audio, Speech, and Language Processing 14 (4) (2006) 1099-1108. https://doi.org/10.1109/tasl.2006. 876123 [27] A. J. Hunt, A. W. Black, Unit selection in a con-catenative speech synthesis system using a large speech database, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, 1996, pp. 373-376. https://doi.org/10.1109/ICASSP. 1996.541110 [28] H. Zen, K. Tokuda, A. W. Black, Statistical parametric speech synthesis, Speech Communication 51 (2009) 1039-1064. https://doi.org/10.1016/j.specom. 2009.04.004 [29] H. Zen, A. Senior, M. Schuster, Statistical parametric speech synthesis using deep neural networks, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 7962-7966. https://doi.org/10.1109/ICASSP. 2013.6639215 [30] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu, Wavenet: A generative model for raw audio, in: Arxiv, 2016. arXiv:1609.03499v2. [31] Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Ben-gio, et al., Tacotron: Towards end-to-end speech synthesis, arXiv preprint arXiv:1703.10135 https://doi.org/10.214 37/ interspeech.2017-1452 [32] A. Kain, M. W. Macon, Spectral voice conversion for text-to-speech synthesis, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, 1998, pp. 285-288. https://doi.org/10.1109/icassp. 1998.674423 [33] H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, K. Shikano, GMM-based voice conversion applied to emotional speech synthesis, IEEE Tranactions on Speech and Audio Processing 7 (1999) 2401-2404. [34] J. Parker, Y. Stylianou, R. Cipolla, Adaptation of an expressive single speaker deep neural network speech synthesis system, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5309-5313. https://doi.org/10.110 9/ICASSP. 2018.8461888 [35] J. Matousek, D. Tihelka, J. Romportl, Current state of Czech text-to-speech system ARTIC, in: Text, Speech and Dialogue, proceedings of the 9th International Conference TSD, Vol. 4188 of Lecture Notes in Computer Science, Springer, Berlin-Heidelberg, Germany, 2006, pp. 439-446. https://doi.org/10.1007/1184 64 06_55 [36] D. Tihelka, J. Kala, J. Matousek, Enhancements of Viterbi search for fast unit selection synthesis, in: Proceedings of Interspeech, Makuhari, Japan, 2010, pp. 174-177. [37] M. Gruber, M. Legit, P. Ircing, J. Romportl, J. Psutka, Czech Senior COMPANION: Wizard of Oz data collection and expressive speech corpus recording and annotation, in: Z. Vetulani (Ed.), Human Language Technology. Challenges for Computer Science and Linguistics, Vol. 6562 of Lecture Notes in Computer Science, Springer, Berlin-Heidelberg, Germany, 2011, pp. 280-290. https://doi.org/10.10 07/ 97 8-3-64 2-2 0 0 95-3\_2 6 [38] R. Cowie, Describing the emotional states expressed in speech, in: ISCA Workshop on Speech and Emotion, Newcastle, uk, 2000, pp. 11-18. 164 Informática 44 (2020) 147-165 M. Grûber et al. [39] A. K. Syrdal, Y.-J. 
Kim, Dialog speech acts and prosody: Considerations for TTS, in: Proceedings of Speech Prosody, Campinas, Brazil, 2008, pp. 661665. [40] M. G. Core, J. F. Allen, Coding dialogs with the DAMSL annotation scheme, in: Working Notes of the AAAI Fall Symposium on Communicative Action in Humans and Machines, Cambridge, MA, USA, 1997, pp. 28-35. [41] J. Allen, M. Core, Draft of DAMSL: Dialog act markup in several layers, WWW page, [online] (1997). [42] D. Jurafsky, L. Shrilberg, D. Biasca, Switchboard-DAMSL labeling project coder's manual, Tech. Rep. 97-02, University of Colorado, Institute of Cognitive Science, Boulder, Colorado, USA (1997). [43] S. Jekat, A. Klein, E. Maier, I. Maleck, M. Mast, J. J. Quantz, Dialogue acts in VERBMOBIL, Tech. rep., German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany (1995). [44] J. Alexandersson, B. Buschbeck-Wolf, T. Fuji-nami, M. Kipp, S. Koch, E. Maier, N. Reithinger, B. Schmitz, M. Siegel, Dialogue acts in VERBMOBIL-2 - second edition, Tech. rep., German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany (1998). [45] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B 39 (1) (1977) 138, with discussion. [46] J. Romportl, Prosodic phrases and semantic accents in speech corpus for Czech TTS synthesis, in: Text, Speech and Dialogue, proceedings of the 11th International Conference TSD, Vol. 5246 of Lecture Notes in Artificial Intelligence, Springer, Berlin-Heidelberg, Germany, 2008, pp. 493-500. https://doi.org/10.10 07/ 97 8-3-54 0-87 391-4_63 [47] J. L. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin 76 (5) (1971) 378-382. https://doi.org/10.1037/h0031619 [48] J. L. Fleiss, J. Cohen, The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability, Educational and Psychological Measurement 33 (3) (1973) 613-619. https://doi.org/10.1177/ 001316447303300309 [49] J. A. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37-46. https://doi.org/10.1177/ 001316446002000104 [50] J. R. Landis, G. G. Koch, The measurement of observer agreement for categorical data., Biometrics 33 (1) (1977) 159-174. https://doi.org/10.2307/2529310 [51] D. Tihelka, J. Matousek, Unit selection and its relation to symbolic prosody: a new approach, INTERSPEECH 2006 - ICSLP, proceedings of 9th International Conference on Spoken Language Procesing 1 (2006) 2042-2045. [52] M. GrUber, Enumerating differences between various communicative functions for purposes of Czech expressive speech synthesis in limited domain, in: Proceedings of Interspeech, Portland, Oregon, USA, 2012, pp. 650-653. [53] M. Gruber, Acoustic analysis of Czech expressive recordings from a single speaker in terms of various communicative functions, in: Proceedings of the 11th IEEE International Symposium on Signal Processing and Information Technology, IEEE, 345 E 47TH ST, NEW YORK, NY 10017, USA, 2011, pp. 267-272. https://doi.org/10.110 9/isspit. 2011.6151576 [54] L. Latacz, W. Mattheyses, W. Verhelst, Joint target and join cost weight training for unit selection synthesis, in: Proceedings of Interspeech, ISCA, Florence, Italy, 2011, pp. 321-324. [55] X. L. F. Alias, Evolutionary weight tuning for unit selection based on diphone pairs, in: Proceedings of Eurospeech, Vol. 2, Geneve, Switzerland, 2003, pp. 1333-1336. [56] Z. 
Hanzlicek, Czech HMM-based speech synthesis, in: Text, Speech and Dialogue, proceedings of the 13th International Conference TSD, Vol. 6231 of Lecture Notes in Computer Science, Springer, Berlin-Heidelberg, Germany, 2010, pp. 291-298. https://doi.org/10.10 07/ 97 8-3-64 2-157 60-8_37 [57] J. Nouza, J. Psutka, J. Uhlir, Phonetic alphabet for speech recognition of czech, Radioengineering 6 (4) (1997) 16-20. [58] J. Romportl, J. Matousek, D. Tihelka, Advanced prosody modelling, in: Text, Speech and Dialogue, proceedings of the 7th International Conference TSD, Vol. 3206 of Lecture Notes in Artificial Intelligence, Springer, Berlin-Heidelberg, Germany, 2004, pp. 441-447. https://doi.org/10.10 07/ 97 8-3-54 0-3012 0-2_56 Dialogue Act-Based Expressive Speech Synthesis in. Informatica 44 (2020) 147-165 165 [59] J. Yamagishi, K. Onishi, T. Masuko, T. Kobayashi, Modeling of various speaking styles and emotions for HMM-based speech synthesis, in: Proceedings of Eurospeech, Geneva, Switzerland, 2003, pp. 24612464. [60] K. Miyanaga, T. Masuko, T. Kobayashi, A style control technique for HMM-based speech synthesis, in: Proceedings of Interspeech, 2004, pp. 1437-1440. [61] T. Nose, Y. Kato, T. Kobayashi, A speaker adaptation technique for MRHSMM-based style control of synthetic speech, in: Proceedings of ICASSP, 2007, pp. 833-836. https://doi.org/10.1109/icassp. 2007.367042 [62] M. Gruber, Z. Hanzlicek, Czech expressive speech synthesis in limited domain: Comparison of unit selection and HMM-based approaches, in: Text, Speech and Dialogue, Vol. 7499 of Lecture Notes in Computer Science, Springer, Berlin-Heidelberg, Germany, 2012, pp. 656-664. https://doi.org/10.10 07/ 97 8-3-64 2-32 7 90-2_8 0 [63] K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, A. W. Black, The HMM-based speech synthesis system (HTS), [online]. 166 Informática 44 (2020) 147-165 M. Grûber et al. https://doi.org/10.31449/inf.v44i2.3083 Informatica 44 (2020) 167-138 127 Knowledge Redundancy Approach to Reduce Size in Association Rules Julio César Díaz Vera University of Informatics Sciences, Havana, Cuba E-mail: jcdiaz@uci.cu Guillermo Manuel Negrín Ortiz University of Informatics Sciences, Havana, Cuba E-mail: gmnegrin@uci.cu Carlos Molina University of Jaen, Jaen, Spain E-mail: carlosmo@ujaen.es Maria Amparo Vila University of Granada, Granada, Spain E-mail:vila@decsai.ugr.es Keywords: association rule mining, redundant rules, knowledge guided post-processing Received: June 19, 2019 Association Rules Mining is one of the most studied and widely applied fields in Data Mining. However, the discovered models usually result in a very large set of rules; so the analysis capability, from the user point of view, is diminishing. Hence, it is difficult to use the found model in order to assist in the decisionmaking process. The previous handicap is hightened in the presence of redundant rules in the final set. In this work, a new definition of redundancy in association rules is proposed, based on user prior knowledge. A post-processing method is developed to eliminate this kind of redundancy, using association rules known by the user. Our proposal allows finding more compact models of association rules to ease its use in the decision-making process. The developed experiments have shown reduction levels that exceed 90 percent of all generated rules, using prior knowledge always below ten percent. So, our method improves the efficiency of association rules mining and the exploitation of discovered association rules. 
Povzetek: A system is described for reducing the number and length of rules by means of redundancy analysis for association rule learning methods.

1 Introduction

Mining for association rules has been one of the most studied fields in data mining. Its main goal is to find unknown relations among items in a database. Let I be the set of items containing all the items in the domain, and let D be a transactional database where every transaction is composed of a transaction id (tid) and a set of items that is a subset of I (an itemset). An association rule is an implication X ⇒ Y, where X is the antecedent and Y is the consequent of the rule. Both X and Y are itemsets and usually, but not necessarily, they satisfy the property X ∩ Y = ∅. Association rules reflect how much the presence of the rule antecedent influences the presence of the rule consequent in the database records. What generally makes a rule meaningful are two statistical factors: support and confidence. The support of a rule, supp(X ⇒ Y), is the portion of the database transactions that contain both X and Y, while the confidence, conf(X ⇒ Y), is a measure of certainty used to evaluate the validity of the rule: it is the portion of the records containing X that also contain Y. The association rule mining problem consists in finding all the rules that satisfy user-given thresholds for support and confidence. Most algorithms face this challenge in a two-step procedure (a small illustrative sketch is given below):

1. Find all the itemsets whose support is greater than or equal to the support threshold.

2. Generate all association rules X ⇒ (Y − X) such that Y is a frequent itemset, X ⊂ Y, and conf(X ⇒ Y) is greater than or equal to the confidence threshold.

The discovery of meaningful association rules can help in the decision-making process, but the typically large number of rules makes it difficult for decision-makers to process, interpret and apply them. A significant part of the rules presented to the user is irrelevant because they are obvious, too general, too specific, or not relevant for the decision topic. Several methods have been proposed in the literature to overcome this handicap, such as the development of interest measures, concise representations of frequent itemsets and redundancy reduction. Section 2 discusses some of the most important works in the field. This paper proposes a new approach to deal with redundancy, taking into account the user's previous knowledge about the studied domain. Previous knowledge is used to detect and prune redundant rules. We adapt the concept of redundancy and propose a procedure to carry out the redundancy reduction in the post-processing stage. The paper is organized as follows. Section 2 discusses related work. In Section 3 we propose an algorithm to find and prune redundant rules. In Section 4 the proposed algorithm is applied to three datasets: one with data about financial investment [1], one with data about the USA census [2] and one with data about mushrooms [2]. Section 5 closes the paper with conclusions.

2 Related work

Interestingness is difficult to define quantitatively [3], but most interestingness measures can be classified into objective measures and subjective measures. Objective measures are domain-independent; they are expressed in terms of statistics or information theory applied over the database. Several surveys [4, 5, 6] summarize and compare objective measures.
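Returning to the definitions from the Introduction, the following sketch illustrates support, confidence and the two-step rule generation on a tiny made-up database; it is an illustration only and not the authors' implementation.

```python
# Illustrative sketch: support, confidence and the two-step mining procedure
# from the Introduction, on a small hypothetical transactional database.
from itertools import combinations

D = [  # hypothetical transactions (itemsets)
    {"income=high", "loan=yes"}, {"income=high", "loan=yes", "sex=male"},
    {"income=low", "loan=no"}, {"income=low", "loan=yes", "unemployed=yes"},
]

def support(itemset):
    return sum(itemset <= t for t in D) / len(D)

def frequent_itemsets(min_supp):
    items = sorted(set().union(*D))
    return [set(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if support(set(c)) >= min_supp]

def rules(min_supp, min_conf):
    out = []
    for Y in frequent_itemsets(min_supp):           # step 1: frequent itemsets
        for k in range(1, len(Y)):                  # step 2: X => (Y - X), X a proper subset of Y
            for X in map(set, combinations(sorted(Y), k)):
                conf = support(Y) / support(X)
                if conf >= min_conf:
                    out.append((X, Y - X, support(Y), conf))
    return out

for X, Y, s, c in rules(min_supp=0.5, min_conf=0.75):
    print(sorted(X), "=>", sorted(Y), f"supp={s:.2f} conf={c:.2f}")
```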
The explosion of objective measures has raised a new problem: What are the best metrics to use in a specific situation and a particular application field? Several papers attempt to solve it [8, 9] but it is far from being solved. The correlation between 11 objective rule interestingness measures and real human interest over eight different datasets were computed in [10] and there was not a clear "winner", the correlation values associated with each measure varied considerably across the eight datasets. Subjective measures were proposed in order to involve explicitly user knowledge in the selection of interesting rules so that the user can make a better selection. According to [11] subjective measures are classified in: - Unexpectedness: a pattern is interesting if it is surprising to the user. - Actionability: a pattern is interesting if it can help the user to take some actions. Actionability started as an abstract notion, with an unclear definition, but nowadays, several researchers are interested in it. The actionability problem is discussed in [12]. Unexpectedness or novelty [13] was proposed in order to solve the pattern triviality problem, assessing the surprise level of the discovered rules. Several techniques have been used to accomplish this aim: - Templates: Templates are syntactic constraints that allow the user to define a group of rules that are interesting or not to him/her [14, 15]. A template is defined as A1...An ^ An+1 where A is a class name in a hierarchy or an expression E over a class name. Templates may be inclusive or restrictive. A rule is considered interesting if it matches an inclusive template and uninteresting if it matches a restrictive template. The use of templates is quite restrictive because the matching method requires each rule element to be an instance of the elements in templates, and all template elements must have at least one instance in the rule. Moreover, the template definition makes hard to use it for declaring restrictive templates because it should be composed of elements subsuming all attributes of the rule, being in a subsuming relation with the inclusive template elements. The best known form of templates is meta-rules [16, 40] a meta-rule is the relationship between two association rules. The main drawback of this approach is that meta-rules are restricted to having a single rule in their antecedent and consequent, because of this some important information may be lost. - Belief: Silbershatz and Tuzilin [11] defined user knowledge as a set of convictions, denominated belief. They are used in order to measure the unexpectedness of a pattern. Each belief is defined as a predicate formula expressed in first-order logic with a degree of confidence associated, measuring how much the user trusts in the belief. Two types of belief were defined: - Soft belief is that knowledge user accepts to change if new evidence contradicts the previous one. The interestingness of the new pattern is computed by how the new pattern changes the degree of beliefs. - Hard belief is that knowledge user will not change whatever new patterns are extracted. They are constraints that cannot be changed with new evidence. This approach is still in a development stage, no further advances were published, so it is not functional. - General Impressions: were presented in [17] and later developed in [18] and [19]. They developed a specification language to express expectations and goals. 
Three levels of specification were established: General Impressions, Reasonably Precise Concept and Precise Knowledge. Item taxonomies concept was integrated in the specification languages in order to generalize rule selection. The matching process involved a syntactic comparison between antecedent/consequent elements. Thus, each element in the general impression should find a correspondent in the association rule. - Logical Contradiction: was developed in [20]. It consists in extracting only those patterns which logically contradict the consequent of the corresponding belief. Knowledge Redundancy Approach to Reduce Size in. Informatica 44 (2020) 167-181 169 An association rule X ^ Y is unexpected with respect to some belief A ^ B if: - Y A B = FALSE B and Y are in logical contradiction; - X A B has an important support in the database. This condition eliminates those rules which could be considered unexpected, but not those concerning the same transaction in the database; - A, X ^ B exists. Income Balance Sex Unemployed Loan High High F No Yes High High M No Yes Low Low M No No Low High F Yes Yes Low High M Yes Yes Low Low F Yes No High Low M No Yes High Low F Yes Yes Low Medium M Yes No High Medium M No Yes Low Medium F Yes No Low Medium M No No Table 1: Sample transactions - Preference Model: was proposed in [21]. It is a specific type of user knowledge representing how the basic knowledge of the user, called knowledge rules (K), will be applied over a given scenario or tuples of the database. The user proposes a covering knowledge (Ct) for each tuple (t) - a subset of the knowledge rule set K that the user prefers to apply to the tuple t. The approach validates the transactions which satisfy the extracted rule. All the previously presented works use some kind of knowledge to reduce the number of useless association rules in the final set. In this way, our approach is similar to them but there are some remarkable differences. Like in templates our approach uses the syntactical notation of association rules to represent knowledge. Templates use this knowledge to constraint the structure of selected rules, pruning out those rules which do not satisfy the template but produce a lot of association rules with similar information. On the other hand, we use the knowledge to remove those rules with similar information, presenting to the user a set of unexpected rules that can help him to better understand the underlying domain. The approach followed by Belief tries to find just unknown rules, this is our main goal too but, they use a complex and fixed formal knowledge representation based on first order logic and degrees of belief with no clear way of building and maintaining the belief system. Instead, we use a simpler and natural rule-based form of knowledge, focused on the enhanced capability to increase interactively the knowledge system. 2.1 Rule redundancy reduction Research community accepts the semantical definition of association rule redundancy given in [22] "an association rule is redundant if it conveys the same information - or less general information - than the information conveyed by another rule of the same usefulness and the same relevance". But several formal definitions have been proposed over time. In table 1, a sample transactional database is presented. Defining a support threshold of 0.15 and a confidence threshold of 0.75, an association rule model with 92 rules is obtained. It is used to show redundancy definitions. Definition 1. 
Minimal non-redundant association rules[22]: An association rule R : X ^ Y is a minimal non-redundant association rule if there is not an association rule Ri : Xi ^ Yi with: - support(R) = support(Ri) - confidence(R) = confidence(Ri) - Xi C X and Y C Yi From data on table 1 we can obtain the rules: R : {[balance].[medium]} ^ {[income].[low], [loan].[no]} supp = 0.25, conf = 0.75 and Ri : {[balance].[medium]} ^ {[loan].[no]} supp = 0.25, conf = 0.75. According to definition 1 R is a redundant rule. No new information is provided by its inclusion into the association rules model. Several works have been developed to prune that kind of redundancy. Mining Closed Associations, uses frequent closed itemsets [23] tries to produce the set of minimal generators for each itemset. The number of closed association rules is linear to the number of closed frequent itemsets. It can be large for sparse and large datasets. The Generic Basis (GB) and the Informative Basis (IB) [22] used the Galois connections to propose two condensed basics that represent non-redundant rules. The Gen-GB and Gen-RI algorithms were presented to obtain a generic basis and a transitive reduction of the IB. The reduction ratio of IB was improved by [24] maximal closed itemsets. The Informative Generic Basis [25] also uses the Galois connection semantics but taking the support of all frequent itemsets as an entry, so it can calculate the support and confidence of derived rules. The augmented Iceberg Galois lattice was used to construct the Minimal Generic Basis (MGB) [26]. The concept of generator was incorporated into high utility itemsets mining in [27]. The redundancy definition presented in definition 1 requires that a redundant rule and its corresponding non-redundant rule must have identical confidence and identical support. From data on table 1 we can obtain the 170 Informatica 44 (2020) 167-181 J.C.D. Vera et al. rules: R : {[income\.[higk], [unemployed].[no]} ^ {[loan].[yes]} supp = 0.33, conf = 1.0, and Ri : {[income].[high]} ^ {\loan].[yes]} supp = 0.41, conf = 1.0 those rules are non-redundant ones, but the consequent of R can be obtained from R1 a rule with the same confidence and fewer conditions. So without R the same results are achieved, rule R must be a redundant rule. Xu [28] formalizes this kind of redundancy in definition 2. Definition 2. Redundant rules[28]: Let X ^ Y and X1 ^ Y1 be two association rules with confidence cf and cfi, respectively. X ^ Y is said to be a redundant rule to Xi ^ Yi if - Xi C X and Y C Yi - cf < cfi Based on definition 2 the Reliable basis was proposed. It consists of two bases the ReliableApprox used in partial rules, and ReliableExact used in exact rules. Frequent closed itemsets are used to perform the reliable redundancy reduction process. It generates rules with minimal antecedent and maximal consequent. The reliable basis removes a great amount of redundancy without reducing the inference capacity of the remaining rules. Phan [29] uses a more radical approach to define redundancy see definition 3. Definition 3. Representative association rules[29]: Let X ^ Y an association rule. X ^ Y is said to be a representative association rule if there is not other interesting rule Xi ^ Yi such that Xi C X and Y C Yi. The redundancy definitions presented above do not guarantee the exclusion of all non-interesting patterns of the final model. Example 1 shows a group of rules with no new information to the user, and they are not classified as redundant by the previous definitions. 
Example 1. A set of redundant rules from data in table 1 Let's see a subset of association rules obtained from table 1: Ri : {[income].[high]} ^ {\loan].[yes]} R2 : {[sex].[female], [unemployed].[no]} ^ {[income] .[high]} R3 : {[sex].[female], [unemployed].[no]} ^ {[income].[high], \loan].[yes]} R4 : {[sex].[female], [unemployed].[no]} ^ {[loan].[yes]} R5 : {[income].[high], [loan].[yes]} ^ {[unemployed]. [no]} Rq : {[income].[high], [loan].[yes], [sex].[male]} ^ {[unemployed]. [no]} R7 : {[balance].[high], [income].[high], [loan].[yes]} ^ {[unemployed]. [no]} If we analyze the rules Ri and R3 we see that item [loan].[yes] in R3 consequent provides no new information, because this is known by Ri. So rule R3 is redundant but this kind of redundancy is not detected by the previous definitions. Analyzing rules Ri, R2 and R4 we can check that combining, transitively, of Ri and R2 it will produce R4 so, R4 is redundant. One more time this kind of redundancy is not detected by previous definitions. In R5,R6 and R7 antecedent the item [loan]. [yes] provides no new information because this is known by Ri. It is redundant and must be pruned, but it can not be detected by redundancy definitions. 2.2 Post-processing Since the year 2000, the interest in post-processing methods in association rules has been increasing. Perhaps the most accurate definition of post-processing tasks were done by Baesens et al. [30] Post-processing consists of different techniques that can be used independently or together: pruning, summarizing, grouping and visualization. We have a special interest in pruning techniques that prune those rules that do not match to the user knowledge. Those techniques are associated with interestingness measures that may not satisfy the downward closure property, so it is impossible to integrate them in Apriori like extraction algorithms. An element to consider is the nature of Knowledge Discovery in Databases (KDD) as an interactive and iterative user-centered process. Enforcing constraints during the mining runs neglects the character of KDD [31], [32]. A single and possibly expensive mining run is accepted but all subsequent mining questions are supposed to be satisfied with the initial result set. In this work, a method is developed to obtain nonredundant association rules about user knowledge. It is important to ensure the user capability to refine his/her knowledge in an interactive and iterative way, accepting any of the discovered associations or discarding some previous associations and updating prior knowledge. This approach also makes possible to fulfill the mining question of different users, with different domain knowledge, in a single mining run. 3 A knowledge guided approach 3.1 Knowledge based redundancy In example 1, a group of redundant rules, which are currently not covered by the definitions of redundancy, are showed. Our interest is to eliminate these forms of redundancy in association rule models. Based on a core set of rules that represent the user belief; a result of his experience working in the subject area. This knowledge is more general than rules obtained in the mining process which Knowledge Redundancy Approach to Reduce Size in. Informatica 44 (2020) 167-181 171 only represent a particular dataset with partial information so the quality metric value for this kind of rule is considered maximal. This set of rules will be named prior knowledge. A rule that does not contradict prior knowledge of the user will be considered redundant. 
We formalize the notion of prior knowledge redundancy in definition 4. Users can represent previous knowledge in different ways, such as semantic networks or ontologies, among others. Considering that the expert is interested in association rule discovery, prior knowledge is incorporated into the model using the association rule format. For example, an expert working with the dataset presented in table 1 knows that customers with a high income ([income].[high]) pay their loans on time and therefore these must be approved. This knowledge can be represented as the association rule {[income].[high]} → {[loan].[yes]}.

Definition 4. Knowledge Based Redundancy: Let S be a set of association rules and Sc a set of prior known rules, defined over the same domain as S. An association rule R : X → Y ∈ S is redundant with respect to Sc if there is a rule R' : X' → Y' ∈ Sc that fulfills some of the following conditions.

1. X' ⊆ X ∧ Y' ∩ Y ≠ ∅. A rule is redundant if there is another rule in Sc that contains more general information.
2. X' ⊆ X ∧ ∃R'' : X'' → Y'' ∈ Sc : X'' ⊆ Y' ∧ Y ⊆ Y''. A rule R is redundant if there is a rule R' in Sc that contains part of or the whole antecedent, and there is a third rule R'' in Sc whose antecedent is contained in the consequent of R' and whose consequent contains the consequent of R.
3. X' ⊆ X ∧ Y' ∩ X ≠ ∅. A rule is redundant if its antecedent contains part of or the whole information of a previously known rule.
4. X' ⊆ Y ∧ Y' ∩ Y ≠ ∅. A rule is redundant if its consequent contains part of or the whole information of a previously known rule.

Reviewing the rules in example 1 with definition 4 we have: Sc = {{[income].[high]} → {[loan].[yes]}, {[sex].[female], [unemployed].[no]} → {[income].[high]}}.

Rule R3 : {[sex].[female], [unemployed].[no]} → {[income].[high], [loan].[yes]} fulfills condition 1 in definition 4 because:
1. [sex].[female], [unemployed].[no] ⊆ [sex].[female], [unemployed].[no]
2. [income].[high] ⊆ [income].[high], [loan].[yes]

Rule R3 : {[sex].[female], [unemployed].[no]} → {[income].[high], [loan].[yes]} also fulfills condition 4 in definition 4 because:
1. [income].[high] ⊆ [income].[high], [loan].[yes]
2. [loan].[yes] ⊆ [income].[high], [loan].[yes]

Rule R4 : {[sex].[female], [unemployed].[no]} → {[loan].[yes]} fulfills condition 2 in definition 4 because:
1. [sex].[female], [unemployed].[no] ⊆ [sex].[female], [unemployed].[no]
2. [income].[high] ⊆ [income].[high]
3. [loan].[yes] ⊆ [loan].[yes]

Rule R5 : {[income].[high], [loan].[yes]} → {[unemployed].[no]} fulfills condition 3 in definition 4 because:
1. [income].[high] ⊆ [income].[high], [loan].[yes]
2. [loan].[yes] ⊆ [income].[high], [loan].[yes]

Rule R6 : {[income].[high], [loan].[yes], [sex].[male]} → {[unemployed].[no]} fulfills condition 3 in definition 4 because:
1. [income].[high] ⊆ [income].[high], [loan].[yes], [sex].[male]
2. [loan].[yes] ⊆ [income].[high], [loan].[yes], [sex].[male]

Rule R7 : {[balance].[high], [income].[high], [loan].[yes]} → {[unemployed].[no]} fulfills condition 3 in definition 4 because:
1. [income].[high] ⊆ [balance].[high], [income].[high], [loan].[yes]
2. [loan].[yes] ⊆ [balance].[high], [income].[high], [loan].[yes]
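As an illustration only (a sketch in the spirit of definition 4, not the authors' code; the item encoding and function name are assumptions), the four conditions translate directly into set operations:

```python
# Hypothetical sketch of Definition 4: a mined rule X -> Y is knowledge-redundant
# with respect to a prior-knowledge set Sc of (antecedent, consequent) pairs.

def kb_redundant(X, Y, Sc):
    for Xp, Yp in Sc:
        if Xp <= X and Yp & Y:                       # condition 1
            return True
        if Xp <= X and any(Xpp <= Yp and Y <= Ypp    # condition 2
                           for Xpp, Ypp in Sc):
            return True
        if Xp <= X and Yp & X:                       # condition 3
            return True
        if Xp <= Y and Yp & Y:                       # condition 4
            return True
    return False

Sc = [({"income.high"}, {"loan.yes"}),
      ({"sex.female", "unemployed.no"}, {"income.high"})]
R3 = ({"sex.female", "unemployed.no"}, {"income.high", "loan.yes"})
R5 = ({"income.high", "loan.yes"}, {"unemployed.no"})
print(kb_redundant(*R3, Sc), kb_redundant(*R5, Sc))  # True True
```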
Armstrong's axioms [33] are a set of inference rules. They allow one to obtain the minimum set of functional dependencies that hold in a database; the rest of the functional dependencies can be derived from this set. They are part of well-established mechanisms designed to find smaller subsets of a larger set of functional dependencies, called "covers", which are equivalent to the "bases" in Closure Spaces and Data Mining. Armstrong's axioms cannot be used as an inference mechanism for association rules [34], because it is impossible to obtain the support and confidence values of the derived rules:

- Reflexivity (if B ⊆ A then A → B) holds, because conf(A → B) = supp(A ∪ B) / supp(A) = supp(A) / supp(A) = 1.
- Transitivity: if A → B and B → C both hold with confidence above the threshold, we cannot know the value of conf(A → C), so transitivity does not hold.
- Augmentation (if A → B then AC → B) does not hold. Enlarging the antecedent of a rule may give a rule with much smaller confidence, even zero: think of a case where most of the times X appears it comes with Z, but it only comes with Y when Z is not present; then the confidence of X → Z may be high whereas the confidence of XY → Z may be null.
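The failure of augmentation can be checked on a toy example. The following sketch (the four hand-made transactions are an assumption, purely for illustration) reproduces the situation described above, where conf(X → Z) is high but conf(XY → Z) drops to zero:

```python
# Toy illustration: augmentation does not preserve confidence.
transactions = [
    {"X", "Z"}, {"X", "Z"}, {"X", "Z"},  # X usually appears together with Z
    {"X", "Y"},                          # X appears with Y only when Z is absent
]

def conf(antecedent, consequent, db):
    covered = [t for t in db if antecedent <= t]
    return sum(consequent <= t for t in covered) / len(covered)

print(conf({"X"}, {"Z"}, transactions))       # 0.75
print(conf({"X", "Y"}, {"Z"}, transactions))  # 0.0
```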
Our intention is to use Armstrong's axioms in order to assess whether a rule has prior knowledge redundancy over a set of rules Sc from previous knowledge, so they must verify the conditions presented in definition 4.

Condition X' ⊆ X ∧ Y' ∩ Y ≠ ∅ represents the classical definition of redundancy, as in definition 1, definition 2 and definition 3. This condition is fulfilled if a single attribute in Y is redundant. Let R1 : X → Y and R2 : X' → Y' be association rules. Suppose Y ∩ Y' = Y1. Then, by the reflexivity axiom on the consequent of R2, R3 : Y' → Y1, and by reflexivity on the consequent of R1, R4 : Y → Y1. By transitivity between R1 and R4 we have R5 : X → Y1; applying transitivity between R2 and R3 we have R6 : X' → Y1. X' ⊆ X by the statement condition, so applying augmentation in R6 until the antecedent equals X gives R7 : X → Y1. Therefore Armstrong's axioms check the condition. For example, the rule R : {[income].[high], [sex].[male]} → {[loan].[yes], [unemployed].[no]} is part of the association model generated from the dataset in table 1. This rule can be classified as redundant by condition 1 of definition 4 with respect to the prior knowledge Sc = {Rs1 : [income].[high] → [loan].[yes], Rs2 : [sex].[female], [unemployed].[no] → [income].[high]}. By the application of reflexivity we have R1 : [loan].[yes] → [loan].[yes]; by augmentation of [unemployed].[no] on R1 we have R2 : [loan].[yes], [unemployed].[no] → [loan].[yes]; and by transitivity between R and R2 we have R3 : [income].[high], [sex].[male] → [loan].[yes] (the same procedure must be followed for [unemployed].[no]). Now, by augmentation of [sex].[male] in the rule [income].[high] → [loan].[yes] ∈ Sc, we have R4 : [income].[high], [sex].[male] → [loan].[yes]. R4 = R3, so the item [loan].[yes] is redundant in R and therefore R is also redundant.

Condition X' ⊆ X ∧ ∃R'' : X'' → Y'' ∈ Sc : X'' ⊆ Y' ∧ Y ⊆ Y'' represents the notion of transitivity, a common term in human thinking. This condition is fulfilled if a single attribute in Y is redundant. Let R1 : X → Y, R2 : X' → Y' and R3 : X'' → Y'' be rules. Suppose Y ∩ Y'' = Y1. Then, by the reflexivity axiom on the consequent of R1, R4 : Y → Y1, and by transitivity between R1 and R4 we have R5 : X → Y1. By the statement condition X'' ⊆ Y', so by reflexivity on the consequent of R2 we have R6 : Y' → X''. By transitivity between R2 and R6 we have R7 : X' → X''; now by transitivity between R7 and R3 we have R8 : X' → Y''. Applying augmentation in R8 until the antecedent equals X we have R9 : X → Y''. By reflexivity on the consequent of R9, R10 : Y'' → Y1, and by transitivity between R9 and R10 we have R11 : X → Y1. Therefore Armstrong's axioms check the condition. For example, consider the rule R : {[sex].[female], [unemployed].[no]} → {[loan].[yes]} and the prior knowledge Sc = {Rs1 : [income].[high] → [loan].[yes], Rs2 : [sex].[female], [unemployed].[no] → [income].[high]}. R is classified as redundant according to condition 2 in definition 4. R is a single-consequent rule, so no separation is needed. By the application of transitivity between [income].[high] → [loan].[yes] and [sex].[female], [unemployed].[no] → [income].[high], both in Sc, the rule R1 : [sex].[female], [unemployed].[no] → [loan].[yes] is obtained. R = R1, so R is a redundant rule.

Condition X' ⊆ X ∧ Y' ∩ X ≠ ∅ represents the case when some item in the antecedent of a rule is redundant. Let R1 : X → Y and R2 : X' → Y' be rules. Suppose Y' ∩ X = X1. Then, by augmentation of X − X1 on R2, we have R3 : X − X1 → (X − X1) ∪ Y' (since X' ⊆ X − X1), whose consequent contains X; by transitivity between R3 and R1 we obtain R4 : X − X1 → Y. Therefore Armstrong's axioms fulfill the condition. For example, with R : {[income].[high], [loan].[yes]} → {[unemployed].[no]} and Sc = {Rs1 : [income].[high] → [loan].[yes], Rs2 : [sex].[female], [unemployed].[no] → [income].[high]}, R is classified as redundant by condition 3 in definition 4. Applying reflexivity of [income].[high] in [income].[high] → [loan].[yes], the rule R1 : [income].[high] → [income].[high], [loan].[yes] is obtained; by transitivity between R1 and R we have R2 : [income].[high] → [unemployed].[no]. R2 is simpler than R and carries the same information, so R is a redundant rule. Moreover, by augmentation of [loan].[yes] in R2 we have R3 : [income].[high], [loan].[yes] → [unemployed].[no], and R = R3.

Condition X' ⊆ Y ∧ Y' ∩ Y ≠ ∅ represents the case when some item in the consequent of R is redundant with respect to other items in the consequent. This condition is fulfilled if a single attribute in Y is redundant. Let R1 : X → Y and R2 : X' → Y' be rules. Suppose Y ∩ Y' = Y1. Then, by the reflexivity axiom on the consequent of R2, R3 : Y' → Y1; by transitivity between R2 and R3 we have R4 : X' → Y1. By the statement condition we have X' ⊆ Y, so by transitivity between R1 and R4 we have R5 : X → Y1. Therefore Armstrong's axioms fulfill the condition. For example, consider R : {[balance].[high], [unemployed].[no]} → {[income].[high], [loan].[yes]} and Sc = {Rs1 : [income].[high] → [loan].[yes], Rs2 : [sex].[female], [unemployed].[no] → [income].[high]}. R is redundant according to condition 4 in definition 4. Applying reflexivity, augmentation and transitivity we obtain R1 : [balance].[high], [unemployed].[no] → [income].[high] and R2 : [balance].[high], [unemployed].[no] → [loan].[yes]; now, by transitivity between R1 and [income].[high] → [loan].[yes] ∈ Sc, we have R3 : [balance].[high], [unemployed].[no] → [loan].[yes]. R2 = R3, so R is a redundant rule.

We do not use Armstrong's axioms as an inference mechanism, so we do not need to worry that they cannot guarantee the support and confidence thresholds of the inferred rules.

3.2 Algorithm to eliminate prior knowledge redundancy in association rules

In this section we present an algorithm to determine whether a rule contains redundant items, see Algorithm 1. The closure algorithm presented in [35] is used to compute X⁺.
Require: a set of previous knowledge rules Sc, and a rule R in the form X → Y
Ensure: a Boolean value indicating whether the rule is redundant
  i = 0
  n = |Y|
  while i < n do
    if Y[i] ∈ X⁺ computed over Sc ∪ {X → (Y − {Y[i]})} then
      return true
    end if
    i = i + 1
  end while
  i = 0
  n = |X|
  while i < n do
    if X[i] ∈ (X − {X[i]})⁺ computed over Sc ∪ {(X − {X[i]}) → Y} then
      return true
    end if
    i = i + 1
  end while
  return false

Algorithm 1: Prior Knowledge Redundancy detection

To determine the redundancy of a rule X → Y we have to test whether any item W in the consequent or any item A in the antecedent is redundant. The item A is redundant if the consequent can be derived from the prior knowledge without A. This test is performed for every item A ∈ X by calculating the closure of the new antecedent X − {A} over the previous knowledge rules joined with the studied rule, and comparing the result with the closure of the same antecedent over the previous rules joined with a new rule in which the item A is not part of the antecedent; if both closures are equal, the item A is redundant and the entire rule is also redundant. An analogous procedure tests whether an item W of the consequent is redundant. The first loop of Algorithm 1 covers the consequent and the second loop the antecedent.

Example 2. Prior Knowledge Redundancy detection: We use the prior knowledge Sc = {Rs1 : [income].[high] → [loan].[yes], Rs2 : [sex].[female], [unemployed].[no] → [income].[high]} and the rules R1 : {[balance].[high], [unemployed].[no]} → {[income].[high], [loan].[yes]} and R2 : {[income].[high], [loan].[yes]} → {[unemployed].[no]} to show the performance of Algorithm 1.

For R1, the first step is to compute F = Sc ∪ {R1}: F = {Rf1 : [income].[high] → [loan].[yes], Rf2 : [sex].[female], [unemployed].[no] → [income].[high], Rf3 : [balance].[high], [unemployed].[no] → [income].[high], [loan].[yes]}. Second, the redundancy in the antecedent is checked by computing the closure of [balance].[high] over F, which is {[balance].[high]}, and comparing it with the closure of [balance].[high] over G, where G = (F − {R1}) ∪ {[balance].[high] → [income].[high], [loan].[yes]}, which is {[balance].[high], [income].[high], [loan].[yes]}. They are different, so [unemployed].[no] is not redundant; the item [balance].[high] is also non-redundant. Last, the redundancy in the consequent is checked. With F' = (F − {R1}) ∪ {[balance].[high], [unemployed].[no] → [income].[high]}, the closure of {[balance].[high], [unemployed].[no]} over F is {[balance].[high], [unemployed].[no], [income].[high], [loan].[yes]} and over F' it is {[balance].[high], [unemployed].[no], [income].[high], [loan].[yes]}. They are the same, so the item [loan].[yes] and the rule R1 are redundant.

For R2 we have:
- F = Sc ∪ {R2} = {Rf1 : [income].[high] → [loan].[yes], Rf2 : [sex].[female], [unemployed].[no] → [income].[high], Rf3 : [income].[high], [loan].[yes] → [unemployed].[no]} and, for the antecedent item [loan].[yes], F' = Sc ∪ {[income].[high] → [unemployed].[no]}.
- The closure of [income].[high] over F is {[income].[high], [loan].[yes], [unemployed].[no]} and over F' it is {[income].[high], [loan].[yes], [unemployed].[no]}. They are the same, so the rule is redundant.
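For readers who prefer running code, the following is a minimal sketch of the membership form of Algorithm 1 (an illustration only, assuming rules are stored as pairs of item sets; it is not the authors' implementation). The closure is obtained with the usual fixpoint iteration of [35]:

```python
# Closure of an itemset under a set of implication rules (fixpoint iteration).
def closure(items, rules):
    closed, changed = set(items), True
    while changed:
        changed = False
        for ant, cons in rules:
            if ant <= closed and not cons <= closed:
                closed |= cons
                changed = True
    return closed

# Algorithm 1, membership form: X -> Y is redundant if some consequent item is
# derivable from X without being stated, or some antecedent item is derivable
# from the remaining antecedent items.
def prior_knowledge_redundant(X, Y, Sc):
    for w in Y:
        if w in closure(X, Sc + [(X, Y - {w})]):
            return True
    for a in X:
        rest = X - {a}
        if a in closure(rest, Sc + [(rest, Y)]):
            return True
    return False

Sc = [({"income.high"}, {"loan.yes"}),
      ({"sex.female", "unemployed.no"}, {"income.high"})]
R1 = ({"balance.high", "unemployed.no"}, {"income.high", "loan.yes"})
R2 = ({"income.high", "loan.yes"}, {"unemployed.no"})
print(prior_knowledge_redundant(*R1, Sc))  # True: loan.yes in the consequent is derivable
print(prior_knowledge_redundant(*R2, Sc))  # True: loan.yes in the antecedent is derivable
```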
3.2.1 Correctness

We first prove that the closure algorithm [35] can be used to detect redundancy according to definition 4. The closure algorithm applies Armstrong's axioms to find all items implied by a given itemset. In the following, X⁺ denotes the closure of X over Sc unless another rule set is stated.

Theorem 1. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ X ∧ Y' ∩ Y ≠ ∅, then Y' ∩ Y ∈ X⁺.

Proof. Assume X' ⊆ X ∧ Y' ∩ Y ≠ ∅. Then X' ∈ X⁺ by the assumption X' ⊆ X and the reflexivity axiom. Y' ∈ X⁺ by transitivity between X → X' and X' → Y'. Therefore Y' ∩ Y ∈ X⁺ by the definition of set intersection. □

Theorem 2. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ X ∧ ∃R'' : X'' → Y'' ∈ Sc : X'' ⊆ Y' ∧ Y ⊆ Y'', then Y ∈ X⁺.

Proof. Assume X' ⊆ X ∧ ∃R'' : X'' → Y'' ∈ Sc : X'' ⊆ Y' ∧ Y ⊆ Y''. Then X' ∈ X⁺ by the assumption X' ⊆ X and the reflexivity axiom. Y' ∈ X⁺ by transitivity between X → X' and X' → Y'. X'' ∈ X⁺ by the assumption X'' ⊆ Y' and the subset definition. So Y'' ∈ X⁺ by transitivity between X → X'' and X'' → Y''. Therefore Y ∈ X⁺ by the assumption Y ⊆ Y'' and the subset definition. □

Theorem 3. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ X ∧ Y' ∩ X ≠ ∅, then Y' ∩ X ∈ (X − (Y' ∩ X))⁺.

Proof. Assume X' ⊆ X ∧ Y' ∩ X ≠ ∅. Then X' ∈ (X − (Y' ∩ X))⁺ by the assumption X' ⊆ X and the reflexivity axiom. Y' ∈ (X − (Y' ∩ X))⁺ by transitivity between (X − (Y' ∩ X)) → X' and X' → Y'. Therefore Y' ∩ X ∈ (X − (Y' ∩ X))⁺ by the definition of set intersection. □

Theorem 4. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ Y ∧ Y' ∩ Y ≠ ∅, then Y' ∩ Y ∈ X⁺ computed over Sc ∪ {X → (Y − (Y' ∩ Y))}.

Proof. Assume X' ⊆ Y ∧ Y' ∩ Y ≠ ∅. Then X' ∈ X⁺ computed over Sc ∪ {X → (Y − (Y' ∩ Y))} by the assumption X' ⊆ Y and the added association rule X → (Y − (Y' ∩ Y)). Y' belongs to the same closure by transitivity between X → X' and X' → Y'. Therefore Y' ∩ Y belongs to it by the definition of set intersection. □

The correctness of Algorithm 1 itself is argued with Hoare triples {P}C{Q} [38] and loop invariants, which connect the code to high-level properties based on a domain theory [39] covering specifics of the application area. We say {P}C{Q} is true if, whenever C is executed in a state satisfying P and the execution of C finishes, then the state in which C finishes satisfies Q. If there is a loop in C, loop invariants must be used to prove correctness: if the loop invariants are proved to hold after each loop iteration, then the postcondition can be proven true. In Algorithm 1, lines one through eight and lines nine through sixteen perform essentially the same operation, one over the rule consequent and the other over the rule antecedent, so we analyze only one of them. Line four checks whether Y[i] belongs to the closure, so the closure algorithm must be computed; this algorithm has been proved correct [35]. The search for Y[i] within the closure can be done by a well-known linear search algorithm, which we assume to be correct.

Preconditions:
- Sc is a set of previous knowledge rules.
- X → Y is an association rule with X = X1, ..., Xn and Y = Y1, ..., Ym.

Postcondition: if (∃Ai ∈ X : Ai ∈ (X − Ai)⁺ computed over Sc ∪ {(X − Ai) → Y}) ∨ (∃Wi ∈ Y : Wi ∈ X⁺ computed over Sc ∪ {X → (Y − Wi)}), the return value is true.

Loop invariants: if the loop is executed j or more times, then after j executions
- i = j,
- 0 ≤ i ≤ n,
- Y[h] ∉ X⁺ computed over Sc ∪ {X → (Y − {Y[h]})} for 0 ≤ h < j.

The loop ends when j ≥ n or when Y[j] ∈ X⁺ computed over Sc ∪ {X → (Y − {Y[j]})}:
- Case 1 (j ≥ n): the loop invariant implies that Y[h] ∉ X⁺ computed over Sc ∪ {X → (Y − {Y[h]})} for 0 ≤ h < n, so no element of the consequent is redundant.
- Case 2 (j < n): the loop invariant implies that Y[j] ∈ X⁺ computed over Sc ∪ {X → (Y − {Y[j]})} and true is returned.

Conclusions: the postcondition is satisfied in either case, so the algorithm is correct.

3.2.2 Complexity analysis

The time complexity of an algorithm is a function T(n) limiting the maximum number of steps of the algorithm for an input of size n. T(n) depends on what is counted as one computation step; the random access machine (RAM) model is the most widely used one. RAM is a model of a simple digital computer with random access memory. For the sake of simplicity, T(n) is approximated by a simpler function: we write T(n) = O(f(n)) if there are constants c > 0 and n1 > 0 such that T(n) ≤ c·f(n) for all n ≥ n1. For Algorithm 1 we consider a as the number of different attribute symbols in Sc and p as the number of previous knowledge rules in Sc. The cost of computing the closure is linear, see [35]. The execution time of the first while loop (over the consequent of the rule) is proportional to a·p, since the number of rules in F is p and the closure is computed with a cost of O(p); the second while loop (over the antecedent of the rule) takes the same value a·p because it performs the same operation, so the complexity of each part is O(ap). To compute the complexity of the entire algorithm, the complexities of the two while loops must be added, giving O(ap) + O(ap) = 2·O(ap); the constant 2 can be ignored, so the final complexity of the algorithm is O(ap).

Association rule extraction algorithms have much higher complexity [36] than the reduction approach presented here. This difference led us to propose a reduction mechanism in which the rule extraction algorithm is executed once and then, in the post-processing stage, the reduction algorithm is fired to prune the redundant rules, rather than applying prior knowledge as a restriction within the extraction algorithms, which would force them to be executed for each different user and even for each change in a user's prior knowledge. The computational cost of the constraint approach is very high, whereas our approach, in the post-processing stage, only requires running a simpler routine when the user changes or the user's prior knowledge is updated. The temporal cost of this approach did not exceed 5 seconds in any of the applied tests.

4 Experimental results

4.1 Methodology

In order to verify the effectiveness of our approach we performed experiments with four datasets: the first one with data about the USA census [2], the second one with data about stock market investments [1], the third one with data about hypothetical samples of mushrooms [2] and the last one with data about breast cancer [2]. Prior knowledge consists of 6 rules for each dataset. We use the Pruning Ratio metric, PR = (PrunedRules / TotalRules) × 100, to evaluate our results. Table 2 shows the result of the experiments. Each row corresponds to an experiment following the next steps:
1. Find the complete set of rules using as support threshold the value in column 2 and as confidence threshold the value in column 3. The number of rules is shown in column 4.
2. Apply the steps presented in Algorithm 1. The number of pruned rules is presented in column 5 of Table 2.
3. After applying the algorithm to the dataset, the final number of rules is presented in column 6 of Table 2, while column 7 contains the pruning ratio. The execution time is presented in column 8.

4.2 Results and discussion

The Pruning Ratio changes according to the support in the Census and Stocks datasets, first increasing while the support increases, but when the support is greater than 0.07 for the Census dataset and greater than 0.5 for the Stocks dataset, the Pruning Ratio decreases while the support increases. The behavior in the Mushroom dataset is the opposite: the Pruning Ratio decreases while the support increases until the support reaches the value 0.5, and then it increases as the support increases. This behavior shows a relation between support and previous knowledge patterns. If the support is increased, then a number of rules do not meet the support threshold and they are discarded.
Hence, at first the discarded rules have no major impact on the rules derived from previous knowledge and the Pruning Ratio increases; but as the support keeps increasing it starts to remove the rules derived from previous knowledge, so the Pruning Ratio decreases. In Fig. 1, Fig. 2 and Fig. 3 the mean value of the Pruning Ratio is shown for several support values in the Census, Stocks and Mushroom datasets respectively, using combinations of all six rules in Sc.

Dataset, Support, Confidence, Rules, Pruned Rules, Final Rules, Pruning Ratio, Time (s)
Census, 0.01, 0.4, 3408, 942, 2466, 27, 0.589
Census, 0.03, 0.4, 835, 242, 593, 28, 0.079
Census, 0.05, 0.4, 458, 158, 300, 32, 0.043
Census, 0.07, 0.4, 229, 79, 150, 34, 0.021
Census, 0.09, 0.4, 163, 51, 112, 31, 0.015
Census, 0.11, 0.4, 114, 23, 91, 20, 0.010
Stocks, 0.2, 0.4, 11010, 5592, 5418, 50, 2.170
Stocks, 0.3, 0.4, 3314, 2225, 1089, 67, 0.536
Stocks, 0.4, 0.4, 1230, 904, 326, 73, 0.116
Stocks, 0.5, 0.4, 349, 294, 55, 84, 0.039
Stocks, 0.6, 0.4, 212, 64, 148, 30, 0.020
Mushroom, 0.3, 0.5, 78998, 29154, 49844, 36, 11.245
Mushroom, 0.4, 0.5, 5767, 1225, 4542, 21, 0.852
Mushroom, 0.5, 0.5, 1148, 200, 948, 17, 0.098
Mushroom, 0.6, 0.5, 266, 88, 178, 33, 0.025
Mushroom, 0.7, 0.5, 180, 83, 97, 46, 0.017
Breast, 0.01, 0.4, 210500, 98582, 111918, 47, 27.732
Breast, 0.1, 0.4, 28808, 13695, 15113, 47, 4.190
Breast, 0.2, 0.4, 6092, 2982, 3110, 49, 0.859
Breast, 0.3, 0.4, 5284, 2398, 2886, 45, 0.798
Breast, 0.4, 0.4, 1246, 449, 797, 36, 0.118
Table 2: Experiment results

4.3 Traditional vs. knowledge based reduction

The approach developed in this paper differs from those published until now. Previous works are concerned with the structural relationships between association rules and with mechanisms to reduce redundancy using inference rules and maximal itemsets. We use the user's experience to prune rules that do not bring new knowledge to the user, simplifying decision making. The two approaches are not comparable in essence, but we carried out experiments to compare KBR's pruning ratio with previous works. Fig. 4 shows the pruning ratio of some relevant works on redundancy reduction over the Mushroom dataset with a support value of 0.3. We used the Mushroom dataset because we can access the authors' experiments and it is sufficient to test our case. The values of the pruning ratio are taken from the authors' papers: MinMax, Reliable, GB, CHARM, CRS and MetaRules [40]. Reliable has the best Pruning Ratio, see Fig. 4, so we compare it with our approach at different support values, see Table 3. The Reliable Pruning Ratio is better than those of KBR6rules, KBR9rules and KBR12rules; nevertheless, KBR15rules reaches a better Pruning Ratio than Reliable for all supports except 0.4, see Fig. 5. A previous knowledge of 15 rules is equivalent to 0.018% of the whole rule set for a support value of 0.3, and to 7.9% for a support value of 0.7. With very few rules, KBR is able to exceed the Pruning Ratio of previous works. Of course, there is a close relationship between the Pruning Ratio and the repercussion of the previous knowledge rules over the whole set of rules: the Pruning Ratio of the knowledge rules increases as they become able to describe the domain under study. The better the KBR results are, the better the user knows the domain under study. Our approach also makes it possible to determine when a model cannot be improved, as in the case of KBR15rules for a support value of 0.7, where the Pruning Ratio is 100%.

4.4 Knowledge vs knowledge based reduction

In section 2 we surveyed some works that used knowledge to reduce the number of association rules presented to the final user.
The main goal of those papers is to obtain a set of association rules that satisfies some constraint provided by the users, using different forms of knowledge representation. They are able to reduce the cardinality of the association rule set, but they still generate many rules that represent the same knowledge. Strictly speaking, we cannot compare our proposal with those works because of the difference in goals, but we want to test the cardinality reduction capability of our approach against templates, the best-known form of knowledge-based approach. We compare the pruning ratio of our approach with the template implementation proposed in [41], which outperforms the implementation proposed in [16], across five datasets from [2]:
- Mushroom data (mush)
- Johns Hopkins University Ionosphere data (ion)
- Statlog Project Heart Disease data (hea)
- Thyroid Disease data (thy)
- Attitudes Toward Workplace Smoking Restrictions data (smo)

The continuous attributes in the datasets used were discretized using a 4-bin equal-frequency discretization. Support and confidence were set to the same values used in [16]. In Table 4 we present the results of our pruning approach (KBR) and compare them with the previous work (MetaRules) [41]. Each row in Table 4 represents an experiment where column Dataset contains the dataset id, column TotalRules shows the total number of rules produced by the extraction algorithms, column MetaRules presents the remaining rules after the application of the algorithm proposed in [41], while column KBR contains the average number of remaining rules over ten runs of the knowledge based redundancy elimination algorithm, using a random knowledge of ten rules for each execution. The number of remaining rules in our approach is lower than in the MetaRules approach for all datasets.

Figure 1: Rules pruned in the Census dataset (mean Pruning Ratio vs. number of rules in previous knowledge, for supports 0.01, 0.03, 0.05, 0.07, 0.09 and 0.11).

Support, Reliable, KBR6rules, KBR9rules, KBR12rules, KBR15rules
0.3, 95, 36, 76, 80, 96
0.4, 90, 21, 37, 47, 84
0.5, 89, 17, 30, 44, 93
0.6, 74, 33, 40, 62, 97
0.7, 78, 46, 46, 75, 100
Average, 85, 32.5, 45.8, 61.5, 94
Table 3: Pruning Ratio

Dataset, TotalRules, MetaRules, KBR
mush, 1374, 138, 120.2
ion, 1215, 452, 402.6
hea, 371, 246, 176.7
thy, 1442, 502, 431.6
smo, 797, 300, 283.3
Table 4: Remaining rules

5 Conclusion

The fundamental idea in this work is linked to the main definition of data mining, the analysis of large amounts of data to extract interesting, previously unknown patterns, together with the consideration that an association rule that corresponds to prior knowledge is a redundant one [37]. Our approach prunes those rules, presenting a simpler model to the final user. The main contribution of this work is the definition of redundancy of association rules with respect to prior knowledge, and the definition of a mechanism to eliminate this kind of redundancy from the final model of association rules presented to the end user.

Figure 2: Rules pruned in the Stocks dataset (mean Pruning Ratio vs. number of rules in previous knowledge, for supports 0.2 to 0.6).
The redundancy elimination is performed in two procedures, the first one to detect and prune redundant elements in the antecedent and consequent of a rule, and the second one to detect whether all the information provided by a rule is redundant with respect to prior knowledge and, in that case, to prune it. The results of this study confirm that it is possible to use the prior knowledge of experts to reduce the volume of association rules. Association rule models with fewer rules can be interpreted more clearly by specialists, so they can generate advantages in the decision-making process. The experimental results show that a prior knowledge smaller than 10% of the rule set can reach a reduction ratio above 90%.

Acknowledgement

This research is partially funded by the research project TIC1582 by "Consejería de Economía, Innovación, Ciencia y Empleo from Junta de Andalucía" (Spain).

References

[1] J. Núñez, (2007), "Empleo de Fuzzy OLAP para Obtener Reglas que Caractericen Estrategias de Inversión".
[2] D. J. Newman, (2007), "UCI Repository of Machine Learning Databases", University of California, School of Information and Computer Science, Irvine, CA.
[3] Sisodia, Dilip Singh and Singhal, Riya and Khandal, Vijay, (2018), "Comparative performance of interestingness measures to identify redundant and non-informative rules from web usage data", International Journal of Technology. https://doi.org/10.14716/ijtech.v9i1.1510
[4] Ali Yousif Hasan, (2019), "Evaluation and Validation of the Interest of the Rules Association in Data-Mining", International Journal of Computer Science and Mobile Computing, Vol. 8, Issue 3, pp. 230-239.
[5] N. Bhargava, M. Shukla, (2016), "Survey of Interestingness Measures for Association Rules Mining: Data Mining, Data Science for Business Perspective", International Journal of Computer Science and Information Technology (IJCSITS), Vol. 6, No. 2, Mar-April 2016, pp. 74-80.
[6] Sudarsanam, Nandan and Kumar, Nishanth and Sharma, Abhishek and Ravindran, Balaraman, (2019), "Rate of change analysis for interestingness measures", Knowledge and Information Systems. https://doi.org/10.1007/s10115-019-01352-3
[7] J. Blanchard, F. Guillet, P. Kuntz, (2009), "Semantics-based classification of rule interestingness measures", in Post-mining of association rules: techniques for effective knowledge extraction, IGI Global, pp. 56-79. https://doi.org/10.4018/978-1-60566-404-0.ch004
[8] V. de Carvalho, V. Oliveira, R. de Padua, S. Oliveira, (2016), "Solving the Problem of Selecting Suitable Objective Measures by Clustering Association Rules Through the Measures Themselves", SOFSEM 2016: Theory and Practice of Computer Science, Springer Berlin Heidelberg, pp. 505-517. https://doi.org/10.1007/978-3-662-49192-8_41

Figure 3: Rules pruned in the Mushroom dataset (mean Pruning Ratio vs. number of rules in previous knowledge, for supports 0.3 to 0.7).

[9] V. Oliveira, D. Duarte, M. Violante, W. dos Santos, R. de Padua, S. Oliveira, (2017), "Ranking Association Rules by Clustering Through Interestingness", in Mexican International Conference on Artificial Intelligence, pp. 336-351. Annals of Data Science 1.1 (2014): pp. 25-39.
[10] D. R. Carvalho, A. A. Freitas, N. Ebecken, (2005), "Evaluating the correlation between objective rule interestingness measures and real human interest", Knowledge Discovery in Databases: PKDD 2005, Springer, pp. 453-461. https://doi.org/10.1007/11564126_45
[11] A. Silberschatz, A.
Tuzhilin, (1996), "What makes patterns interesting in knowledge discovery systems", IEEE Trans. Knowledge Data Eng, vol. 8, no. 6, pp. 970-974. https://doi.org/10.1109/69. 553165 [12] R. Batra, M. A. Rehman, (2019), "Actionable Knowledge Dsicovery for Increasing Enterprise Profit, Using Domain Driven Data Mining.", IEEE Acces vol.7, pp. 182924-182936. https://doi.org/ 10.1109/access.2019.2959841 [13] R. Sehti, B. Shekar, (2019), "Subjective interest-ingness in Association Rule Mining: A Theoretical Analysis", Digital Business, Springer Charm, pp. 375-389. https://doi.org/10 .1007/ 97 8-3-319-9394 0-7_15 [14] L. Greeshma, G. Pradeepini, (2016), "Unique Constraint Frequent Item Set Mining", Advanced Computing (IACC), 2016 IEEE 6th International Confer- ence on pp. 68-72. IEEE. https://doi.org/10. 1109/iacc.2016.23 [15] A. Kaur, V. Aggarwal, S. K. Shankar, (2016), "An efficient algorithm for generating association rules by using constrained itemsets mining", Recent Trends in Electronics, Information Communication Technology (RTEICT), IEEE International Conference on (pp. 99102). IEEE. 2016. https://doi.org/10.1109/ rteict.2016.7807791 [16] Berrado, G. C. Runger, (2007), "Using metarules to organize and group discovered association rules", Data Mining and Knowledge Discovery, vol. 14, no. 3, pp. 409-431. https://doi.org/10 . 10 07/ s10618-006-0062-6 [17] W. Liu, W. Hsu, S. Chen, (1997), "Using General Impressions to Analyze Discovered Classification Rules", KDD, pp. 31-36 [18] W. Liu, W. Hsu, K. Wang, S. Chen, (1999), "Visually aided exploration of interesting association rules", Methodologies for Knowledge Discovery and DataMining, Springer, pp. 380-389. https://doi. org/10.10 07/3-54 0-4 8912-6_52 [19] B. Liu, W. Hsu, S. Chen, Y. Ma, (2000), "Analyzing the subjective interestingness of association rules", Intell. Syst. Their Appl. IEEE, vol. 15, no. 5, pp. 47-55. https://doi.org/10.1109/5254.889106 [20] B. Padmanabhan, A. Tuzhilin, (2000), "Small is beautiful: discovering the minimal set of unexpected patterns", Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and 180 Informatica 44 (2020) 167-181 J.C.D. Vera et al. 95 90 a g 85 "3 s 80 75 95 79 94 94 92 75 MinMax Reliable CRS MetaRules Closed Methods GB Figure 4: Pruning Ratio for different approaches data mining, pp. 54-63. https://doi.org/10. 1145/347090.347103 [21] K. Wang, Y. Jiang, L. V. Lakshmanan, (2003), "Mining unexpected rules by pushing user dynamics", Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 246-255. https://doi.org/10.114 5/ 956750.956780 [22] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal, (2000), "Mining minimal nonredundant association rules using frequent closed itemsets", Proc. International Conference on Computational Logic (CL 2000), pp. 972-986. https:// doi.org/10.1007/3-54 0-44 957-4_65 [23] M. Quadrana, A. Bifet, R. Gavalda, (2015), "An efficient closed frequent itemset miner for the MOA stream mining system", AI Communications 28.1: pp. 143-158. https://doi.org/10. 3233/aic-140615 [24] L. Greeshma, G. Pradeepini, (2016), "Mining Maximal Efficient Closed Itemsets Without Any Redundancy", Information Systems Design and Intelligent Applications. Springer India. pp. 339-347. https:// doi.org/10.1007/978-81-322-2755-7_36 [25] G. Gasmi, S. B. Yahia, E. M. Nguifo, Y. Slimani, (2005), "A New Informative Generic Base of Association Rules", Advances in Knowledge Discovery and Data Mining, pp. 
81-90, Springer Berlin Heidelberg. https://doi.org/10.1007/11430919_11
[26] C. L. Cherif, W. Bellegua, S. Ben Yahia, G. Guesmi, (2005), "VIE-MGB: A Visual Interactive Exploration of Minimal Generic Basis of Association Rules", Proc. International Conference on Concept Lattices and Applications (CLA 2005), pp. 179-196.
[27] P. Fournier-Viger, C.-W. Wu, V. S. Tseng, (2014), "Novel Concise Representations of High Utility Itemsets using Generator Patterns", Proc. 10th International Conference on Advanced Data Mining and Applications, Springer LNAI. https://doi.org/10.1007/978-3-319-14717-8_3
[28] Y. Xu, Y. Li, G. Shaw, (2011), "Reliable representations for association rules", Data and Knowledge Engineering, vol. 70, no. 6, pp. 555-575. https://doi.org/10.1016/j.datak.2011.02.003
[29] Phan-Luong, (2001), "The representative basis for association rules", Proc. IEEE International Conference on Data Mining (ICDM 2001), pp. 639-640. https://doi.org/10.1109/icdm.2001.989588
[30] B. Baesens, S. Viaene, and J. Vanthienen, (2000), "Post-processing of association rules", DTEW Res. Rep. 0020, pp. 118.
[31] J. Hipp, U. Güntzer, (2002), "Is pushing constraints deeply into the mining algorithms really what we want?: an alternative approach for association rule mining", ACM SIGKDD Explorations, 4(1), pp. 50-55. https://doi.org/10.1145/568574.568582

Figure 5: Reliable vs KBR pruning ratio (Reliable, KBR6rules, KBR9rules, KBR12rules and KBR15rules for supports 0.3 to 0.7).

[32] R. J. Bayardo, (2005), "The Hows, Whys, and Whens of Constraints in Itemset and Rule Discovery", Constraint-Based Mining and Inductive Databases, LNCS 3848, Springer, pp. 1-13. https://doi.org/10.1007/11615576_1
[33] W. Armstrong, (1974), "Dependency structures of database relationships", IFIP Congress, pp. 580-583.
[34] C. Tirnauca, J. L. Balcázar, D. Gómez-Pérez, (2020), "Closed-Set-Based Discovery of Representative Association Rules", International Journal of Foundations of Computer Science, vol. 31, no. 1, pp. 143-156. https://doi.org/10.1142/s0129054120400109
[35] D. Maier, (1983), "Theory of Relational Databases".
[36] W. A. Kosters, W. Pijls, V. Popova, (2003), "Complexity analysis of depth first and fp-growth implementations of apriori", Machine Learning and Data Mining in Pattern Recognition, Springer, pp. 284-292. https://doi.org/10.1007/3-540-45065-3_25
[37] H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hätönen, H. Mannila, (1995), "Pruning and grouping discovered association rules", MLnet Workshop on Statistics, Machine Learning, and Discovery in Databases.
[38] C. A. R. Hoare, (1972), "An axiomatic basis for computer programming", Communications of the ACM, 12, pp. 334-341.
[39] C. A. Furia, B. Meyer, S. Velder, (2014), "Loop invariants: Analysis, classification, and examples", ACM Computing Surveys (CSUR), vol. 46, no. 3, p. 34. https://doi.org/10.1145/2506375
[40] Y. Xu, Y. Li, G. Shaw, (2011), "Reliable representations for association rules", Data & Knowledge Engineering, 70(6), pp. 555-575, Elsevier. https://doi.org/10.1016/j.datak.2011.02.003
[41] Djenouri, Y., Belhadi, A., Fournier-Viger, P., Lin, J. C. W. (2018). Discovering strong meta association rules using bees swarm optimization. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 195-206. Springer, Cham. https://doi.org/10.1007/978-3-030-04503-6_21
https://doi.org/10.31449/inf.v44i2.2385 Informatica 44 (2020) 183-198 183

Hybrid Bees Approach Based on Improved Search Sites Selection by Firefly Algorithm for Solving Complex Continuous Functions

Mohamed Amine Nemmich and Fatima Debbat
Department of Computer Science, University Mustapha Stambouli of Mascara, Mascara, Algeria
E-mail: amine.nemmich@gmail.com, debbat_fati@yahoo.fr

Mohamed Slimane
Université de Tours, Laboratoire d'Informatique Fondamentale et Appliquée de Tours (LIFAT), Tours, France
E-mail: mohamed.slimane@univ-tours.fr

Keywords: optimization, bees algorithm, firefly algorithm, memory scheme, adaptive neighborhood search, swarm intelligence, bio-inspired computation

Received: July 25, 2018

The Bees Algorithm (BA) is a recent population-based optimization algorithm, which tries to imitate the natural behavior of honey bees in food foraging. This meta-heuristic is widely used in various engineering fields. However, it suffers from certain limitations. This paper focuses on improvements to the BA in order to improve its overall performance. The proposed enhancements were applied alone or in pairs to develop enhanced versions of the BA. Three improved variants of the BA are presented: BAMS-AN, HBAFA and HFBA. The new BAMS-AN includes a memory scheme in order to avoid revisiting previously visited sites, and an adaptive neighborhood search procedure to escape from local optima during the local search process. HBAFA introduces the Firefly Algorithm (FA) into the local search of the BA to update the positions of recruited bees, thus increasing exploitation in each selected site. The third improved BA, HFBA, employs FA to initialize the population of bees in the BA for better exploration and to start the search from more promising regions of the search space. The proposed enhancements to the BA have been tested using several continuous benchmark functions and the results have been compared to those achieved by the standard BA and other optimization techniques. The experimental results indicate that the improved variants of the BA outperform the standard BA and the other algorithms on most of the benchmark functions. The enhanced BAMS-AN in particular performs better than the other improved BAs in terms of solution quality and convergence speed.

Povzetek: Za reševanje kompleksnih zveznih funkcij so razvili nov pristop na osnovi hibridnega čebeljega algoritma (BA) in algoritma Firefly.

1 Introduction

Metaheuristic algorithms generally mimic the most successful behaviors in nature. These methods provide effective tools for reaching optimal or near-optimal solutions to complex engineering optimization problems [1]. As a branch of metaheuristic methods, swarm intelligence (SI) studies the collective behavior of populated, self-organized and decentralized systems. It concentrates specifically on the behavior of insects or animals in order to develop metaheuristics that can imitate the capabilities of these agents in solving their problems, like nest building, mating and foraging. These interactions have been effectively appropriated to solve large and complex optimization problems [2]. For instance, the behaviors of social insects, such as ants and honey bees, can be patterned by the Ant Colony Optimization (ACO) [3] and Artificial Bee Colony (ABC) [4] algorithms. These methods are generally utilized to describe effective food search behavior through self-organization of the swarm.
In SI, honey bees are among the most well-studied social insects. Furthermore, bee-inspired research has been a growing trend in the literature over the past few years, and it will continue. Many popular intelligent search algorithms have been developed, such as Honey Bee Optimization (HBO), Beehive (BH), Honeybees Mating Optimization (HBMO), Bee Colony Optimization (BCO), Artificial Bee Colony (ABC) and the Bees Algorithm (BA) [5]. The Bees Algorithm (BA) is an optimization technique that imitates the foraging behavior of honeybees in nature to solve optimization problems [6]. The main advantage of the Bees Algorithm is its balance between local search (exploitation) and global random search (exploration), where both are completely decoupled and can be explicitly varied through the learning parameters [7]. It is very efficient in finding optimal solutions and overcoming the problem of local optima, easy to apply, and available for hybridization with other methods [5]. The BA has been successfully applied to many different engineering problems, such as supply chain optimization [8], production scheduling [9], numerical function optimization [5, 6, 7, 10], solving timetabling problems [11], control system tuning [12], protein conformation search [13], test form construction [14], placement of FACTS devices [15], pattern recognition [16], robotic swarm coordination [17], data mining [18, 19, 20], chemical processes [21], mechanical design [22], wood defect classification [23], Printed Circuit Board (PCB) assembly optimization [24], image analysis [25], and many other applications [26]. The BA has attracted the attention of many researchers in different fields since it has been proved to be an efficient and robust optimization tool. It can be split up into four components: parameter tuning, initialization of the population, exploitative neighborhood or local search (intensification) and exploratory global search (diversification) [26]. However, despite the different successful applications of the BA, the algorithm has some limitations and weaknesses. Its efficiency is much influenced by the initialization of the different parameters that need to be tuned. Additionally, the BA suffers from slow convergence caused by many repeated iterations in the local search [5]. Many different investigations have been made to improve the BA performance. Certain of these studies focus on the parameter tuning and setting component [27, 28]. Others developed different concepts and techniques for the local neighborhood search stage [5, 7], for the global search stage [9], or for both the local and global stages [29]. Nonetheless, limited interest has been given to the improvement of the initialization stage. In order to deal with some weaknesses of the BA, this paper considers different improvements based on other strategies and procedures. Hybridization of different techniques may improve the solution quality and enhance the efficiency of the algorithms. The present work is an extension of the methods presented in [30]. The Firefly Algorithm (FA), which is a swarm intelligence metaheuristic for optimization problems based upon the behavior and motion of fireflies [31], is applied to initialize the bee population of the basic BA for a better exploration of the search space and to start the search from more promising locations. FA is also introduced in the local search part of the basic BA in order to increase exploitation in each search zone.
As a result, two BA variants are obtained, called the Hybrid Firefly Bee Algorithm (HFBA) and the Hybrid Bee Firefly Algorithm (HBAFA), respectively. We also investigate the behavior of the local search and global search of the BA by introducing two strategies: a memory scheme (local and global memories) to overcome repetition and unnecessary spinning inside the same neighborhood and to avoid revisiting previously visited sites, and an adaptive neighborhood search procedure to jump away from sites of similar fitness values, thus improving the convergence speed of the BA towards the optimal solution. This improved variant of the BA is called Bees Algorithm with Memory Scheme and Adaptive Neighborhood (BAMS-AN).

The rest of this article is organized as follows. Section 2 provides a description of the basic BA, FA and three improved variants of the BA with different strategies. Section 3 presents and discusses the computational simulations and results obtained on benchmark instances, while Section 4 presents our conclusions and highlights some suggestions and future research directions.

2 Materials and methods

2.1 The standard bees algorithm

The Bees Algorithm (BA) is a bee swarm-based metaheuristic algorithm, which is derived from the food foraging behavior of honey bees in nature. It was originally proposed by Pham et al. [6]. The BA starts out by initializing the population of scouts randomly in the search space. Then the fitness of the points (i.e. solutions) inspected by the scouts is evaluated. The scouts with the highest fitness are selected for neighborhood search (i.e. local search) as "selected bees" [6]. To avoid duplication, a neighborhood called a "flower patch" is created around every best solution; furthermore, the forager bees are recruited and assigned randomly within the neighborhoods. If one of the recruited bees lands on a point of higher fitness than the scout, that recruited bee becomes the new scout and participates in the waggle dance in the next generation. This step is called local search or exploitation [7]. Finally, the remainder of the colony (i.e. population) is assigned around the search space, scouting in a random manner for new possible solutions. This activity is called global search (i.e. exploration). In order to establish the exploitation areas, these solutions together with those of the new scouts are evaluated, and a number of better solutions are retained for the succeeding learning cycle of the BA. This procedure is repeated in cycles until convergence to the global optimal solution [5]. The Bees Algorithm detects the most promising solutions and selectively explores their neighborhoods to find the global minimum of the objective function. When the best solutions are selected, the BA in its basic version keeps a good balance between a local (or exploitative) search and a global (or exploratory) search; both employ random search [7]. The pseudo-code of the Standard Bees Algorithm is given below in Figure 1. A certain number of parameters are required for the BA, namely: the number of scout bees or sites (n), the number of sites (m) selected for local search among the n sites, the number of elite sites (e) among the m selected sites, the number of bees recruited for the elite sites (nre) and for the remaining best sites (nrb), the initial size of the neighborhood (ngh), and the stagnation limit (stlim).
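As an illustration of the loop just described, the following is a minimal sketch (not the implementation evaluated in this paper; the parameter values, helper names and the simple neighbourhood shrinking are assumptions, and the stagnation-limit mechanism is omitted for brevity):

```python
# Minimal sketch of the standard Bees Algorithm loop, minimising f over [lo, hi]^d.
# Parameter names follow the paper (n, m, e, nre, nrb, ngh).
import random

def bees_algorithm(f, d, lo, hi, n=12, m=6, e=2, nre=29, nrb=9,
                   ngh=None, shrink=0.8, cycles=200):
    ngh = (hi - lo) / 2 if ngh is None else ngh

    def rand_point():
        return [random.uniform(lo, hi) for _ in range(d)]

    scouts = sorted((rand_point() for _ in range(n)), key=f)
    for _ in range(cycles):
        new_scouts = []
        for rank, site in enumerate(scouts[:m]):            # local (exploitative) search
            recruits = nre if rank < e else nrb
            patch = [[min(hi, max(lo, x + random.uniform(-ngh, ngh))) for x in site]
                     for _ in range(recruits)]
            new_scouts.append(min(patch + [site], key=f))
        new_scouts += [rand_point() for _ in range(n - m)]   # global (exploratory) search
        scouts = sorted(new_scouts, key=f)
        ngh *= shrink                                        # simple neighbourhood shrinking
    return scouts[0], f(scouts[0])

best, value = bees_algorithm(lambda x: sum(v * v for v in x), d=10, lo=-100, hi=100)
print(value)  # typically close to the Hypersphere optimum of 0
```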
Table 1 (partial): 10-dimensional benchmark functions
6. Ackley 10D: min F = -20 exp(-0.2 sqrt((1/10) Σ xi²)) - exp((1/10) Σ cos(2π xi)) + 20 + e, xi ∈ [-32, 32], MN
7. Griewank 10D: min F = 1 + (1/4000) Σ (xi - 100)² - Π cos((xi - 100)/sqrt(i + 1)), xi ∈ [-600, 600], MN
8. Hypersphere 10D: min F = Σ xi² (i = 1..10), xi ∈ [-100, 100], US
9. Rastrigin 10D: min F = 100 + Σ (xi² - 10 cos(2π xi)), xi ∈ [-5.12, 5.12], MS
10. Rosenbrock 10D: min F = Σ [100 (xi² - xi+1)² + (1 - xi)²], xi ∈ [-50, 50], UN

The interval, full equation, dimensions, theoretical global optimal solutions and properties of these functions are shown in Table 1 as reported in [4]. Readers are referred to [47] for more details on the benchmark functions used in the present investigation and their characteristics.

3.2 Performance measures

The performance assessment of the different algorithms is based on two metrics, namely the accuracy and the average number of evaluations, and the results were compared to the Standard Bees Algorithm (BA), the Firefly Algorithm (FA) and other well-known optimization techniques such as Particle Swarm Optimization (PSO) and Evolutionary Algorithms (EA). These are given in Tables 3, 4 and 5. The accuracy (E) of an algorithm was defined as the difference between the fittest solution obtained by the algorithm and the value of the global optimum. It was chosen as a performance indicator to ascertain the quality of the solutions obtained; according to this metric, more accurate results are closer to zero. The number of function evaluations (NFE), which is the number of times a benchmark problem had to be evaluated to reach the optimal solution, was used to evaluate the convergence speed. If the algorithm could not reach an E of less than 0.001, the final fitness value was recorded at the maximum number of function evaluations. A lower value of NFE for a method on a problem means the method is faster in solving that problem. The process runs until the stopping criteria are met: each time, the optimization algorithm is run until the accuracy (error) is less than 0.001, or the maximum number of iterations (5000) has elapsed. For each configuration, 50 independent minimization trials are performed on each benchmark function.

3.3 Parameters settings

In order to obtain a reliable and fair comparison between the standard and the novel improved BAs, the same parameters and values were utilized for all benchmarks to achieve acceptable results within the required tolerance without careful tuning, except for the parameters α, β and γ, which were only set for the two FA-based variants of the BA and for FA. The Firefly Algorithm (FA) was implemented in this study according to the method described in [31], using the parameter configuration recommended by [32]. The parameters of the implemented BA used in this paper have been empirically tuned and the optimal parameter settings are used to find the optimal solution. The Standard BA implemented in this work is called BA1. The simulation results of our experiments are compared with the ones reported in [7]. This comparison was carried out between BA1, FA and the proposed variants of the BA on one side, and the standard BA introduced in [7] (denoted BA2) and other well-known optimization techniques such as EA and PSO on the other. Pham and Castellani (2009) analyzed in [7] the learning results of the EA, PSO and BA2 algorithms, and the parameter setting that gives the most consistent performance across all the benchmarks was found for each algorithm. The parameter settings of the algorithms are given in Table 2. Given an optimization algorithm, the No Free Lunch Theorem of optimization (NFLT) entails that there is no universal tuning that guarantees top performance on all possible optimization problems [48].
The algorithms developed in this study were implemented using Octave programming language and all experiments were performed on Intel Core i3-370M 2.4 GHz and 4 GB RAM running a 64-bit operating system. 4 Results and discussion In this work, initialization stage, local search and global search parts of BA were investigated. The attention was on improving the performance of the BA by increasing the accuracy and the speed of the search. The comparisons were carried out between improved variant of BAs (BAMS-AN, HFBA and HBAFA), the standard BA (BA1) and FA. Then, the performance of these algorithms is compared against other well-known optimization techniques presented in [7] such as standard BA (BA2), EA and PSO. The same stopping criterion is used for all the algorithms (see Section. 3.2). The Average accuracy (^E) and their standard deviation (cE), and the mean numbers of function evaluations (Mean) and standard deviation (Std) for 50 runs are compared in Tables 3 and 4, respectively. Bold values represent the best results. The second best results are in italics. Additionally, Table 5 displays the percentage of improvements in term of reducing the number of evaluations (NFE) when comparing the variants BAs with FA and the Standard BA. First, we compare the performance of our implemented BA (BA1) to the Standard BA exposed by Pham and Castellani (BA2) in [7]. The two algorithms use different combinations of parameters (Table 2). After finishing the simulation, it is found from Table 3 that the two parameter combinations are capable to find the global optimum on 7 cases out of 10 benchmark functions. For the rest, the average accuracy (^E) found for the Rastrigin function with 10 dimensions was better for the BA1 compared with the BA2 with 7.8601 against 8.8201 respectively. However, BA2 performed slightly better than BA1 on Rosenbrock 10D with 0.0293 against 0.1093, respectively. For Griewank 10D, the result is almost slightly the same for both. In order to examine and compare the convergence speed of these algorithms, the average number of function evaluations (Mean) is considered. On the other hand, evaluate which parameter combination is better than another. Parameter combination with the minimum number of function evaluations achieving global optimum over all benchmarks and all dimensions is distinguished as the best. It is immediately obvious from Table 4, that BA1 outperformed BA2, except for Schaffer 2D and Griewank 10D functions. Hence, the combination of parameters used for our implementation of BA (BA1) algorithm gave the best results in terms of number of function evaluations. For this reason, it has been selected for the proposed variants of BA. This suggests that if a user faces a problem, this parameter combination ought to be used as default setting. The second comparison of performances is achieved between the improved algorithms (BAMS-AN, HBAFA, HFBA), BA1 and FA and others well-known optimization algorithms EA, PSO and BA2. 
Algorithm Parameters Value BA1, Scout bees (n) 12 BAMS-AN, Elite sites (e) 2 HBAFA, Best sites (m) 6 HFBA - Recruited elite (nre) 29 Common Recruited best (nrb) 9 Parameters Stagnation limit (stlim) 10 Neighborhood size (nghinit) (Search range)/2 Shrinking factor (sf) 0.8 Evolution cycles (ec) 5000 FA, HFBA, Population size 40 HBAFA - Initial attractiveness (P0) 1 Common Minimum value of beta (Pmin) 0.2 Parameters Light absorption coefficient (y) 1 Control parameter (a) 0.2 FA cycles (MaxIterFA) 12500 HFBA & FA cycles in inner loop of [3 10] HBAFA HFBA FA cycles in inner loop of 5 HBA-FA BA2 [7] Scout bees (n) 11 Elite sites (e) 2 Best sites (m) 6 Recruited elite (nre) 30 Recruited best (nrb) 10 Stagnation limit (stlim) 10 Neighborhood size (nghinit) Not presented Shrinking factor (sf) 0.8 Evolution cycles (ec) 5000 EA [7] Population size 100 Evolution cycles (max number) 5000 Children per generation 99 Mutation rate (variables) 0.8 Mutation rate (mutation width) 0.8 Initial mutation interval width a 0.1 (variables) Initial mutation interval width p 0.1 (mutation width) PSO [7] Population size 100 Connectivity (no. of neighbors) 2 Maximum particle velocity (u) 0.05 cl 2.0 c2 2.0 wmax 0.9 wmin 0.4 PSO cycles 5000 Table 2: Parameters setting of the algorithms. 194 Informatica 44 (2020) 183-198 M.A. Nemmich et al. The algorithms are tested first for accuracy, and compared on the benchmark functions. According to Table 3, BA and its variants found the minimum results for the most of the functions with good standard deviations. BAMS-AN is the best performing approach which have been successful in achieving the minimum error in 8 functions out of 10 benchmarks followed by standard HBAFA, HFBA and BA in 7 out of 10 functions. Each of PSO, EA and FA has been capable of obtaining the theoretical value of 0 in, respectively, 6, 5 and 4 functions out of 10 benchmarks. BAMS-AN did not reach the minimum average accuracy only in Schwefel 2D, Rastrigin 10D functions while the other variants of BA (Standard BA, HBAFA and HFBA) are not capable of finding function minimum on Griewank, Rastrigin and Rosenbrock benchmarks with 10 dimensions. PSO cannot find the global minimum for Schwefel 2D, Rastrigin 10D, Griewank 10D and Rosenbrock 10D functions. In addition to these benchmarks, EA is not capable of finding function minimum on Schaffer 2D benchmark and FA on Easom 2D, Schwefel 2D and Ackley 10D benchmark functions. All BA algorithms managed to achieve the theoretical optimal value with the Schwefel 2D function except for BAMS-AN where the mean value obtained is 0.0003 which is very close to the theoretical optimum value. However, BAMS-AN can find the optimal values for Griewank 10D and Rosenbrock10D functions or near optimal value for Rastrigin 10D function with 0.0002, while the other variants of BA and other algorithms fail. HBAFA comes in second place (after BAMS-AN) and outperformed other compared methods (EA, PSO, FA, BA1, BA2 and HFBA) when solving the Rosenbrock 10D function with 0.0234 which is closest to the theoretical optimum value of 0, followed by HFBA with 0.1062. BA2 and BA1 become the third and the fourth best algorithm with 0.0241 and 0.1093, respectively. However, the results found using the FA and EA were not good and far from the theoretical optimal value (0.00) with 10.1469 and 61.5213, respectively. The Rosenbrock 10D benchmark was hard for the EA and the FA algorithms. 
The Rastrigin 10D function proved to be a difficult task, particularly for the EA, and none of the algorithms achieved the best minimization performance. HBAFA is the second best algorithm after BAMS-AN with 2.9848, and PSO is third with 4.8162; FA is fourth, reaching 6.0692. The second best result for the Griewank benchmark function in 10 dimensions is obtained by BA2 with 0.0089. HFBA, HBAFA and BA1 perform almost the same on the Griewank 10D function, around 0.0120, and slightly better than FA with 0.0178. The results on the Goldstein & Price 2D, Martin & Gaddy 2D and Hypersphere 10D functions indicate that all algorithms achieved the optimal values. It is also noticeable that the standard BA (BA1 and BA2), HBAFA and HFBA are the only approaches that found the theoretical optimal solution on the Schwefel 2D function. For the Easom 2D function, all algorithms achieved the theoretical optimal value, except FA, whose mean value is -0.7996. Besides the average error, the assessment of performance also involves the average number of evaluations. To compare the convergence speed of the algorithms (FA, BA1, BAMS-AN, HBAFA and HFBA), we calculated and compared the mean and standard deviation of the number of evaluations (Mean and Std.) generated by each algorithm over 50 runs. It can be clearly observed in Table 4 that the implementation of the BA that utilizes the memory scheme and the adaptive neighborhood search (BAMS-AN) achieved the smallest expected number of function evaluations (the smallest Mean values) on 6 of the 10 tested problems, followed by the BA modified by improving the local search part with FA (HBAFA) on 4 out of 10. The proposed BAMS-AN performed significantly better than the other methods on most of the test functions, namely Easom 2D, Goldstein & Price 2D, Griewank 10D, Hypersphere 10D, Rastrigin 10D and Rosenbrock 10D (see Table 4). However, the results obtained on Martin & Gaddy 2D, Schaffer 2D, Schwefel 2D and Ackley 10D using the HBAFA algorithm were better than those of the proposed BAMS-AN, the basic BA and the other approaches. Thus, it can be concluded that the BAMS-AN and HBAFA algorithms converge to the optimal solution much faster than HFBA, BA, FA and the other approaches. The third best result belongs to the HFBA algorithm, which reduced the NFE and performed better on 8 and 7 of the 10 benchmark functions compared with BA1 and BA2, respectively. BA2 produces better results than HFBA on the Schwefel 2D, Griewank 10D and Rosenbrock 10D benchmarks, while BA1 beats HFBA only on the Schwefel 2D and Griewank 10D functions. Comparing HFBA and BA1 with the FA, PSO and EA algorithms, on most benchmarks the results of HFBA and BA are better than those of the other algorithms, except on the Rastrigin 10D function, where PSO, followed by FA, performed better. On the whole, BAMS-AN and HBAFA can be classified as the first and second best performers in terms of the reduced number of evaluations, followed by HFBA and BA. From Table 4 it is also obvious that PSO, EA and FA have a fairly low convergence rate over the entire process compared with the BA variants, although PSO performs slightly better than EA and FA. Table 5 presents the percentage improvement in terms of reduced NFE when comparing the enhanced BA variants with FA and the standard BA.
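The improvement figures in Table 5 are consistent with computing, for each benchmark, the relative reduction in mean NFE of a variant with respect to a reference algorithm. The small sketch below is an assumed reading of that metric (not code from the paper); with the Easom 2D means from Table 4 it reproduces the 98.88% entry in the first row of Table 5.

// Assumed Table 5 metric: percentage reduction in mean NFE relative to a reference algorithm.
public class ImprovementPercentage {
    static double improvement(double referenceNfe, double variantNfe) {
        return 100.0 * (referenceNfe - variantNfe) / referenceNfe;
    }

    public static void main(String[] args) {
        double faEasomMean = 189158.40;   // FA, Easom 2D (Table 4)
        double bamsAnEasomMean = 2126;    // BAMS-AN, Easom 2D (Table 4)
        // Prints approximately 98.88, matching Table 5.
        System.out.printf("BAMS-AN vs FA on Easom 2D: %.2f%%%n",
                improvement(faEasomMean, bamsAnEasomMean));
    }
}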
Based on these results, BAMS-AN is the most efficient approach. If the NFE over all the functions is totaled, BAMS-AN reduces the total NFE by 86.39% and 52.62% compared with FA and the standard BA, respectively. The improvement percentages show the superiority of the BAMS-AN algorithm and how the integration of the memory scheme into the local and global searches, together with the adaptive neighborhood search procedure, affects the results of the basic BA.
No. Benchmark | EA | PSO | FA | BA2 | BA1 | BAMS-AN | HFBA | HBAFA (each entry: μE, σE)
1 Easom 2D | 0.0000, 0.0000 | 0.0000, 0.0000 | -0.7996, 0.4039 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0002 | 0.0000, 0.0000 | 0.0000, 0.0000
2 Goldstein & Price 2D | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0001 | 0.0000, 0.0000 | 0.0000, 0.0000
3 Martin & Gaddy 2D | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000
4 Schaffer 2D | 0.0009, 0.0025 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000
5 Schwefel 2D | 9.4751, 32.4579 | 4.7376, 23.4448 | 59.2194, 64.4283 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0003, 0.0007 | 0.0000, 0.0000 | 0.0000, 0.0000
6 Ackley 10D | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0010, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0003 | 0.0000, 0.0000 | 0.0000, 0.0000
7 Griewank 10D | 0.0210, 0.0130 | 0.0199, 0.0097 | 0.0178, 0.0223 | 0.0089, 0.0059 | 0.0120, 0.0081 | 0.0000, 0.0001 | 0.0120, 0.0075 | 0.0127, 0.0080
8 Hypersphere 10D | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000 | 0.0000, 0.0000
9 Rastrigin 10D | 17.4913, 7.3365 | 4.8162, 1.4686 | 6.0692, 2.7579 | 8.8201, 2.2118 | 7.8601, 2.1010 | 0.0002, 0.0004 | 7.2632, 2.1010 | 2.9848, 0.8287
10 Rosenbrock 10D | 61.5213, 132.6307 | 1.7879, 1.5473 | 10.1469, 39.2052 | 0.0293, 0.0068 | 0.1093, 0.5627 | 0.0000, 0.0003 | 0.1062, 0.5637 | 0.0234, 0.0022
Table 3: Accuracy of the improved BA algorithms compared with FA, BA and other well-known optimization techniques.
No. Benchmark | EA | PSO | FA | BA2 | BA1 | BAMS-AN | HFBA | HBAFA (each entry: Mean, Std.)
1 Easom 2D | 36440, 28121 | 97136, 45642 | 189158.40, 159102.21 | 3866, 819 | 3326.9, 415.2 | 2126, 2260 | 3316.90, 469.69 | 2658, 858
2 Goldstein & Price 2D | 5816, 2259 | 4836, 2361 | 57360, 26870.34 | 2714, 454 | 2658.4, 383.4 | 1522, 660 | 2568.4, 372.1 | 2332, 479
3 Martin & Gaddy 2D | 3248, 1602 | 2512, 781 | 12729.60, 9644.23 | 2248, 329 | 2180.3, 363.7 | 2130, 3034 | 1992.8, 585.4 | 880, 532
4 Schaffer 2D | 219376, 183373 | 35474, 27151 | 121040.80, 100516.67 | 27890, 27335 | 41472.7, 38237.9 | 12618, 4330 | 37077.2, 34805.7 | 10298, 11777
5 Schwefel 2D | 51468, 133632 | 84572, 90373 | 301726.40, 193824.47 | 5006, 2110 | 3963.2, 590.1 | 124108, 0 | 4027.1, 397.9 | 2390, 1080
6 Ackley 10D | 50344, 3949 | 261608, 9165 | 497459.20, 4413.18 | 12186, 3553 | 11836.2, 3532.3 | 18398, 7302 | 10402.2, 516.4 | 9938, 422
7 Griewank 10D | 490792, 65110 | 497714, 16164 | 474366.40, 43760.27 | 447064, 126512 | 460894.8, 102457.2 | 245426.34, 198326.6 | 470787.2, 118.6821 | 455307, 127061
8 Hypersphere 10D | 36376, 2736 | 223082, 10872 | 356958.40, 6627.86 | 8288, 403 | 8212, 425 | 5944, 3926 | 7670.02, 384.4 | 6962, 332
9 Rastrigin 10D | 500000, 0 | 500000, 0 | 500000, 0.00 | 500000, 0.00 | 500180, 9.9 | 15118, 14186 | 500300, 7.92 | 500000, 0.00
10 Rosenbrock 10D | 500000, 0 | 500000, 0 | 500000.00, 0.00 | 500000, 0 | 500000, 0 | 19642, 6666 | 491924.1, 57954.4 | 500000, 0.00
Table 4: Mean number of function evaluations of the improved BA algorithms compared with FA, BA and other well-known optimization techniques.
The second best algorithm is HBAFA, with average percentage improvements of 67.84% and 23.94% compared with FA and BA, respectively. This BA variant uses FA to improve the local search stage of the basic BA. On average, applying the Firefly Algorithm in the initialization step of the BA reduces the total NFE by 64.42% and 3.95% compared with FA and the standard BA, respectively, which ranks HFBA in third place. Therefore, the enhanced BA variants delivered a highly significant improvement in terms of overall performance. An interesting finding is that the introduced strategies and procedures helped the BA to converge to good solutions quickly and robustly.
No. Benchmark | Improvement % BAMS-AN (FA, BA) | Improvement % HFBA (FA, BA) | Improvement % HBAFA (FA, BA)
1 Easom 2D | 98.88, 36.08 | 98.25, 0.30 | 98.59, 20.11
2 Goldstein & Price 2D | 97.35, 42.74 | 95.52, 3.39 | 95.93, 12.28
3 Martin & Gaddy 2D | 83.27, 2.31 | 84.34, 8.60 | 93.09, 59.64
4 Schaffer 2D | 89.58, 69.58 | 69.37, 10.60 | 91.49, 75.17
5 Schwefel 2D | 58.867, -96.81 | 98.67, -1.61 | 99.21, 39.70
6 Ackley 10D | 96.30, 55.44 | 97.91, 12.12 | 98.00, 16.04
7 Griewank 10D | 48.26, 46.75 | 0.75, -2.15 | 4.02, 1.21
8 Hypersphere 10D | 98.33, 27.62 | 97.85, 6.60 | 98.05, 15.22
9 Rastrigin 10D | 96.98, 96.98 | -0.06, -0.02 | 0.00, 0.04
10 Rosenbrock 10D | 96.07, 96.07 | 1.62, 1.62 | 0.00, 0.00
Average improvement (%) | 86.39, 52.62 | 64.42, 3.95 | 67.84, 23.94
Table 5: Percentage improvement of the BA variants in terms of the mean number of function evaluations in comparison with BA and FA.
5 Conclusion
In this paper, enhancements to the Bees Algorithm (BA) have been presented for solving continuous optimization problems. The basic BA was first modified to find the most promising patches by using a memory scheme to avoid revisiting previously visited sites, thus increasing the accuracy and the speed of the search, followed by an adaptive neighborhood search procedure to escape from local optima during the local search process. This implementation is called BAMS-AN. The second improved version of the BA, called HBAFA, uses the Firefly Algorithm (FA) to update the positions of the recruited bees in the local search part of the BA, thus improving the convergence speed of the BA towards good solutions. The third variant of the BA, HFBA, uses a new strategy based on the Firefly Algorithm (FA) to initialize the population of bees in the basic BA, enhancing population diversity and starting the search from more promising locations. We evaluated the improved algorithms on several widely used benchmark functions and compared the results with those of the basic BA, FA and other state-of-the-art algorithms from the literature. These benchmarks cover a range of characteristics, including unimodal, multimodal, separable, and inseparable functions. The results show that BAMS-AN, followed by HBAFA, could track the optimal solution and give reasonable solutions most of the time. With these improvements, both the search speed and the accuracy of the results were improved. The comparisons among the BA-based algorithms showed that BAMS-AN outperformed HBAFA, HFBA and the conventional BA. The experiments also indicated that the improved BA variants performed much better than the standard BA, PSO, FA and EA algorithms. Testing the improved algorithms further on real-world optimization problems and looking for further algorithmic enhancements remain future work.
6 References
[1] Mehmet Polat Saka, O. Hasangebi, and Zong Woo Geem. Metaheuristics in structural optimization and discussions on harmony search algorithm.
Swarm and Evolutionary Computation, 28: 88-97, 2016. https://doi:10.1016/j.swevo.2016.01.005 [2] Lale Ozbakir, and Pinar Tapkan. Bee Colony Intelligence in Zone Constrained Two-Sided Assembly Line Balancing Problem. Expert Systems with Applications, 38(9): 11947-11957, 2011. https://doi:10.1016/j.eswa.2011.03.089 [3] Marco Dorigo, and Gianni Di Caro. Ant Colony Optimization: A New Meta-heuristic. Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), Washington, DC, USA, https://doi:10.1109/cec.1999.782657 [4] Dervis Karaboga, and Bahriye Akay. A Comparative Study of Artificial Bee Colony Algorithm. Applied Mathematics and Computation, 214(1): 108-132, 2009. https://doi:10.1016/j.amc.2009.03.090 [5] Baris Yuce Michael S. Packianather, Ernesto Mastrocinque, Duc Truong Pham, and Alfredo Lambiase. Honey Bees In-spired Optimization Method: The Bees Algorithm. Insects, 4(4): 646662, 2013. https://doi:10.3390/insects4040646 [6] Duc Truong Pham, Afshin Ghanbarzadeh, Ebubekir Kog, Sameh Otri, Shafqat Rahim, and Muhamad Zaidi. The Bees Algorithm, A Novel Tool for Complex Optimization Problems. Proceedings of the Second International Virtual Conference on Intelligent production machines and systems (IPROMS 2006), Elsevier, Oxford. 454-459, 2006 https://doi.org/10.1016Zj.engappai.2018.04.012 [7] Duc Truong Pham, and Marco Castellani. The Bees Algorithm: Modeling Foraging Behavior to Solve Continuous Optimization Problems. Proceedings of Hybrid Bees Approach Based on Improved Search Sites ... the Institution of Mechanical Engineers Part C, Journal of Mechanical Engineering Science, 223(12): 2919-2938, 2009. https://doi:10.1243/09544062JMES1494 [8] Nuntana Mayteekrieangkrai, and Wuthichai Wongthatsanekorn. Optimized Ready Mixed Concrete Truck Scheduling for Uncertain Factors Using Bee Algorithm. Songklanakar in Journal of Science & Technology, 37(2), 2015. https://doi.org/10.4271/j2967_201308 [9] Michael S. Packianather, Baris Yuce, Ernesto Mastrocinque, Fabio Fruggiero, Duc Truong Pham, and Alfredo Lambiase. Novel Genetic Bees Algorithm Applied to Single Machine Scheduling Problem. World Automation Congress (WAC), IEEE, 906-911, 2014. https://doi:10.1109/wac.2014.6936194 [10] Mohamed Amine Nemmich, and Fatima Debbat. Bees Algorithm and its Variants for Complex Optimization Problems. Proceedings of the second International Conference on Applied Automation and Industrial Diagnostics (ICAAID17), Djelfa, Algeria. [11] Khang Nguyen, Phuc Danh Nguyen, and Nuong Tran. A hybrid algorithm of Harmony Search and Bees Algorithm for a University Course Timetabling Problem. International Journal of Computer Science Issues, 9(1): 12-17, 2012. [12] Duc Truong Pham, Ahmed Haj Darwish, and Eldaw Elzaki Eldukhri. Optimization of A Fuzzy Logic Controller Using the Bees Algorithm. International Journal of Computer Aided Engineering and Technology, 1(2): 250-264, 2009. https://doi:10.1504/ijcaet.2009.022790 [13] Nanda Dulal Jana, Jaya Sil, and Swagatam Das. Improved Bees Algorithm for Protein Structure Prediction Using AB Off-Lattice Model. Advances in Intelligent Systems and Computing Mendel, 3952, 2015. https://doi:10.1007/978-3-319-19824-8_4 [14] Pokpong Songmuang, and Maomi Ueno. Bees Algorithm for Construction of Multiple Test Forms in E-Testing. IEEE Transactions on Learning Technologies, 4(3): 209-221, 2011. https://doi:10.1109/tlt.2010.29 [15] Razali bin Idris, Azhar Khairuddin, and Mohd Wazir Mustafa. Optimal Choice of FACTS Devices for ATC Enhancement Using Bees Algorithm. 
International Journal of Electrical and Computer Engineering, 3(6): 1-9, 2009. https://doi.org/10.2316Zp.2012.785-026 [16] Salima Nebti, and Abdellah Boukerram. Handwritten Characters Recognition Based On Nature-Inspired Computing AndNeuro-evolution. Applied Intelligence, 38(2): 146-159, 2013. https://doi:10.1007M0489-012-0362-z [17] Aleksandar Jevtic, Álvaro Gutiérrez, Diego Andina, and Mo M. Jamshidi. Distributed Bees Algorithm for Task Allocation in Swarm of Robots. IEEE Systems Journal, 6(2): 296-304, 2012. https://doi:10.1109/jsyst.2011.2167820 Informatica 44 (2020) 183-198 197 [18] Er. Poonam, and Rajeev Dhaiya. Artificial Intelligence Based Cluster Optimization for Text Data Mining. International Journal of Computer Science and Mobile Computing, 4(9): 8-15, 2015. [19] Mohamed Amine Nemmich, Fatima Debbat, and Mohamed Slimane. A Data Clustering Approach Using Bees Algorithm with a Memory Scheme. Lecture Notes in Networks and Systems, 261-270, 2018. https://doi.org/10.1007/978-3-319-98352-3_28 [20] Hadj Ahmed Bouarara, Reda Mohamed Hamou, and Abdelmalek Amine. Text Clustering using Distances Combination by Social Bees. International Journal of Information Retrieval Research, 4(3): 34-53, 2014. https://doi:10.4018/ijirr.2014070103 [21] Marco Castellani, Q. Tuan Pham, Duc Truong Pham. Dynamic Optimization by A Modified Bees Algorithm. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 226(7): 956-971, 2012. https://doi:10.1177/0959651812443462 [22] Abbas Moradi, Ali Mirzakhani Nafchi, and A. Ghanbarzadeh. Multi-Objective Optimization of Truss Structures Using The Bee Algorithm. ScientiaIranica. Transaction B, Mechanical Engineering, 22(5): 1789-1800, 2015. [23] Michael S. Packianather, and Bharat Kapoor. A Wrapper-Based Feature Selection Approach Using Bees Algorithm for A Wood Defect Classification System. Proceedings of Conference the 10th System of Systems Engineering Conference (SoSE), 498-503, 2015. https://doi:10.1109/sysose.2015.7151902 [24] Duc Truong Pham, Sameh Otri, and Ahmed Haj Darwish. Application of the Bees Algorithm to PCB Assembly Optimization. Proceedings of the 3rd virtual international conference on intelligent production machines and systems (IPROMS 2007), 511-516, 2007. Whittles, Dunbeath, Scotland [25] Milad Azarbad, Attaollah Ebrahimzade, and Vahid Izadian. Segmentation of infrared Images and Objectives Detection Using Maximum Entropy Method Based on the Bee Algorithm. International Journal of Computer information Systems and industrial Management Applications (IJCISIM), 3: 026-033, 2011. [26] Wasim Abdulqawi Hussein, Shahnorbanun Sahran, and Siti Norul Huda Sheikh Abdullah. The Variants of the Bees Algorithm (BA): a survey. Artificial Intelligence Review, 1-55, 2017. https://doi.org/10.1007/s10462-016-9476-8 [27] Azar Imanguliyev. Enhancements for the Bees Algorithm [dissertation]. Cardiff University at Cardiff; UK, 2017. [28] Duc Truong Pham, and Ahmed Haj Darwish. Fuzzy Selection of Local Search Sites in the Bees Algorithm. Proceedings of the 4th International Virtual Conference on Intelligent Production Machines and Systems (IPROMS 2008), 1 -14, 2008. 198 Informatica 44 (2020) 183-198 [29] Wasim Abdulqawi Hussein, Shahnorbanun Sahran, and Siti Norul Huda Sheikh Abdullah. An Improved Bees Algorithm for Real Parameter Optimization. International Journal of Advanced Computer Science and Applications, 6(10), 2015. 
https://doi.org/10.14569/ijacsa.2015.061004 [30] Mohamed Amine Nemmich, Fatima Debbat, and Mohamed Slimane. Hybridizing Bees Algorithm with Firefly Algorithm for Solving Complex Continuous Functions. International Journal of Applied Metaheuristic Computing (IJAMC), 11(2): 27-55, 2020. https ://doi:10.4018/IJAMC.2020040102 [31] Xin-She Yang. Nature-Inspired Metaheuristic Algorithms, 2008 [32] Xin-She Yang. Firefly Algorithms for Multimodal Optimization. Stochastic Algorithms: Foundations and Applications Lecture Notes in Computer Science, 169-178, 2009. https://doi:10.1007/978-3-642-04944-6_14 [33] Praveen Ranjan Srivatsava, B. Mallikarjun, and Xin-She Yang. Optimal Test Sequence Generation Using Firefly Algorithm, Swarm and Evolutionary Computation, 8: 44-53, 2013. https://doi.org/10.1016Zj.swevo.2012.08.003 [34] Adil Baykasoglu, and Fehmi Burcin Ozsoydan. An Improved Firefly Algorithm for Solving Dynamic Multidimensional Knapsack Problems. Expert Systems with Applications, 41(8), 3712-3725, 2014. https://doi.org/10.1016/j.eswa.2013.11.040 [35] Xin-She Yang, and Suash Deb. Eagle Strategy Using Levy Walk and Firefly Algorithms for Stochastic Optimization. Nature Inspired Cooperative Strategies for Optimization (NICSO 2010) Studies in Computational Intelligence, 101111, 2010. https://doi.org/10.1007/978-3-642-12538-6_9 [36] Krishnanand N. Kaipa, Debasish Ghose. Glowworm Swarm Based Optimization Algorithm for Multimodal Functions with Collective Robotics Applications, Multiagent and Grid Systems, 2(3): 209-222, 2006. https://doi.org/10.3233/mgs-2006-2301 [37] K. Chandrasekaran, and Sishaj P. Simon. Network and Reliability Constrained Unit Commitment Problem Using Binary Real Coded Firefly Algorithm. International Journal of Electrical Power & Energy Systems, 43(1): 921-932, 2012. https://doi.org/10.1016/j.ijepes.2012.06.004 [38] Narwant Singh Grewal, Munish Rattan, and Manjeet Singh Patterh. A Linear Antenna Array Failure Correction with Null Steering using Firefly Algorithm. Defence Science Journal, 64(2): 136142, 2014. https://doi.org/10.14429/dsj.64.4250 [39] Anurag Mishra, Charu Agarwal, Arpita Sharma, and Punam Bedi. Optimized Gray-scale Image Watermarking Using DWT-SVD and Firefly Algorithm. Expert Systems with Applications, 41(17): 7858-7867, 2012. https://doi:10.1016/j.eswa.2014.06.011 M.A. Nemmich et al. [40] Leandro dos Santos Coelho, and Viviana Cocco Mariani. Firefly Algorithm Approach Based on Chaotic Tinkerbell Map Applied to Multivariable PID Controller Tuning. Computers & Mathematics with Applications, 64(8): 2371-2382, 2012. https://doi.org/10.1016/j.camwa.2012.05.007 [41] Mohammad Kazem Sayadi, Ashkan Hafezalkotob, and Seyed Gholamreza Jalali Naini. Firefly-Inspired Algorithm for Discrete Optimization Problems: an Application to Manufacturing Cell Formation. Journal of Manufacturing Systems, 32(1): 78-84, 2013. https://doi.org/10.1016/jjmsy.2012.06.004 [42] Ahmad Kazem, Ebrahim Sharifi, Farookh Khadeer Hussain, Morteza Saberi, and Omar Khadeer Hussain. Support Vector Regression with Chaos-Based Firefly Algorithm for Stock Market Price Forecasting. Applied Soft Computing, 13(2): 947958, 2013. https://doi:10.1016/j.asoc.2012.09.024 [43] Mimoun Younes, Fouad Khodja, and Riad Lakhdar Kherfane. Multi-Objective Economic Emission Dispatch Solution Using Hybrid FFA (Firefly Algorithm) and Considering Wind Power Penetration. Energy, 67: 595-606, 2014. https://doi.org/10.1016/j.energy.2013.12.043 [44] Qiang Fu, Zheng Liu, Nan Tong, Mingbo Wang, and Yiming Zhao. 
A Novel Firefly Algorithm based on Im-proved Learning Mechanism. Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science, 2015. https://doi:10.2991/lemcs-15.2015.268 [45] Sankalap Arora, and Satvir Singh. The Firefly Optimization Algorithm: Convergence Analysis and Parameter Selection. International Journal of Computer Applications, 69(3): 48-52, 2015. https://doi:10.5120/11826-7528 [46] Shuhao Yu, Shenglong Zhu, Yan Ma, and Demei Mao. A Variable Step Size Firefly Algorithm for Numerical Optimization. Applied Mathematics and Computation, 263: 214-220, 2015. https://doi:10.1016/j.amc.2015.04.065 [47] Momin Jamil, and Xin She Yang. A Literature Survey Of Benchmark Functions for Global Optimization Problems. International Journal of Mathematical Modelling and Numerical Optimization, 4(2): 150, 2013. https://doi:10.1504/ijmmno.2013.055204 [48] David Wolpert, and William Macready. (1997) No Free Lunch Theorems for Optimization. IEEE Trans-actions on Evolutionary Computation, 1(1): 67-82, 1997. https://doi:10.1109/4235.585893 https://doi.org/10.31449/inf.v44i2.2385 Informatica 44 (2020) 199-198 183 Computing Dynamic Slices of Feature-Oriented Programs with Aspect-Oriented Extensions Madhusmita Sahu and Durga Prasad Mohapatra Department of Computer Science and Engineering National Institute of Technology, Rourkela-769008 Rourkela, Odisha, India E-mail: madhu_sahu@yahoo.com and durga@nitrkl.ac.in Keywords: feature-oriented programming (FOP), aspect-oriented programming (AOP), composite feature-aspect dependence graph (CFADG), mixin layer, refinement chain Received: September 10, 2018 This paper proposes a technique to compute dynamic slices of feature-oriented programs with aspect-oriented extensions. The technique uses a dependence based intermediate program representation called composite feature-aspect dependence graph (CFADG) to represent feature-oriented software that contain aspects. The CFADG of a feature-oriented program is based on the selected features that are composed to form a software product and the selected aspects to be weaved. The proposed dynamic slicing technique has been named feature-aspect node-marking dynamic slicing (FANMDS) algorithm. The proposed feature-aspect node marking dynamic slicing algorithm is based on marking and unmarking the executed nodes in the CFADG suitably during run-time. The advantage of the proposed approach is that no trace file is used to store the execution history. Also, the approach does not create any additional nodes during run-time. Povzetek: Prispevek predstavlja izvirni pristop pri programiranju na osnovi sprejemljivk z aspektno orientiranimi podaljški. Gre za računanje dinamičnih odsekov omenjenih programov. 1 Introduction Weiser [33] first introduced the concept of a program slice. Program slicing decomposes a program into different parts related to a particular computation. A slicing criterion is used to construct a program slice. A slicing criterion is a tuple, < s, v >, consisting of a statement s, in a program and a variable v, used or defined at that statement s. Program slicing technique is employed in many areas of software engineering including debugging, program understanding, testing, reverse engineering, etc. Feature-oriented programming (FOP) is concerned with the separate definition of individual features and the composition of required features to build varieties of a particular software product. 
The functionalities of a software product are identified as features in the FOP paradigm. FOP is used to implement software product lines and incremental designs. A family of software systems constitutes a software product line [20]. Motivation: Today, the variability of software products is crucial for successful software development. One mechanism to provide the required variability is software product lines, inspired by the product lines used in industry, such as those used in the production of a car or of a meal at a fast-food restaurant. The Feature-Oriented Programming (FOP) approach is used to implement software product lines. Despite its advantages, feature-oriented programming (FOP) has some problems in expressing features, such as a lack of crosscutting modularity. During software evolution, software should adapt to unanticipated requirements and circumstances. This leads to modifications and extensions that crosscut many existing implementation units. The problem of crosscutting modularity is solved by using aspect-oriented programming (AOP). Kiczales et al. [8] proposed the AOP paradigm to separate and modularize crosscutting concerns like exception handling, synchronization, logging, security, resource sharing, etc. The modularity of crosscutting concerns in FOP can be improved by integrating the AOP paradigm into the FOP paradigm. In dynamic slicing techniques, first an intermediate representation of the program is statically created in the form of a dependence graph. Then, the dynamic slice is computed by traversing the graph, starting from the point specified in the slicing criterion, using an algorithm. For programs in general languages like C/C++, Java, etc., a single dependence graph is created, since there is no composition of features in these languages. FOP, in contrast, is used to develop a family of software products. In FOP (with AOP extensions), multiple dependence graphs are created depending upon the composition of features and aspects. For example, if there are four features and two aspects in a product line, out of which two features and one aspect are mandatory, then there are eight possible combinations of features and aspects. Each possible combination of features and aspects creates a different product. Thus, there are eight software products in the product line, and accordingly eight dependence graphs, one for each product. The dynamic slice for each possible combination of features and aspects is computed using the corresponding dependence graph. The dynamic slice consists of statements from the composed program that is generated after the composition of features and aspects. These statements are then mapped back to the program used for composition; this mapping is not required in general languages like C/C++, Java, etc. Again, feature-oriented programs have some special characteristics, such as mixins, mixin layers, refinements, etc., which are not present in general languages like C/C++, Java, etc. These characteristics of feature-oriented programs require the inclusion of some new nodes/edges in the dependence graph. Similarly, they require the introduction of some new steps/phases in the slicing algorithm (e.g., the handling of mixins, the handling of mixin layers, etc.), which are not required in the case of general languages like C/C++, Java, etc.
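As a quick check of the product-count arithmetic in the example above (four features and two aspects, of which two features and one aspect are mandatory), only the optional units can be toggled, so the number of variants is 2^2 * 2^1 = 8. The tiny sketch below simply encodes this count; the class and method names are invented for illustration and do not appear in the paper.

// Number of composable products: each optional feature or aspect doubles the count.
public class ProductCount {
    static long variants(int optionalFeatures, int optionalAspects) {
        return (1L << optionalFeatures) * (1L << optionalAspects);
    }

    public static void main(String[] args) {
        System.out.println(variants(4 - 2, 2 - 1)); // prints 8
    }
}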
The existing dynamic slicing algorithms for aspect-oriented programs cannot be directly applied to the slicing of feature-oriented programs with aspect-oriented extensions, due to specific features of feature-oriented programs such as the presence of mixin layers, refinements of classes, refinements of constructors, etc. These characteristics require the inclusion of some new nodes/edges in the dependence graph and the introduction of some new steps/phases in the slicing algorithm. Although FOP is an extension of OOP, the existing dynamic slicing algorithms for C/C++ and Java also cannot be directly applied to the slicing of feature-oriented programs, due to the presence of the aforementioned specific features. Since program slicing has many applications, including testing, software maintenance, etc., there is an increasing demand for the slicing of feature-oriented programs. Objective: The main objectives of this work are to develop a suitable intermediate representation of feature-oriented programs with aspect-oriented extensions and to propose an efficient slicing algorithm to compute dynamic slices for such programs using the developed intermediate representation. A dependence graph is used as the intermediate representation. For a single feature-oriented program, more than one dependence graph can be obtained, depending on the number of features to be composed and the number of aspects to be captured. We also aim at calculating the slice computation time for different compositions of features and different captured aspects. Organization: The rest of the paper is organized as follows. Section 2 provides a brief introduction to feature-oriented programming (FOP) and program slicing. Section 3 discusses the construction of the composite feature-aspect dependence graph (CFADG), a dependence-based intermediate representation of feature-oriented programs containing aspects. In Section 4, the details of our proposed algorithm, named the feature-aspect node marking dynamic slicing (FANMDS) algorithm, are discussed. This section also presents the space and time complexity of the FANMDS algorithm. Section 5 furnishes a brief overview of the implementation of the FANMDS algorithm along with experimental results. A brief comparison of the proposed work with some other related work is furnished in Section 6. Section 7 concludes the paper along with some possible future work.
2 Basic concepts
In this section, we provide some basic concepts of feature-oriented programming and outline the features of the Jak language, a feature-oriented programming language. We also discuss the problems of feature-oriented programming and solutions to these problems through aspect-oriented programming extensions.
2.1 Feature-oriented programming (FOP)
Prehofer [1] was the first to coin the term feature-oriented programming (FOP). The key idea behind FOP is to build software by composing features. Features are the basic building blocks that are used to satisfy user requirements on a software system. Step-wise refinement, where features incrementally refine other features, leads to a stack of features arranged in layers. One suitable technique to implement features is the use of mixin layers. A mixin layer is a static component that encapsulates fragments of several different classes (mixins) so as to compose all fragments consistently.
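To make the mixin-layer idea concrete: composers such as AHEAD (introduced below) flatten a stack of layers into a linear refinement chain, and the composed classes in Figure 4 follow exactly this shape. The following schematic Java rendering is only illustrative; the class names and method bodies are simplified stand-ins, not code taken from the paper's Calculator Product Line.

// Schematic refinement chain: each "layer" class adds one concern on top of the previous one.
abstract class CalcBase {                  // base layer: holds the input value
    double n;
    void enter(double value) { n = value; }
}

abstract class CalcSqrt extends CalcBase { // sqrt layer: refines the base with a square-root operation
    double squrt() { return n < 0 ? -1 : Math.sqrt(n); }
}

public class Calc extends CalcSqrt {       // last refinement becomes the concrete product class
    long fact() {                          // fact layer: adds a factorial operation
        if (n < 0) return -1;
        long answer = 1;
        for (int i = 2; i <= (long) n; i++) answer *= i;
        return answer;
    }
}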
Several languages, such as Jak [2] (https://juliank.files.wordpress.com/2012/04/jlang1.pdf), Fuji (http://www.infosun.fim.uni-passsau.de/spl/apel/fuji), FeatureHouse (http://www.fosd.de/fh), FeatureRuby [40, 41], and FeatureC++ [5, 3, 4], support the concept of features explicitly. FeatureIDE [6] (http://wwwiti.cs.uni-magdeburg.de/iti_db/research/featureide/deploy) is a tool that supports Feature-Oriented Software Development (FOSD) in many languages. We take Jak programs as input in our proposed approach, since Jak is supported by the Algebraic Hierarchical Equations for Application Design (AHEAD) tool suite, which acts as a composer. The AHEAD tool suite is a group of tools for working with the Jak language; other languages have their own composers, and those composers are not groups of tools. Jak (short for Jakarta) is a language that extends Java by incorporating feature-oriented mechanisms [2]. Jak-specific tools like jampack and mixin are invoked to compose Jak files, and a Jak file is translated to its Java counterpart using another tool called jak2java. The language features supported by Jak include Super() references, an extension of constructors, declaration of local identifiers, etc. The details of the Jak language and its features can be found in [2].
Figure 1: Features supported by the Calculator Product Line (CPL) (feature tree; the legend distinguishes mandatory, optional, abstract, and concrete features).
Figure 2: Aspects captured in the Calculator Product Line (CPL) (same legend).
Example 2.1. Calculator Product Line (CPL) [45]: This program calculates the factorial, the square root and the base-10 logarithm of a number. It is referred to as the Calculator Product Line (CPL). A feature tree depicts the various features supported by a product line in a hierarchical manner. The feature tree for Example 2.1 is given in Figure 1, and the various aspects captured for Example 2.1 are given in Figure 2. Figure 3 depicts the source code for each feature given in Figure 1 and each aspect given in Figure 2. Figure 4(a)-Figure 4(b) show the resultant files generated after the composition of all features.
2.2 Problems in feature-oriented programming (FOP)
Feature-oriented programming (FOP) suffers from several problems in the modularization of crosscutting concerns [3, 4, 5]. These problems lead to a degradation in the modularity of program family members and also to a decrease in maintainability, customizability, and evolvability. Some of the problems of FOP are discussed below.
1. FOP is unable to express dynamic crosscutting concerns that affect the control flow of a program. It can only express static crosscutting concerns that affect the structure of the program. AOP languages can handle dynamic crosscutting concerns efficiently through the use of pointcuts, advices, etc.
2. FOP languages support only heterogeneous crosscutting concerns, where different join points are provided with different pieces of code. In contrast, AOP languages support homogeneous crosscutting concerns, where different join points are provided with the same piece of code.
3. FOP suffers from the excessive method extension problem when a feature crosscuts a large fraction of the existing classes because of refinements: many methods have to be overridden for each method on which a crosscut depends. This is because FOP is unable to modularize homogeneous crosscutting concerns.
AOP uses wildcards in pointcuts to deal with this problem.
2.3 AOP extensions to FOP
AOP can be used to solve the above problems of FOP by integrating AOP language features like wildcards, pointcuts, and advices into FOP languages. The approaches used for integrating AOP language features into FOP languages are Multi Mixins, Aspectual Mixin Layers, and Aspectual Mixins; more details about these approaches can be found in [5, 3, 4]. The Aspectual Mixin Layers approach is the most popular among them, since it overcomes all the aforementioned problems, whereas the other approaches overcome only some of them. We have used the Aspectual Mixin Layers approach in our work. For easy understanding of our approach, we have separated the aspects from the mixin layers: our mixin layers contain only a set of classes, and the aspects are designed as separate layers.
2.4 Program slicing
Program slicing is a technique employed to analyze the behavior of a program on the basis of the dependencies that exist between its statements. It extracts from a program the statements related to a specific computation; the extracted statements constitute a slice. A slicing criterion is employed to compute a slice. A slicing criterion consists of a statement s (or location in the program) and a variable v (or set of variables), and it is represented as a tuple < s, v >. A program slicing technique can be either static or dynamic, based on the input to the program. A program slicing technique is said to be static when it extracts all the statements from a program with respect to a slicing criterion irrespective of the input to the program. On the other hand, a program slicing technique is said to be dynamic when the statements are extracted with respect to a slicing criterion for a specific input to the program. The difference between static slicing and dynamic slicing can be understood with an example. Let us consider the example C program given in Figure 5. The static slice with respect to the slicing criterion < 11, y > is depicted as the bold italic statements in Figure 6; it includes statements 1, 2, 3, 4, 6, 7, 9, and 11. The dynamic slice with respect to the slicing criterion < {x = 10}, 11, y > is depicted as the bold italic statements in Figure 7; it includes statements 1, 2, 3, 6, 9, and 11. For finding the slices of a program, first an intermediate representation of the program is constructed. Then, the slices are found by applying some algorithm to the intermediate representation. There are many slicing algorithms in the literature [10, 11, 12, 14, 15, 16, 17, 19, 21, 25, 26, 37, 38, 42]. For details of the intermediate program representations and the different slicing algorithms, the reader may refer to [10, 11, 12, 14, 15, 16, 17, 19, 21, 25, 26, 37, 38, 42]. In the next section, we propose an intermediate program representation for feature-oriented programs, on which our slicing algorithm can be applied.
Figure 3: Jak program for the Calculator Product Line (CPL) along with aspect code. Figure 3(a)-Figure 3(i) represent base code (non-aspect code) written in the Jak language: (a) Base/calc.jak, (b) Base/test.jak, (c) sqrt/calc.jak, (d) sqrt/test.jak, (e) log/calc.jak, (f) fact/calc.jak, (g) log/test.jak, (h) fact/test.jak, (i) Print/test.jak. Figure 3(j)-Figure 3(k) represent aspect code written in AspectJ: (j) error.aj, (k) print.aj. [Source listings not reproduced here.]
println ("Finding -factorial"); s62 long r=c.fact(); s53 if(r==-l) sS4 System.out.printIn("not valid number"); else System.out.println("Factorial is "+r); } > c66 public class test extends testSSfact { m&7 static void printtest(){ 58 testiSfact .prir?ttest(); } n59 public static void main(String[ ] args)-[ s70 test.priuttestO; } J_ (a) calc.java (b) test.java as/1 public aspect error { p72 pointcut pc(double n):call (boolean *.isnegs(double)) && args(n) a73 before (double n):pc(n){ s74 System.out.printIn("Entering Error handling Aspect"); a75 after (double n) returning (boolean b):pc(n){ s76 if(b==true) s77 System.out.printIn(n+" is negative"); else 578 System.out.println(n+" is positive"); 579 System.out.printIn("Exiting Error Handling Aspect"); } (c) error.aj (d) print.aj Figure 4: Java codes generated (Figure 4(a)-Figure 4(b)) after the composition of all features depicted in Figure 1. Figure 4(c)-Figure 4(d) are the AspectJ codes for the aspects captured. 204 Informatica 44 (2020) 199-224 M. Sahu et al. #include 1 main (){ int x,y,z; 2 scanf("%d", Sx); 3 if(x<0){ 4 y=x+5; 5 z=x* 2; } 6 else if(x==0){ 7 y=x+20; 8 z=x* 8; } else { 9 y=x+4; 10 z=x*3; } 11 printf("y= %d". y); 12 } printf("z= %d". z) ; Figure 5: An example C program # include i main () { int x,y,z ; 2 scanf ("%d", &x) 3 if(x<0) { 4 y=x+5; 5 z=x* 2; } 6 else if(x== 0) { 1 y=x+20; 8 z=x* 8; } else { 9 y=x+4; 10 z=x* 3; } 11 printf(ny= %d", 7); 12 printf("z= %d", z) ; } Figure 6: Static slice with respect to slicing criterion < 11 ,y> #include 1 main () { int x,y,z; 2 scanf ("%d",&x) ; 3 if(x<0) { 4 y=x+5; 5 z=x+2; } 6 else if(x==0){ 7 y=x+20; 3 z=x*8; i i else { 9 y=x+4; 10 z=x*3; 11 i printf ("y= %d", y); 12 } printf("z= %d", z); Figure 7: Dynamic slice with respect to slicing criterion < {x = 10}, 11,y > [10, 11, 12, 14, 15, 16, 17, 19, 21, 25, 26, 37, 38, 42]. For the details of the intermediate program representations and different slicing algorithms, the readers may refer to [10, 11, 12, 14, 15, 16, 17, 19, 21, 25, 26, 37, 38, 42]. In the next section, we propose an intermediate program representation for feature-oriented programs, on which our slicing algorithm can be applied. 3 Composite feature-aspect dependence graph (CFADG): an intermediate representation for feature-oriented programs We have proposed an intermediate representation for feature-oriented programs, called Composite Feature-Aspect Dependence Graph (CFADG). CFADG is an arc-classified digraph, G = (N, E), where N is the set of vertices depicting the statements and E is the set of edges symbolizing the dependence relationships between the statements. The set E captures various dependencies that exist between the statements in various mixin layers and aspects in a feature-oriented program. CFADG is constructed based on the composition of different features and aspects captured. Thus, there will be different types of CFADGs according to the features composed and aspects captured. Figure 8 shows the CFADG for the composition given in Computing Dynamic Slices of Feature-Oriented Programs with... Informatica 44 (2020) 199-224 205 Figure 3. The square box with a1_in: n_in=n etc. specifies the actual and formal parameters. For example: a1_in: n_in=n specifies that n is an actual-in parameter. Similarly, a2_in: b_in=b specifies that b is an actual-in parameter, f1_in: x=n_in specifies that x is a formal-in parameter, f2_in: b=b_in specifies that b is a formal-in parameter. These notations are adopted from Horwitz et al. [47]. 
The construction of CFADG consists of the following steps: - Constructing Procedure Dependence Graph (PDG) for each method in a mixin. - Constructing Mixin Dependence Graph (MxDG) for each mixin. - Constructing System Dependence Graph (SDG) for each mixin layer. - Constructing Advice Dependence Graph (ADG) for each advice. - Constructing Introduction Dependence Graph (IDG) for each introduction. - Constructing Pointcut Dependence Graph (PtDG) for each pointcut. - Constructing Aspect Dependence Graph (AsDG) for each aspect. - Constructing Composite Feature Aspect Dependence Graph (CFADG) by combining all the SDGs and As-DGs. Below, we briefly explain the steps for constructing the CFADG and the pseudocode. (1) Construction of Procedure Dependence Graph (PDG) A procedure dependence graph (PDG) depicts the control and data dependence relationships that exist between the statements in a program with only one function/method/procedure. The nodes in the graph correspond to the program statements and edges correspond to the dependence relationships between the statements. (2) Construction of Mixin Dependence Graph (MxDG) A mixin dependence graph (MxDG) is used to capture all dependencies within a mixin. A MxDG has a mixin entry vertex that connects the method entry vertex of each method in the mixin by a mixin membership edge. Each method entry in the MxDG is associated with formal-in and formal-out parameter nodes. The interactions among methods in a mixin occur by calling each other. This effect of method calls is symbolized by a call node in a MxDG. Actual-in and actual-out parameter nodes are created at each call node corresponding to formal-in and formal-out parameter nodes. The effect of return statements in a MxDG is represented by joining each return node to its corresponding call node through a return dependence edge. (3) Construction of System Dependence Graph (SDG) for each Mixin Layer A single mixin layer may contain more than one mixin. A mixin may derive another mixin through inheritance. The MxDG for the derived class is constructed. The mixin membership edges connect the mixin entry vertex of derived class to the method entry vertices of all those methods that are defined and inherited in the derived class. The SDG for a mixin layer is constructed by joining all the mixin dependence graphs for that mixin layer through parameter edges, call edges and summary edges. (4) Construction of Advice Dependence Graph (ADG) An advice dependence graph (ADG) represents an advice in an aspect. The statements or predicates in the advice are represented as vertices and dependencies amongst statements are represented as edges in an ADG. Each ADG is associated with a unique vertex called advice start vertex to signify entry into the advice. (5) Construction of Introduction Dependence Graph (IDG) An introduction dependence graph (IDG) represents an introduction in an aspect. If an introduction is a method or constructor, then its IDG is similar to PDG of a method. A unique vertex, called introduction start vertex, is used in IDG to signify the entry into the introduction. (6) Construction of Pointcut Dependence Graph (PtDG) Pointcuts in an aspect contain no body. Therefore, to represent pointcuts, only a pointcut start vertex is created to denote the entry into the pointcut. (7) Construction of Aspect Dependence Graph (AsDG) An aspect dependence graph (AsDG) is used to represent a single aspect. It consists of a collection of ADGs, IDGs, PtDGs that are connected by some special kinds of edges. 
Each AsDG is associated with a unique vertex, called the aspect start vertex, to represent entry into the aspect. An aspect membership edge is used to represent the membership relationship between an aspect and its members; this edge connects the aspect start vertex to each start vertex of an ADG, IDG or PtDG. Each pointcut start vertex is connected to its corresponding advice start vertex by call edges.
(8) Construction of Composite Feature-Aspect Dependence Graph (CFADG)
The CFADG is constructed by combining the SDGs for all mixin layers present in the composition and the AsDGs through special kinds of edges. The SDGs for all mixin layers in a composition are connected using refinement edges, mixin call edges, mixin data dependence edges, and mixin return dependence edges. The AsDGs are connected to all the SDGs through weaving edges and aspect data dependence edges. The mixin membership edges and aspect membership edges, along with the mixin start vertices and aspect start vertices, are removed during the construction of the CFADG. The CFADG for the program given in Figure 3 is shown in Figure 8. A CFADG contains the following types of edges:
(a) Control dependence edge: A control dependence edge in a CFADG from a node n1 to a node n2 indicates that either node n2 is under the scope of node n1 or node n1 controls the execution of node n2, where node n1 is a predicate node. In Figure 8, edge (m20, s21) is a control dependence edge.
(b) Data dependence edge: A data dependence edge in a CFADG from a node n1 to a node n2 indicates that node n2 uses a variable that is assigned a value at node n1, or that n1 creates an object o and o is used at n2. In Figure 8, edges (s21, s22), (s39, s41), and (p72, a73) are data dependence edges.
(c) Mixin data dependence edge: A mixin data dependence edge in a CFADG from a node n1 to a node n2 indicates that node n1 in a mixin layer defines a variable which is used at node n2 in another mixin layer. In Figure 8, edges (s5, s16) and (s39, s54) are mixin data dependence edges.
(d) Aspect data dependence edge: An aspect data dependence edge in a CFADG from a node n1 to a node n2 indicates that node n2 in an aspect uses the value of a variable that is defined at node n1 in a mixin. In Figure 8, edge (s5, a1_in) is an aspect data dependence edge.
(e) Call edge: A call edge in a CFADG from a node n1 to a node n2 indicates that node n1 calls a method defined at node n2, where both nodes are in the same mixin layer. In Figure 8, edge (s41, m2) is a call edge.
(f) Mixin call edge: A mixin call edge in a CFADG from a node n1 to a node n2 indicates that node n1 in a mixin layer calls a method that is defined at node n2 in a different mixin layer. In Figure 8, edge (s28, m7) is a mixin call edge.
(g) Return dependence edge: A return dependence edge in a CFADG from node n1 to node n2 indicates that node n1 in a mixin layer returns a value to node n2 in the same mixin layer, where node n2 calls a method in which node n1 is present. In Figure 8, edge (s18, s46) is a return dependence edge.
(h) Mixin return dependence edge: A mixin return dependence edge in a CFADG from node n1 to node n2 indicates that node n1 in one mixin layer returns a value to node n2 in another mixin layer, where node n2 calls a method in which node n1 is present. In Figure 8, edge (s11, s21) is a mixin return dependence edge.
(i) Parameter-in edge: A parameter-in edge in a CFADG is added from an actual-in parameter to the corresponding formal-in parameter to indicate the passing of values from the calling method to the called method. In Figure 8, edges (s14→a1_in, m7→f1_in), (s21→a1_in, m7→f1_in), and (s28→a1_in, m7→f1_in) are parameter-in edges.
(j) Parameter-out edge: A parameter-out edge is added from a formal-out parameter to the corresponding actual-out parameter to indicate the return of values from the called method to the calling method. If an actual parameter is modified inside a method, then the modified value becomes an actual-out parameter and the original value becomes an actual-in parameter; the parameter used to hold the value of the actual-in parameter in the method definition becomes a formal-in parameter, and the parameter used to hold the modified value becomes a formal-out parameter. In Figure 8, there are no parameter-out edges, since in our example no parameter is modified inside a method.
(k) Summary edge: A summary edge represents the transitive flow of dependence between an actual-in parameter node and an actual-out parameter node, if the value of the actual-in parameter node affects the value of the corresponding actual-out vertex. In Figure 8, edges (s14→a1_in, s14), (s21→a1_in, s21), and (s28→a1_in, s28) are summary edges.
(l) Message dependence edge: A message dependence edge from a node n1 to another node n2 in a dependence graph signifies that node n1 represents a statement outputting some message without using any variable and node n2 represents an input statement, a computation statement, a method call statement, or a predicate statement. In Figure 8, there exists a message dependence edge (s3, s4). Similarly, edges (s45, s46), (s53, s54), and (s61, s62) are message dependence edges.
(m) Refinement edge: A refinement edge in a CFADG from a node n1 to a node n2 indicates that node n1 in a child mixin layer calls a method k() by prefacing it with a Super() call, and k() is defined at node n2 in the parent mixin layer. In Figure 8, the edge (s44, m40) is a refinement edge. Similarly, edges (s68, m59), (s60, m51), and (s52, m43) are refinement edges.
(n) Weaving edge: A weaving edge from a node n1 to a node n2 indicates one of the following:
- node n1 is a method call node and node n2 is a before advice node capturing the method called at n1, and node n2 executes before the called method executes; OR
- node n1 is the last statement in a before advice and node n2 is the method entry node of the method captured by the advice, and node n2 executes after node n1 executes; OR
- node n2 is an after advice node and node n1 is the last statement in the method captured by node n2, and node n2 executes after node n1 executes; OR
- node n1 is the last statement in an after advice and node n2 is the statement following the call node of the method captured by the advice, and node n1 executes before node n2 executes.
In Figure 8, edge (s14, a73) is a weaving edge.
The brief pseudocode for constructing the CFADG for a feature-oriented program is given below, and the complete algorithm is given in Algorithm 8 in Appendix A.
CFADG construction Algorithm
(1) For each mixin layer
(a) For each mixin
i. Create mixin entry vertex.
ii. For each method
A. Compute control and data dependences.
B. Construct PDG using control & data dependence edges.
iii. For each method call
A. Create actual parameter vertices.
iv. For each method definition
A.
Create method entry vertex.
B. Create formal parameter vertices.
v. Construct MxDG by connecting all PDGs through method call edges, parameter edges and summary edges, and connecting each method entry vertex to the mixin start vertex through mixin membership edges.
(b) Construct SDG by connecting all MxDGs through method call edges and parameter edges.
(2) For each aspect
(a) Create aspect entry vertex.
(b) For each advice
i. Create advice start vertex.
ii. Compute control and data dependences.
iii. Construct ADG using control & data dependence edges.
(c) For each introduction
i. Create introduction start vertex.
ii. If the introduction is a field, then do not create any dependence graph. Else, if the introduction is a method, then construct IDG using control and data dependence edges.
(d) For each pointcut
i. Create pointcut start vertex.
ii. Construct PtDG.
(e) Construct AsDG by connecting advice start vertices, introduction start vertices, and pointcut start vertices to the aspect start vertex through aspect membership edges.
(3) Remove mixin membership edges, aspect membership edges, mixin start vertices, and aspect start vertices.
(4) Connect all SDGs through refinement edges, mixin call edges, mixin data dependence edges, and mixin return dependence edges.
(5) Connect all AsDGs to all SDGs through weaving and aspect data dependence edges.
4 Feature-aspect node-marking dynamic slicing (FANMDS) algorithm
In this section, we present our proposed algorithm for computing dynamic slices of feature-oriented programs using the CFADG. We have named our algorithm the Feature-Aspect Node-Marking Dynamic Slicing (FANMDS) algorithm, as it is based on marking and unmarking the nodes of the CFADG. Before presenting the FANMDS algorithm, we first introduce some definitions that will be used in the algorithm.
4.1 Definitions
Definition 1: Defn(v): Let v be a variable or an object in program P. A node u in the CFADG is said to be a Defn(v) node if u corresponds to a definition statement that assigns a value to variable v, or u represents a statement that creates object v. In the CFADG given in Figure 8, nodes s23 and s24 represent Defn(answer) nodes in the method logten() in mixin calc in the log mixin layer.
Definition 2: DefnSet(v): The set of all Defn(v) nodes is referred to as DefnSet(v). In the CFADG given in Figure 8, DefnSet(answer) = {s23, s24} in the method logten() in mixin calc in the log mixin layer.
Definition 3: RecDefn(v): For each variable v, RecDefn(v) represents the node corresponding to the most recent definition of v with respect to some point s in an execution.
Figure 8: Composite Feature-Aspect Dependence Graph (CFADG) for the program given in Figure 4.
4.2 Overview of FANMDS algorithm
Before execution of a feature-oriented program FP, the features required for composition and the aspects to be captured are selected. Then, the selected features are composed and the selected aspects are weaved. The CFADG is constructed statically, only once, based on the composition of the selected features and the weaving of the selected aspects. The program is executed for a specified input. The executed nodes in the CFADG are marked and unmarked during program execution as dependences arise and cease, respectively. When a Super() call node is executed, it is marked by the algorithm; the corresponding method entry node and the associated actual and formal parameter nodes are also marked. When a method is invoked, the corresponding call node, the corresponding method entry node, and the associated actual and formal parameter nodes are marked as well. Whenever a pointcut is executed, the corresponding advice nodes are marked. When an advice is executed, the corresponding formal parameter nodes are marked. During execution, the dynamic slice of each executed statement is computed. After execution of each node and computation of the dynamic slice at that node, the algorithm unmarks it. Let dyn_slice(u) denote the dynamic slice with respect to the most recent execution of node u, and let (e1, u), (e2, u), ..., (ek, u) be the edges from all the marked predecessor nodes e1, e2, ..., ek of u in the CFADG after execution of node u. The dynamic slice with respect to the present execution of node u is computed as dyn_slice(u) = {u, e1, e2, ..., ek} ∪ dyn_slice(e1) ∪ dyn_slice(e2) ∪ ... ∪ dyn_slice(ek). Our FANMDS algorithm computes the dynamic slice with respect to the specified slicing criterion by simply looking up the corresponding dyn_slice computed during run-time.
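The update rule above translates almost directly into code. The sketch below illustrates the core marking and slice-update step for integer node identifiers; it deliberately omits the unmarking rules and the special handling of parameter, pointcut, and advice nodes described next, and its names are our own rather than the authors'.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // dyn_slice(u) = {u, e1, ..., ek} U dyn_slice(e1) U ... U dyn_slice(ek),
    // where e1..ek are the currently marked predecessors of u in the CFADG.
    class SliceUpdater {
        private final Map<Integer, List<Integer>> predecessors = new HashMap<>(); // CFADG edges, reversed
        private final Set<Integer> marked = new HashSet<>();
        private final Map<Integer, Set<Integer>> dynSlice = new HashMap<>();

        void addEdge(int from, int to) {
            predecessors.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
        }

        // Called each time the statement corresponding to node u is executed.
        void onExecute(int u) {
            Set<Integer> slice = new HashSet<>();
            slice.add(u);
            for (int e : predecessors.getOrDefault(u, List.of())) {
                if (marked.contains(e)) {              // only marked predecessors contribute
                    slice.add(e);
                    slice.addAll(dynSlice.getOrDefault(e, Set.of()));
                }
            }
            dynSlice.put(u, slice);
            marked.add(u);                             // mark u for later executions
        }

        Set<Integer> sliceOf(int u) {                  // slice look-up for a criterion at node u
            return dynSlice.getOrDefault(u, Set.of());
        }
    }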
Below, we present the pseudocode of our FANMDS algorithm in brief. Algorithm 9 in Appendix B presents the FANMDS algorithm in detail.
Feature-Aspect Node-Marking Dynamic Slicing (FANMDS) algorithm
(1) CFADG construction: Construct the CFADG for the given feature-oriented program with aspect-oriented extensions, statically and only once.
(2) Initialization: Do the following before each execution of FP.
  (a) Unmark all nodes of the CFADG.
  (b) Set dyn_slice(u) = ∅ for every node u.
  (c) Set RecDefn(v) = NULL for every variable v of the program FP.
(3) Run-time updates: Execute the program for the given set of input values and carry out the following after each statement s of the program FP is executed. Let node u in the CFADG correspond to the statement s in the program FP.
  (a) For every variable v used at node u, update dyn_slice(u) = {u, e1, e2, ..., ek} ∪ dyn_slice(e1) ∪ dyn_slice(e2) ∪ ... ∪ dyn_slice(ek), where e1, e2, ..., ek are the marked predecessor nodes of u in the CFADG.
  (b) If u is a Defn(v) node, then
    i. Unmark the node RecDefn(v).
    ii. Update RecDefn(v) = u.
  (c) Mark node u.
  (d) If u is a method call node, new operator node, polymorphic node, or mixin call node, then
    i. Mark node u.
    ii. Mark the associated actual-in and actual-out nodes corresponding to the present execution of u.
    iii. Mark the corresponding method entry node for the present execution of u.
    iv. Mark the associated formal-in and formal-out parameter nodes.
  (e) If u is a Super() method node,
    i. Mark node u.
    ii. Mark the associated actual-in and actual-out nodes corresponding to the present execution of u.
    iii. Mark the corresponding method entry node present in the parent mixin layer for the present execution of u.
    iv. Mark the formal-in and formal-out parameter nodes associated with the method entry node.
  (f) If u is a pointcut node,
    i. Mark node u.
    ii. Mark the corresponding advice nodes for the present execution of u.
  (g) If u is an advice node,
    i. Mark node u.
    ii. Mark the formal-in and formal-out parameter nodes associated with the advice node.
  (h) If u is an introduction node such that u is a method,
    i. Mark node u.
    ii. Mark the formal-in and formal-out parameter nodes.
  (i) If u is an introduction node such that u is a field,
    i. Mark node u.
    ii. Mark the node that defines a value to u for the current execution of u.
    iii. Mark the node that uses the value of u for the current execution of u.
(4) Slice look-up
  (a) For a given slicing command < u, v >, do
    i. Look up dyn_slice(u) for variable v for the content of the slice.
    ii. Map the Java statements included in the computed dynamic slice to the corresponding composed Jak statements to get the final dynamic slice.
    iii. Display the resulting slice.
  (b) If the program has not terminated, go to Step 3.
Working of the algorithm
The working of the FANMDS algorithm is illustrated through an example. Consider the feature-oriented program given in Figure 3 and the selected features for composition and aspects given in Figure 1 and Figure 2, respectively. After the composition of the selected features, the generated files are depicted in Figure 4. The corresponding CFADG is shown in Figure 8. During the initialization step, our algorithm first unmarks all the nodes of the CFADG and sets dyn_slice(u) = ∅ for every node u of the CFADG. Now, for the input data n = 5, the program executes the statements m69, s70, p81, a82, s83, m67, s68, a82, s83, m59, s60, a82, s83, m51, s52, a82, s83, m43, s44, a82, s83, s39, m40, s41, m2, s3, s4, s5, s6, a84, s85, s45, s46, m13, s14, p72, a73, s74, m7, s8, s10, s11, a75, s76, s78, s79, s15, s16, s18, s47, s49, a84, s85, s53, s54, m20, s21, a73, s74, m7, s8, s10, s11, a75, s76, s78, s79, s22, s23, s25, s55, s57, a84, s85, s61, s62, m27, s28, a73, s74, m7, s8, s10, s11, a75, s76, s78, s79, s29, s30, s31, s32, s33, s34, s35, s37, s63, s65, a84, s85, a84, s85 in this order. So, our FANMDS algorithm marks these nodes. Our algorithm also marks the associated actual parameter vertices at the calling method and the formal parameter vertices at the called method. Now, the dynamic slice is to be computed with respect to variable n at statement s78, i.e., with respect to the slicing criterion < {n = 5}, s78, n >, by traversing the CFADG in a backward manner. According to the FANMDS algorithm, the dynamic slice with respect to variable n at statement s78 is given by the expression dyn_slice(s78) = {s78, s76, a75 → f1_in} ∪ dyn_slice(s76) ∪ dyn_slice(a75 → f1_in). By evaluating the above expression recursively, we get the final dynamic slice consisting of the statements corresponding to the nodes m2, s3, s4, s5, m7, s8, s10, s11, m13, s14, m20, s21, m27, s28, s39, m40, s41, m43, s44, s45, s46, s47, s49, m51, s52, s53, s54, s55, s57, m59, s60, s61, s62, m67, s68, m69, s70, p72, a73, s74, a75, s76, s78, p81, a82, s83, a84, s85. These are indicated as bold vertices in Figure 9, and the corresponding statements are indicated in rectangular boxes in Figure 10. Similarly, the dynamic slice with respect to any slicing criterion can be computed using the FANMDS algorithm.
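As a hypothetical usage of the SliceUpdater sketch given in Section 4.2, the look-up step for the criterion < {n = 5}, s78, n > reduces to a single map query once the instrumented run has reported every executed node; the integer 78 standing in for the label s78 is purely illustrative.

    // Hypothetical driver reusing the SliceUpdater sketch from Section 4.2;
    // node identifiers are the statement labels of Figure 8 written as integers.
    class SliceLookupDemo {
        public static void main(String[] args) {
            SliceUpdater updater = new SliceUpdater();
            // ... CFADG edges added with addEdge(...), and onExecute(u) called for
            //     every node executed during the instrumented run with input n = 5 ...
            System.out.println("dyn_slice(s78) = " + updater.sliceOf(78));
        }
    }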
5 Implementation
This section briefly describes the implementation of the FANMDS algorithm. A dynamic slicing tool, named the Feature-Aspect Dynamic Slicing Tool (FADST), has been developed to implement the algorithm. Figure 11 depicts the architectural design of FADST, and Figure 12 depicts its working through a flow chart. In Figure 11, the executable components are depicted in rectangular boxes and the passive components in ellipses. First, the features required to be composed and the aspects to be captured are selected. The selected features, the selected aspects, and the slicing criterion consisting of input, line number, and variable are provided to FADST through the Graphical User Interface (GUI) component. The Dynamic Slicer component interacts with the GUI component and returns the required result as output to the GUI. The AHEAD composer [2] composes the selected features to generate a set of Java programs. These Java programs and the selected aspects are fed to the AspectJ composer, which weaves the aspects at the appropriate join points; the result is a composed AspectJ program. The lexical analyzer component reads the composed AspectJ program and generates tokens from it. Upon encountering a useful token, the lexical analyzer returns the token along with its type to the parser and semantic analyzer component, which analyzes it using the grammatical rules designed for the input programs. The code instrumentor component instruments the composed AspectJ programs: classes are instrumented with line numbers prefixed with c, aspects with line numbers prefixed with as, methods with m, pointcuts with p, advices with a, and statements containing assignments, computations, or predicates with s. The CFADG constructor component constructs the CFADG using the required program analysis information, such as the type of each statement and the sets of variables defined or used at a statement. The dynamic slicer component implements the FANMDS algorithm. We have used the Java language for our implementation. A compiler writing tool, ANTLR (ANother Tool for Language Recognition)5, has been used for the lexical analyzer, parser, and semantic analyzer components of FADST. An adjacency matrix adj[][] has been used for storing the CFADG with respect to a selected composition of features of the given feature-oriented program. Arrays are used to store the sets Defn(v), Usage(v), RecDefn(v), and dyn_slice(u).
5 www.antlr.org
Figure 9: CFADG showing statements included in the dynamic slice as bold nodes
Figure 10: Dynamic slice with respect to slicing criterion < {n = 5}, s78, n > depicted as statements in rectangular boxes (panels: (a) calc.jak, (b) test.jak, (c) error.aj, (d) print.aj)
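The storage scheme just described (an adjacency matrix adj[][] plus arrays for Defn(v), Usage(v), RecDefn(v), and dyn_slice(u)) can be pictured with the following fragment. It is a sketch under our own naming and sizing assumptions, not code taken from FADST; nodes are assumed to be numbered 0..nodeCount-1 and variables 0..variableCount-1.

    // Illustrative storage for the CFADG and the per-node / per-variable sets.
    class CfadgStore {
        final boolean[][] adj;                   // adj[i][j] == true iff there is an edge i -> j
        final int[] recDefn;                     // RecDefn per variable, -1 if undefined
        final java.util.BitSet[] dynSlice;       // dyn_slice per node, as a set of node ids

        CfadgStore(int nodeCount, int variableCount) {
            adj = new boolean[nodeCount][nodeCount];
            recDefn = new int[variableCount];
            java.util.Arrays.fill(recDefn, -1);
            dynSlice = new java.util.BitSet[nodeCount];
            for (int i = 0; i < nodeCount; i++) {
                dynSlice[i] = new java.util.BitSet(nodeCount);
            }
        }

        void addEdge(int from, int to) { adj[from][to] = true; }
    }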
Figure 11: Architecture of the slicing tool
Figure 12: Flowchart for the working of the slicing tool given in Figure 11
5.1 Case studies and experimental results
We have applied our algorithm to some product lines6,7. We have also taken some open-source Java programs8,9,10. We have developed a few product lines by identifying various features and converting these available Java programs into corresponding Jak programs. It may be noted that Jak is one of the feature-oriented programming languages. We have also taken the models of a few product lines (such as the calculator product line, stack product line, and graph product line) from the work of different researchers [45, 44, 43, 46] and developed the corresponding Jak programs. These may be considered representative feature-oriented programs with aspect-oriented extensions. In all the product lines, we have identified the aspects that are scattered throughout the program. The product lines we have taken as our case studies have various features and aspects which can be used for composing a variety of software product lines. We have taken fifteen product lines as our case studies. The characteristics of our software product lines are depicted in Table 1. These programs are executed for different compositions of features, with different aspects weaved, for different inputs. Also, the algorithm has been tested for different slicing criteria for different compositions of features and different inputs. The CFADG generation time and average slice computation time for various compositions of features in the different product lines are depicted in Figures 13-27. It can be inferred from Figures 13-27 that different compositions of features result in different slice computation times. Aspects weaved at more join points take more time than aspects weaved at fewer join points. For example, in the Calculator Product Line (CPL), the number of join points where the aspect Print is weaved is greater than that of the aspect Error; that is why the slice computation time for the program where the Print aspect
6 http://spl2go.cs.ovgu.de/projects
7 http://www.infosun.fim.uni-passau.de/spl/apel/fh
8 http://www.sanfoundry.com/java-program-implement-avl-tree/
9 http://www.geeksforgeeks.org/avl-tree-set-2-deletion/
10 https://ankurm.com/implementing-singly-linked-list-in-ja
Table 1: List of Case Study Software Product Lines
Sl. No. | Name of Product Line | Description | Total No. of Features Supported (Mandatory / Optional) | Total No. of Aspects Weaved (Mandatory / Optional) | Total No. of Lines in Composed AspectJ File
1 | Calculator Product Line (CPL) | Calculates the factorial, square root, and logarithm to base 10 of a number. | 2 (Base, Print) / 3 (sqrt, log, fact) | 1 (Error) / 1 (Print) | 85
2 | Stack Product Line (SPL) | Models variations of stacks. | 1 (Stack) / 3 (Counter, Lock, Undo) | 1 (Size) / 1 (Top) | 130
3 | Graph Product Line | Models variability for different types of graphs, such as colored, weighted, etc. | 1 (Base) / 4 (Weight, Color, Recursive, PrintHeader) | 1 (Print) / 1 (AddNode) | 150
4 | AVL Tree Product Line (AVLTPL) | Simulates the various operations on an AVL tree. | 2 (Base, Display) / 4 (Insert, Delete, Count, Search) | 1 (ComputeHeight) / 1 (Rotate) | 315
5 | Single Linked List Product Line (SLLPL) | Simulates the various operations on a single linked list. | 2 (Base, Display) / 7 (InsertBegin, InsertEnd, InsertAfter, DeleteBegin, DeleteEnd, DeleteAfter, Count) | 2 (Insert) / 0 | 195
6 | DesktopSearcher | Program for indexing and content-based searching in files. | 7 / 9 | 3 / 4 | 2516
7 | TankWar | A game. | 12 / 19 | 6 / 7 | 3746
8 | GPL | Graph and algorithm library. | 12 / 24 | 5 / 12 | 801
9 | MobileMedia | A software product line that manipulates photo, music, and video on mobile devices; a multimedia manager for phones. | 10 / 37 | 5 / 10 | 4669
10 | Digraph | A library for representing and manipulating directed graph structures. Besides basic graphs, it supports operations such as removal, traversal, and transposition, implemented as optional features. | 1 / 3 | 1 / 2 | 1733
11 | Elevator | Simulates various operations of an elevator. | 3 / 3 | 2 / 1 | 873
12 | Vistex | Features graphical manipulation of graphs and their textual representation; designed to be easily extendible for specific graph-based applications such as UML. | 5 / 11 | 2 / 3 | 1890
13 | Violet | Graphical model editor. | 15 / 73 | 6 / 9 | 9203
14 | Notepad | Graphical text editor. | 6 / 7 | 3 / 3 | 1672
15 | PkJab | Instant messaging client for Jabber. | 3 / 5 | 2 / 2 | 3963
Figure 13: CFADG generation time and average slice computation time for the Calculator Product Line
Figure 14: CFADG generation time and average slice computation time for the Stack Product Line
Figure 15: CFADG generation time and average slice computation time for the Graph Product Line
Figure 16: CFADG generation time and average slice computation time for the AVL Tree Product Line
Figure 18: CFADG generation time and average slice computation time for DesktopSearcher
Figure 19: CFADG generation time and average slice computation time for TankWar
Figure 20: CFADG generation time and average slice computation time for GPL
Figure 21: CFADG generation time and average slice computation time for MobileMedia
Figure 22: CFADG generation time and average slice computation time for Digraph
Figure 23: CFADG generation time and average slice computation time for Elevator
Figure 24: CFADG generation time and average slice computation time for Vistex
Figure 25: CFADG generation time and average slice computation time for Violet
Figure 26: CFADG generation time and average slice computation time for Notepad
Figure 27: CFADG generation time and average slice computation time for PkJab
is weaved is more than that of the program where the Error aspect is weaved. Features containing more loops take more time. Composed features containing fewer executable statements take less time than those containing more executable statements.
6 Comparison with related work
Several works have been carried out on slicing of procedure-oriented programs [34, 32, 33, 30, 47], object-oriented programs [11, 21, 39, 22, 15], and aspect-oriented programs [37, 9, 10, 16, 18, 23]. However, very few works have been carried out on slicing of feature-oriented programs [35]. Zhao [9] was the first to develop a two-phase slicing algorithm to compute static slices of aspect-oriented programs. Later, Zhao et al. [10] developed an efficient algorithm for constructing the system dependence graph of aspect-oriented programs. Ray et al. [16] developed an algorithm to compute dynamic slices of aspect-oriented programs by constructing an aspect-oriented system dependence graph (AOSG). They introduced a new logical node, called a C-node, to capture communication dependences between the non-aspect code and the aspect code, and a new arc, called an aspect-membership arc, to connect the dependence graphs of the non-aspect code and the aspect code. They did not show the actual parameters in the pointcuts. Singh et al. [18] proposed a method to compute slices depending upon the slice point location in the program; their computed slice was an executable slice. Munjal et al. [23] automated the generation of system dependence graphs (SDGs) for aspect-oriented programs by analysing their bytecode and then proposed a three-phase slicing algorithm to compute static slices using this intermediate graph. None of the above works [9, 15, 16, 18, 23] considered feature-oriented programs. Apel et al. [3] presented a novel language for FOP in C++, namely FeatureC++, and also mentioned a few problems of FOP languages. Apel et al. [4] demonstrated FeatureC++ along with its adaptation to Aspect-Oriented Programming (AOP) concepts; they discussed the use of FeatureC++ in solving various problems related to incremental software development using AOP concepts and the weaknesses of FOP for modularization of crosscutting concerns. Apel et al. [5] discussed the limitations of crosscutting modularity and the missing support of C++, and focused on solutions for easing the evolvability of software. Batory [2] presented the basic concepts of FOP and a subset of the tools of the Algebraic Hierarchical Equations for Application Design (AHEAD) tool suite. Apel et al. [7] presented an overview of the feature-oriented software development (FOSD) process and identified various key issues in different phases of FOSD. Thum et al. [6] developed an open-source framework for FOSD, namely FeatureIDE, that supports all phases of FOSD along with support for feature-oriented, delta-oriented, and aspect-oriented programming languages. Pereira et al. [20] discussed the findings of a Systematic Literature Review (SLR) of SPL management tools.
These works [7, 5, 3, 4, 2, 20, 6] discussed only the programming and development aspects of FOP and did not consider slicing. We have presented a technique for dynamic slicing of feature-oriented programs with aspect-oriented extensions using Jak as the FOP language. Very few works have been carried out on slicing of feature-oriented programs [35]. Sahu et al. [35] suggested a technique to compute dynamic slices of feature-oriented programs. Their technique first composed the selected features of the feature-oriented program; then it used an execution trace file and a dependence-based program representation, namely the dynamic feature-oriented dependence graph (DFDG). The dynamic slice was computed by traversing the DFDG in a breadth-first or depth-first manner and mapping the traversed vertices to the program statements. They missed some of the dependences that may arise in feature-oriented programs, such as the mixin call edge, the refinement edge, and the mixin return dependence edge. The drawback of their approach is the use of an execution trace file, which may lead to longer slice computation times. They also did not consider the aspect-oriented extensions of feature-oriented programs. In our approach, we have not used any execution trace file. Usually, the execution trace file is used to store the execution history of each executed statement for a given input. Much time is required to store and retrieve the executed statements, which are then used for the calculation of the dynamic slice of each statement. Thus, extra time is required to perform I/O operations on an execution trace file. We do not use any execution trace file; during execution of the program for a given input, the dynamic slice for each statement is computed by the marking and unmarking process. Thus, there is no requirement for an execution trace file to store the executed statements, and our proposed approach does not take any extra time to read from or write into such a file, thereby reducing the slice extraction time. Also, we have considered the aspects that are scattered throughout the code. Our algorithm does not create any new node in the intermediate representation (the CFADG) during run-time. This results in faster computation of slices.
7 Conclusion and future work
We have presented an approach to compute dynamic slices of feature-oriented programs with aspect-oriented extensions. The features required for composition are first selected and composed using the Algebraic Hierarchical Equations for Application Design (AHEAD) composer. Then, the aspects are weaved into the generated composed Java program using the AspectJ composer to produce the resultant AspectJ program. The intermediate dependence-based representation of the program containing Jak code and AspectJ code is constructed; it is called the Composite Feature-Aspect Dependence Graph (CFADG). The program is executed for an input. During execution, the nodes of the CFADG are marked and unmarked according to our feature-aspect node-marking dynamic slicing (FANMDS) algorithm. We have developed a tool, named FADST, to implement our FANMDS algorithm. FADST computes the dynamic slices and the average slice extraction times for various compositions of features and weaved aspects for various product lines. Currently, our tool is able to handle various compositions for a few product lines with a few aspects captured. Also, the current evaluation only uses primitive feature-oriented programs.
In future, we will extend our tool to handle more number of product lines with more number of compositions. Our algorithm may easily be extended to compute dynamic slices of other feature-oriented languages like Fea-tureC++, FeatureRuby, FeatureHouse, Fuji, etc. Also, the extension of the algorithm can be used to compute conditioned slices, amorphous slices for feature-oriented programs with various aspects captured. We will also find out the differences in the performance of different aspects. References [1] Christian Prehofer (1997) Feature-Oriented Programming: A Fresh Look at Objects, Proceedings of 11th European Conference on Object-Oriented Programming (ECOOP), Springer, Berlin, Heidelberg, pp. 419-443. https://doi.org/10. 1007/bfb0053389 [2] Don Batory (2006) A Tutorial on Feature-Oriented Programming and the AHEAD Tool Suite, Proceedings of the 2005 International Conference on Generative and Transformational Techniques in Software Engineering (GTTSE'05), Springer-Verlag, Berlin, Heidelberg, pp. 3-35. https://doi.org/10. 1007/11877028_1 [3] Sven Apel and Thomas Leich and Marko Rosen-muller and Gunter Saake (2005) FeatureC++: Feature-Oriented and Aspect-Oriented Programming in C++, Tech. rep., Department of Computer Science, Otto-von-Guericke University, Magdeburg, Germany. [4] Sven Apel and Thomas Leich and Marko Rosen-muller and Gunter Saake (2005) FeatureC++: On the Symbiosis of Feature-Oriented and Aspect-Oriented Programming Proceedings of the International Conference on Generative Programming and Component Engineering (GPCE'05), Springer, pp. 125-140. https://doi.org/10.1007/11561347_10 [5] Sven Apel and Thomas Leich and Marko Rosen-muller and Gunter Saake (2005) Combining Feature-Oriented and Aspect-Oriented Programming to Support Software Evolution, Proceedings of the 2nd ECOOP Workshop on Reflection, AOP and MetaData for Software Evolution (RAM-SE), School of Computer Science, University of Magdeburg, July, pp. 3-16. [6] Thomas Thum and Christian Kastner and Fabian Benduhn and Jens Meinicke and Gunter Saake and Thomas Leich (2014) FeaturelDE: An extensible framework for feature-oriented software development, Science of Computer Programming, 79, pp. 70-85. https://doi.org/10.1016/ j.scico.2012.06.002 [7] Sven Apel and Christian Kastner (2009) An Overview of Feature-Oriented Software Development. Journal of Object Technology, 8(5), pp. 49-84, July-August. https://doi.org/10. 5381/jot.2009.8.5.c5 [8] Gregor Kiczales and John Irwin and John Lamping and Jean Marc Loingtier and Cristiana Videira Lopes and Chris Maeda and Anurag Mendhekar (1997) Aspect-Oriented Programming, Proceedings of the European Conference on Object-Oriented Programming (ECOOP), Finland, June, pp. 220-242. https://doi.org/10.1007/bfb0053381 [9] Jianjun Zhao (2002) Slicing Aspect-Oriented Software, Proceedings of 10th International Workshop on Program Comprehension, pp. 251-260, June. https://doi.org/10.110 9/wpc.2 0 02. 1021346 Computing Dynamic Slices of Feature-Oriented Programs with... Informatica 44 (2020) 199-224 219 [10] Jianjun Zhao and Martin Rinard (2003) System Dependence Graph Construction for Aspect-Oriented Programs. Technical report, Laboratory for Computer Science, Massachusetts Institute of Technology, USA, March. [11] Loren Larsen and Mary Jean Harrold (1996) Slicing Object-Oriented Software, Proceedings of 18th International Conference on Software Engineering, pp. 495-505, March. https://doi.org/10. 
1109/icse.1996.493444 [12] Timon Ter Braak (2006) Extending Program Slicing in Aspect-Oriented Programming With Inter-Type Declarations, 5th TSConIT Program, June. [13] Aspect Oriented Programming. www.wikipedia.org. [14] Hiralal Agrawal and Joseph R. Horgan (1990) Dynamic Program Slicing, ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation PLDI'90, 25(6), pp. 246-256, June. https: //doi.org/10.114 5/93542.9357 6 [15] Durga Prasad Mohapatra and Rajib Mall and Rajeev Kumar (2004) An Edge Marking Technique for Dynamic Slicing of Object-Oriented Programs, Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC'04). https://doi.org/10. 110 9/cmpsac.2 0 04.134 2 80 6 [16] Abhisek Ray and Siba Mishra and Durga Prasad Mohapatra (2012) A Novel Approach for Computing Dynamic Slices of Aspect-Oriented Programs, International Journal of Computer Information Systems, 5(3),pp. 6-12, September. [17] Abhisek Ray and Siba Mishra and Durga Prasad Mohapatra (2013) An Approach for Computing Dynamic Slice of Concurrent Aspect-Oriented Programs International Journal of Software Engineering and Its Applications, 7(1), pp. 13-32, January. [18] Jagannath Singh and Durga Prasad Mohapatra (2013) A Unique Aspect-Oriented Program Slicing Technique, Proceedings of International Conference on Advances in Computing, Communications and Informatics (ICACCI'13), pp. 159-164. https://doi. org/10.110 9/icacci.2 013.6637164 [19] Jagannath Singh and Dishant Munjal and Durga Prasad Mohapatra (2014) Context Sensitive Dynamic slicing of Concurrent Aspect-Oriented Programs, Proceedings of 21st Asia-Pacific Software Engineering Conference (APSEC'14), pp. 167-174. https: //doi.org/10.110 9/apsec.2 014.35 [20] Juliana Alves Pereira and Kattiana Constantino and Eduardo Figueiredo (2015) A Systematic Literature Review of Software Product Line Management Tools, Proceeddings of 14th International Conference on Software Reuse (ICSR'15), Berlin Heidelberg, pp. 73-89. https://doi.org/10 . 1007/ 97 8-3-319-14130-5_6 [21] Durga Prasad Mohapatra and Rajeev Kumar and Rajib Mall and D. S. Kumar and Mayank Bhasin (2006) Distributed dynamic slicing of Java programs, Journal of Systems and Software, 79(12), pp. 1661-1678. https://doi.org/10.1016/ j.jss.2006.01.009 [22] Durga Prasad Mohapatra and Rajib Mall and Ra-jeev Kumar (2005) Computing dynamic slices of concurrent object-oriented programs, Information & Software Technology, 47(12), pp. 805-817. https://doi.org/10.1016/j.infsof. 2005.02.002 [23] Dishant Munjal and Jagannath Singh and Subhrakanta Panda and Durga Prasad Moha-patra (2015) Automated Slicing of Aspect-Oriented Programs using Bytecode Analysis, Proceedings of IEEE 39th Annual International Computers, Software & Applications Conference (COMPSAC 2015), pp. 191-199. https: //doi.org/10.1109/compsac.2015.98 [24] Madhusmita Sahu and Durga Prasad Mohapatra (2007) A Node-Marking Technique for Dynamic Slicing of Aspect-Oriented Programs, Proceedings of 10th International Conference on Information Technology (ICIT2007), pp. 155-160. https://doi. org/10.110 9/icit.2 0 07.7 0 [25] Diganta Goswami and Rajib Mall (1999) Fast Slicing of Concurrent Programs, Proceedings of 6th International Conference on High Performance Computing (HiPC 1999), pp. 38-42. 
https://doi.org/ 10.1007/978-3-54 0-4 6642-0_6 [26] Diganta Goswami and Rajib Mall (2000) Dynamic Slicing of Concurrent Programs, Proceedings of 7th International Conference on High Performance Computing (HiPC 2000), pp. 15-26. https://doi. org/10.1007/3-54 0-444 67-x_2 [27] Jaiprakash T. Lallchandani and Rajib Mall (2011) A Dynamic Slicing Technique for UML Architectural Models, IEEE Transactions on Software Engineering, 37(6), pp. 737-771. https://doi.org/10. 1109/tse.2010.112 [28] Philip Samuel and Rajib Mall (2009) Slicing-based test case generation from UML activity diagrams, ACM SIGSOFT Software Engineering Notes, 34(6), pp. 1-14. https://doi.org/10.114 5/ 1640162.1666579 220 Informatica 44 (2020) 199-224 M. Sahu et al. [29] Jaiprakash T. Lallchandani and Rajib Mail (2010) Integrated state-based dynamic slicing technique for UML models, IET Software, 4(1), pp. 5578. https://doi.org/10.104 9/iet-sen. 2009.0080 [30] G. B. Mund and Rajib Mall (2006) An efficient in-terprocedural dynamic slicing method, The Journal of Systems and Software, 79, pp. 791-806. https: //doi.org/10.1016/j.jss.2005.07.024 [31] Diganta Goswami and Rajib Mall (2004) A parallel algorithm for static slicing of concurrent programs, Concurrency - Practice and Experience, 16(8), pp. 751-769. https://doi.org/10.1002/cpe. 789 [32] G. B. Mund and R. Mall and S. Sarkar (2003) Computation of intraprocedural dynamic program slices, Information and Software Technology, 45 (8), pp. 499-512. https://doi.org/10.1016/ S0950-5849(03)00029-6 [33] G. B. Mund and R. Mall and S. Sarkar (2002) An efficient dynamic program slicing technique, Information and Software Technology, 44 (2), pp. 123-132. https://doi.org/10.1016/ s0950-5849(01)00224-5 [34] Diganta Goswami and Rajib Mall (2002) An efficient method for computing dynamic program slices, Information Processing Letters, 81(2), pp. 111-117. https://doi.org/10.1016/ s0020-0190(01)00202-2 [35] Madhusmita Sahu and Durga Prasad Mohapatra (2016) Dynamic Slicing of Feature-Oriented Programs, Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics (ICACNI 2015), pp. 381-388. https://doi. org/10.1007/978-81-322-2529-4_4 0 [36] Mark Weiser (1981) Program Slicing, Proceedings of the 5th International Conference on Software Engineering (ICSE), pp. 439-449. IEEE Computer Society, March. [37] Durga Prasad Mohapatra and Madhusmita Sahu and Rajib Mall and Rajeev Kumar (2008) Dynamic Slicing of Aspect-Oriented Programs, Informatica, 32(3), pp. 261-274. [38] Jaiprakash T. Lallchandani and Rajib Mall (2005) Computation of Dynamic Slices for Object-Oriented Concurrent Programs, Proceedings of Asia Pacific Software Engineering Conference (APSEC 2005), pp. 341-350. https://doi.org/10. 1109/apsec.2005.51 [39] Jianjum Zhao (1998) Dynamic Slicing of Object-Oriented Programs, Technical report, Information Processing Society of Japan, pp. 1-7, May. [40] Sebastian Gunther and Sagar Sunkle (2009) Feature-Oriented Programming with Ruby. In Proceedings of the First International Workshop on Feature-Oriented Software Development (FOSD'09), pp. 11-18, October. https://doi.org/10.114 5/162 9716. 1629721 [41] Sebastian Gunther and Sagar Sunkle (2012) rbFea-tures: Feature-oriented programming with Ruby, Science of Computer Programming, 77(3), pp. 152173, March. https://doi.org/10.1016/j. 
scico.2010.12.007 [42] Bogdan Korel and Satish Yalamanchili (1994) Forward Computation Of Dynamic Program Slices, Proceedings of the 1994 ACM SIGSOFT international symposium on Software testing and analy-sis(ISSTA'94), pp. 66-79, August. https://doi. org/10.1145/186258.186514 [43] Sven Apel and Thomas Leich and Gunter Saake(2008) Aspectual Feature Modules, IEEE Transactions On Software Engineering, 34(2),pp. 162-180, March/April. https: //doi.org/10.1109/tse.2 0 07.7 077 0 [44] Jia Liu and Don Batory and Srinivas Nedunuri (2005) Modeling Interactions in Feature-Oriented Software Designs, Proceedings ofInternational Conference on Feature Interactions in Telecommunications and Software Systems (ICFI2005), pp. 178-197. [45] Ian Adams and Sigmon Myers (2009) FOP and AOP: Benefits, Pitfalls and Potential for Interaction, pp. 17. [46] Sagar Sunkle and Marko Rosenmuller and Norbert Siegmund and Syed Saif ur Rahman and Gunter Saake and Sven Apel (2008) Features as First-class Entities-Toward a Better Representation of Features. Proceedings of Workshop on Modularization, Composition, and Generative Techniques for Product Line Engineering, pp. 27-34, October. [47] Susan Horwitz and Thomas Reps and David Bink-ley (1990) Inter-Procedural Slicing Using Dependence Graphs, ACM Transactions on Programming Languages and Systems, vol. 12, no. 1, pp. 26-60, January. https://doi.org/10.114 5/ 77606.77608 Computing Dynamic Slices of Feature-Oriented Programs with... Informatica 44 (2020) 199-224 221 8 Appendices A Construction of CFADG Algorithm 8 Construction of CFADG Input: The feature-oriented program containing aspects with selected required features and weaved aspects. Output: The composite feature-aspect dependence graph (CFADG). 1: procedure ConstructPDG() 2: for start of a method do 3: Create method entry node. 4: for each executable statement in the program do 5: Create a node. 6: for all nodes created do 7: if node n2 is under the scope of node ni then 8: Add control dependence edge from ni to n2, ni ^ n2. 9: if node n1 controls the execution of node n2 then 10: Add control dependence edge from ni to n2, ni ^ n2. 11: if node n2 uses the value of a variable that is defined at node ni then 12: Add data dependence edge from ni to n2, ni ^ n2. 13: procedure ConstructMxDG() 14: for all methods in a mixin do 15: Call ConstructPDG(). 16: for entry of a mixin do 17: Create mixin entry node. 18: for each parameter present in the method call do 19: Create an actual-in parameter node. 20: for each parameter present in the method definition do 21: Create a formal-in parameter node. 22: for each parameter in the method call that is modified inside the method do 23: Create an actual-out parameter node. 24: for each actual-out parameter node do 25: Create corresponding formal-out parameter node. 26: for all nodes created do 27: if node x corresponds to mixin entry node and node y is a method entry node then 28: Add mixin membership edge from x to y, x ^ y. 29: if node ni returns a value to the calling method at node n2 within a mixin layer then 30: Add return dependence edge from ni to n2, ni ^ n2. 31: if node ni calls a method that is defined at node n2 within a mixin layer then 32: Add call edge from ni to n2, ni ^ n2. 33: if node ni calls a method that is defined at node n2 within a mixin layer by passing parameters then 34: Add call edge from ni to n2, ni ^ n2. 35: Add parameter-in edge from actual-in parameter node to corresponding formal-in parameter node. 
36: Add parameter-out edge from formal-out parameter node to corresponding actual-out parameter node. 37: if node ni is an actual-in parameter node and node n2 is an actual-out parameter node such that the value at node ni affects the value at node n2 then 38: Add summary edge from ni to n2, ni ^ n2. 39: procedure ConstructSDG() 40: for all mixins within a mixin layer do 41: Call ConstructMxDG. 42: for all nodes created do 43: if node x is a polymorphic method call then 44: Create polymorphic choice vertex. 45: if node y is a polymorphic choice vertex then 46: Add a call edge from x to y, x ^ y 47: if node x is a new operator node and node y is the correspond- ing constructor node then 48: Add call edge from ni to n2, ni ^ n2. 49: Add parameter-in edge from actual-in parameter node to corresponding formal-in parameter node. 50: Add parameter-out edge from formal-out parameter node to corresponding actual-out parameter node. 222 Informatica 44 (2020) 199-224 M. Sahu et al. 82: for all pointcuts in an aspect do 83: Call ConstructPtDG(). 84: for all introductions in an aspect do 85: Call ConstructIDG(). 86: for all nodes created do 87: if node x is aspect start vertex then 88: if node y is advice start vertex then 89: Create aspect membership edge from x to y, x ^ y. 90: if node y is pointcut start vertex then 91: Create aspect membership edge from x to y, x ^ y. 92: if node y is introduction start vertex then 93: Create aspect membership edge from x to y, x ^ y. 94: if node x is pointcut start node and node y is advice start node then 95: Create data dependence edge from x to y, x ^ y. 96: Add parameter-in edge from actual-in parameter node to corresponding formal-in parameter node. 97: Add parameter-out edge from formal-out parameter node to corresponding actual-out parameter node. 98: procedure ConstructCFADG() 99: for each mixin layer do 100: Call ConstructSDG(). 101: for each aspect do 102: Call ConstructAsDG. 103: for all nodes created do 104: if node «2 in one mixin layer uses the value of a variable that is defined at node rai in different mixin layer then 105: Add mixin data dependence edge from «1 to «2, "1 ^ «2. 106: if node «2 in an aspect uses the value of a variable that is defined at node n1 in a mixin then 107: Add aspect data dependence edge from «1 to «2, «1 ^ «2. 51: Remcwemixin membership edges. 108: if node «i in one mixin layer returns a value to the calling 52: Remove mixin entry nodes. method at node «2 in different mixin layer then 53: procedure ConstructADG() 109: Add mixin return dependence edge from «1 to «2, «1 ^ 54: for start of an advice do «2. 55: Create advice start vertex. 110: if node «1 in one mixin layer calls a method that is defined 56: if advice contams parameter then at node «2 in different mixin layer then 57: Create formal-m rndformci^iit parameter nodes. m Add mixin call edge from «1 to «2, «1 ^ «2. 58: for all nodes created do 112: Add parameter-in and parameter-out edges. 59: if n°de «2 is under the sc°pe °f n°de «1 then 113: if node «1 calls a method that is defined at node «2 using 60: Add control dependence edge from «1 to «2, «1 ^ «2. Super() method then 61: if node «1 controls the execution of node «2 then 114: Add refinement edge from «1 to «2, «1 ^ «2. 62: Add contr°l dependence edge from «1 to «2, «1 ^ «2. 
115: if node «1 is an output statement followed by node «2 and 63: if node «2 uses the value of a variable that is defined at node node «2 is an input, a computation, a predicate, or a method call «1 then statement then 64: Add data dependence edge from «1 to «2, «1 ^ «2. 116: Add message dependence edge from «1 to «2, «1 ^ 65: procedure ConstructIDG() «2. 66: for entry of introduction do 117: if node «1 is a method call node and node «2 is a before 67: Create introduction start vertex. advice node capturing the method called at «1 then 68: if introduction is a method or constructor then 118: Add weaving edge from «1 to «2, «1 ^ «2. 69: CaUConstructpDG. 119: if node «1 is the last statement in a before advice and node 70: if mtodurtwnis a field then «2 is the method entry node of the method captured by the advice 71: Do nothing. then 72: procedure ConstructPtDG() 120: Add weaving edge from «1 to «2, «1 ^ «2. 73: for entry of pointcut do 121: if node y is an after advice node and node «1 is the last 74: Create pointcut start vertex. statement in the method captured by node «2 then 75: if pointcut contains parameters then 122: Add weaving edge from «1 to «2, «1 ^ «2. 76: Create actual-in and actual-out parameter nodes. 123: if node «1 is the last statement in an after advice and node 77: procedure ConstructAsDG() «2 is the statement followed by method call node and the method is 78: for entry of an aspect do captured by the advice then 79: Create aspect start vertex. 124: Add weaving edge from «1 to «2, «1 ^ «2. 80: for all advices in an aspect do 125: Remove aspect membership edges. 81: Call ConstructADG(). 126: Remove aspect entry vertices. Computing Dynamic Slices of Feature-Oriented Programs with... Informatica 44 (2020) 199-224 223 B Feature-aspect node-marking dynamic slicing (FANMDS) algorithm Algorithm 9 Feature-Aspect Node Marking Dynamic Slicing (FANMDS) Algorithm INPUT: Composite Feature-Aspect Dependence Graph (CFADG) of the program FP, Slicing criterion < i, s,v >. OUTPUT: List of nodes contained in required dynamic slice. 1: Marked =

∅ > Initially, unmark all nodes of CFADG. 2: Set dyn_slice(u) = ∅ >

u is a node in CFADG. 3: Set RecDefn(v) = NULL > visavariable. 4: Execute the program FP for input i. 5: while FP does not terminate do 6: Update dyn_slice(u) = {u, ei, e2,..., ek }Udyn_slice(ei)U dyn_slice(e2) U ... U dyn_slice(ek) 7: Marked = Marked U {u}. > Mark node u. 8: if u is a Defn(v) node then 9: Marked = Marked \ {RecDefn(v)}. > Unmark the node RecDefn(v). 10: RecDefn(v) = u. > Update RecDefn(v). 11: if u is a method call node for a method M then 12: panodeM = f (M,panode). 13: MeM = g(M,Me). 14: pfnodeM = h(M,pfnode). 15: Marked = Marked U {u}. > Mark node u. 16: Marked = Marked U panodeM. > Mark associated actual parameter nodes. 17: Marked = Marked U {MeM}. > Mark corresponding method entry node. 18: Marked = Marked U pfnodeM. > Mark associated formal parameter nodes. 19: if u is a new operator node for a constructor M then 20: panodeM = f (M,panode). 21: MeM = g(M,Me). 22: pfnodeM = h(M,pfnode). 23: Marked = Marked U {u}. > Mark node u. 24: Marked = Marked U panodeM. > Mark associated actual parameter nodes. 25: Marked = Marked U {MeM}. > Mark corresponding method entry node. 26: Marked = Marked U pfnodeM. > Mark associated formal parameter nodes. 27: if u is a polymorphic node for a virtual method M then 28: panodeM = f (M,panode). 29: MeM = g(M,Me). 30: pfnodeM = h(M,pfnode). 31: Marked = Marked U {u}. > Mark node u. 32: Marked = Marked U panodeM. > Mark associated actual parameter nodes. 33: Marked = Marked U {MeM}. > Mark corresponding method entry node. 34: Marked = Marked U pfnodeM . > Mark associated formal parameter nodes. 35: if u is a mixin call node for a method M then 36: panodeM = f (M,panode). 37: MeM = g(M,Me). 38: pfnodeM = h(M,pfnode). 39: Marked = Marked U {u}. > Mark node u. 40: Marked = Marked U panodeM . > Mark associated actual parameter nodes. 41: Marked = Marked U {MeM}. > Mark corresponding method entry node. 42: Marked = Marked U pfnodeM. > Mark associated formal parameter nodes. 43: if u is a Super() method call node for a method M then 44: MeM = h(M,Me). 45: Marked = Marked U {u}. > Mark node u. 46: Marked = Marked U {MeM}. > Mark corresponding method entry node. 47: if u is a pointcut node then 48: badvp = x(P,badv). 49: aadvp = y(P, aadv). 50: panodep = f (P,panode). 51: pfnodeM = g(M,pfnode). 52: Marked = Marked U {u}. > Mark node u. 53: Marked = Marked U panodeM . > Mark corresponding actual parameter nodes. 54: Marked = Marked U pfnodeM . > Mark corresponding formal parameter nodes. 55: Marked = Marked U badvp. > Mark the corresponding before advice entry node. 56: Marked = Marked U aadvp . > Mark the corresponding after advice entry node. 57: if u is an advice entry node for an advice A corresponding to pointcut P then 58: bbadvA = z(A,badvp). 59: baadva = z(A, aadvp). 60: Marked = Marked \ bbadvA. 61: Marked = Marked \ baadvA. > Unmark all nodes in body of advice corresponding to previous execution ofu. 62: pfnodeM = g(M,pfnode). 63: Marked = Marked \ pfnodeM. > Unmark all the formal parameter nodes associated with u corresponding to previous execution of u. 224 Informatica 44 (2020) 199-224 M. Sahu et al. 64: if u is an introduction node such that u is a method then 65: MbM = k(M, Mb). 66: Marked = Marked \ MbM. > Unmark all the nodes in the method body corresponding to previous execution of u. 67: pfnodeM = g(M,pfnode). 68: Marked = Marked \ pfnodeM. > Unmark all the formal parameter nodes associated with u corresponding to previous execution ofu. 
69: if v is method call node corresponding to previous execution of u then 70: Marked = Marked \ v. > Unmark the method call node corresponding to previous execution of u. 71: panodev = f (v,panode). 72: Marked = Marked \ panodev. > Unmark the asso- ciated actual parameter nodes for a method call node corresponding to previous execution ofu. 73: pfnodeM = h(M,pfnode). 74: Marked = Marked U {u}. > Mark node u. 75: Marked = Marked U pfnodeM. > Mark associated formal parameter nodes. 76: if u is an introduction node such that u is a field then 77: Marked = Marked U {Defn(v)}. > Mark Defn(v) node. 78: Marked = Marked U{Usage(v)}. > MarkUsage(v) node. 79: if u is a method entry node for a method M then 80: MbM = k(M, Mb). 81: Marked = Marked \ MbM. > Unmark all the nodes in the method body corresponding to previous execution of u. 82: pfnodeM = g(M,pfnode). 83: Marked = Marked \ pfnodeM. > Unmark all the formal parameter nodes associated with u corresponding to previous execution ofu. 84: if v is method call node corresponding to previous execution of u then 85: Marked = Marked \ v. > Unmark the method call node corresponding to previous execution of u. 86: panodev = f (v,panode). 87: Marked = Marked \ panodev. > Unmark the asso- ciated actual parameter nodes for a method call node corresponding to previous execution ofu. 88: if u is a mixin entry node for a method M then 89: MbM = k(M, Mb). 90: Marked = Marked \ MbM. > Unmark all the nodes in the method body corresponding to previous execution of u. 91: pfnodeM = g(M,pfnode). 92: Marked = Marked \ pfnodeM . > Unmark all the formal parameter nodes associated with u corresponding to previous execution of u. 93: if v is method call node corresponding to previous execution of u then 94: Marked = Marked \ v. > Unmark the method call node corresponding to previous execution ofu. 95: panodev = f (v,panode). 96: Marked = Marked \ panodev. > Unmark the asso- ciated actual parameter nodes for a method call node corresponding to previous execution of u. 97: if u is new operator entry node for a constructor M then 98: MbM = k(M, Mb). 99: Marked = Marked \ MbM . > Unmark all the nodes in the method body corresponding to previous execution ofu. 100: pfnodeM = g(M,pfnode). 101: Marked = Marked \ pfnodeM. > Unmark all the formal parameter nodes associated with u corresponding to previous execution of u. 102: if v is method call node corresponding to previous execution of u then 103: Marked = Marked \ v. > Unmark the method call node corresponding to previous execution ofu. 104: panodev = f (v,panode). 105: Marked = Marked \ panodev. > Unmark the asso- ciated actual parameter nodes for a method call node corresponding to previous execution of u. 106: for a given slicing command < i,s,v > do 107: Look up dyn_slice(u) for variable v. 108: Display dyn_slice(u). 109: Map nodes in dyn_slice(u) to corresponding statements in composed Java program. 110: Map statements included in dyn_slice(u) in composed Java program to corresponding statements in composed Jak program. 111: Display statements included in dyn_slice(u) from com- posed Jak program. https://doi.org/10.31449/inf.v44i2.2385 Informatica 44 (2020) 225-198 183 Colour-Range Histogram Technique for Automatic Image Source Detection Nancy C. Woods and Abiodun B. C. 
Robert Department of Computer Science, University of Ibadan, Ibadan, Nigeria E-mail: Chyn.woods@gmail.com and abc.robert@live.com Keywords: natural images, computer generated images, colour histogram Received: November 28, 2018 Computer generated images are visually becoming increasingly genuine, due to advances in technology as well as good graphic applications. Consequently, making distinction between computer generated images and natural images is no longer a simple task. Manual identification of computer generated images have failed to resolve the problems associated with legal issues on exact qualification of images. In this work, a colour range histogram was developed to categorise colours in computer generated images and natural images from a point of reference. Four groups were selected, using the algorithm, consisting of exact Red-Green-Blue (RGB) code (group 1), colour code within a range of 10 (group 2), colour code within a range of 20 (group 3) and colour code within a range of 30 (group 4) from the point of reference. An optimised equation for the four Colour Code Groups (CCG) was developed. The computer generated images categorised an average of 69.8%, 92.9%, 96.9% and 98.6%, of any colour code for groups 1, 2, 3 and 4, respectively. The categorised colours for natural images were 31.1%, 82.6%, 90.8% and 95.0% for groups 1, 2, 3 and 4, respectively. The results showed that natural images contain a wide range of RGB colours which makes them different. Consequently, the disparity in the percentage of colours categorised can be used to differentiate computer generated images from natural images. Povzetek: Razvit je sistem za razločevanje naravnih od računalniško generiranih umetnih slik na osnovi barvnega histograma. 1 Introduction Digital images have become a commonplace in the lives of individuals nowadays, because of the ease of acquisition using mobile phones and other electronic devices. A digital image can be described as a rectangular two-dimensional array of pixels, where each pixel (usually a square) represents the image colour at that position and where the dimensions represent the width and height of the image as it is displayed [1]. With advances in technology and the proliferation of imaging software, digital images are now classified as either computer generated or natural. With image processing techniques, it is becoming increasingly easy to produce computergenerated images (CGI) that are so realistic with commercially available software packages [2] and these CGI are presently called Photorealistic images. How can one tell if a digital image is natural or computer generated? Usually, a photograph provides an effective and natural communication medium for humans. This is because people do not really need any special training to comprehend the content of an image and they used to believe that photographs represent the truth [3]. Unfortunately, this truth no longer holds with digital images because it is easy to manipulate them [4]. Therefore, being able to verify the credibility of digital images and perform image forensics can protect the truthfulness of digital images. It can be cumbersome and difficult for the human eye to tell the difference between the two types of images [5]. This is what the research carried out under the field of digital image forensics, among other things tries to answer. 
Digital image forensics is the area of image processing with the main function of assessing the authenticity and the origin of images and is divided into active forensics and passive forensics, which are further sub divided [6], [3], [4]. Figure 1 shows the classification of digital image forensics. In active forensics, additional information needs to be inserted into the host or source of the image in advance. This requires that the acquisition device should have the corresponding functionality to hold such information, some of which include digital signature [7] or digital watermarking [8]. Passive forensics technology is more practical and attempts to identify the authenticity or source of an image, based only on the characteristics of the image itself without embedded additional information [6]. Passive forensics occurs after the image has been captured and stored. Depending on its applications in different research fields, passive forensics can be broadly classified into tampering detection [9], [10], [11], Steganalysis [12] and source identification [13], [6], which is the art and science of differentiating computer generated images from natural images (NI). The aim of this work therefore is to develop a model for colour range histogram towards discovering features that differentiate CGI from NI. 226 Informatica 44 (2020) 225-230 N.C. Woods et al. 2 Literature review Several research has been carried out in a bid to differentiate computer generated images from natural images using several approaches and features. One of the first approaches offered to differentiate NI from CGI was proposed by [14]. In their statistical approach, the first and higher-order statistics of wavelet transform coefficients are extracted from both CGI and NI to capture their statistical regularities. Another work by [15] proposed an approach using differences in image texture by considering the physical / visual properties of these images. They took into account the differences in surface and object models as well as the differences in the acquisition processes between the CGI and NI, and extracted 192 geometry features by analysing the differences existing between the physical generative process of computer graphics and photographs. Their approach extracted a lot of features in a bid to find the difference between CGI and NI. A total of 216 features, based on the RGB colour information of CGI and NI were considered and extracted by [12] in their work. As such, their method employed image decomposition based on separable quadrature mirror filters (QMFs) to capture regularities inherent to photographic images [12]. An approach presented by [16] discriminates CGI from NI based on the lack of artifacts due to the use of a digital camera as an acquisition device for NI [16]. Their technique is based on the fact that image acquisition in a digital camera is fundamentally different from the generative algorithms deployed by computer generated imagery. This difference is captured in terms of the properties of the residual image (pattern noise in case of digital camera images) extracted by a wavelet based de-noising filter. An approach proposed by [17] used features that are based on the differences in the acquisition process of images. First they tried to detect the presence of the colour filter array demosaicking from a given image because most consumer cameras use colour filter array which requires the involvement of a demosaicking operation in generating the RGB colour values. 
The approach by [17] specifically searched for traces of demosaicking and chromatic aberration, which were used to differentiate CGI from NI. Another technique based on the differences in the acquisition process of images was proposed by [18]. The starting point of their research is that the different formation processes leave distinct intrinsic traces on digital images. In their algorithm, spectral correlations between colour components are exploited efficiently by discrete wavelet transform, block partitioning and normalized cross correlation, and three statistical features are derived to capture the inherent differences between CGI and NI. [6] combined statistical, visual and physical features of digital images to propose features that can differentiate CGI from NI. Their approach, amongst other features, extracted the mean and median of the histograms of the grayscale image in the spatial and wavelet domains as statistical features. Secondly, the fractal dimensions of the grayscale image and wavelet sub-bands were extracted as visual features. And finally, the physical features are calculated from the enhanced photo response non-uniformity noise. Thereafter, a support vector machine (SVM) classifier was used in the classification process. More recently, the researchers in [19], comparing CGI with NI, extracted and used 9 dimensions of texture features. They argued that NI have higher self-similarity and more delicate and complex texture. The work by [5] extracted textural descriptors from images using binary statistical image features and also used SVM as the classifier. According to them, the textural features are different for CGI and NI, as their approach was based on learning natural image statistic filters and further using them to differentiate the two types of images. In this research work, a model is proposed where colour and statistical features are extracted and combined in identifying features that differentiate CGI from NI.

3 Methodology
The proposed model is termed Colour Range Histogram (CRH). The CRH works by first randomly selecting a pixel Ax,y in an image as a point of reference. Next, the CRH algorithm fetches the RGB colour code for Ax,y, which is an integer value, due to the programming language used. Then the algorithm checks through all other pixels in the rest of the image to highlight all pixels that have the same RGB colour code as pixel Ax,y. These pixels were classified as group 1 pixels. The complete steps used in CRH are further elucidated in the pseudocode in Listing 1. In general, the following steps are proposed:
1. For any image (A), with dimension (w × h), where h represents the height and w represents the width of the image, select any pair of coordinates (x, y) that represent a pixel position such that 0 ≤ x < imageA_width and 0 ≤ y < imageA_height;
Pcolour = imageA.getRGB(x,y);
for (int w = 0; w < imageA_width; w++)
    for (int h = 0; h < imageA_height; h++){
        Pixelcolour = imageA.getRGB(w,h);
        if (Pcolour == Pixelcolour) retain pixel colour;
        else change pixel colour to white;
    }
Save image;
Listing 1: Algorithm for exact colour highlight.
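As a concrete illustration of Listing 1, the following minimal Java sketch loads an image, picks a random reference pixel and keeps only the pixels whose packed RGB code matches it exactly, setting every other pixel to white. The file names, the use of java.util.Random for choosing the reference pixel and the PNG output format are illustrative assumptions rather than details taken from the paper.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.Random;
import javax.imageio.ImageIO;

// Sketch of Listing 1: retain pixels whose packed RGB code equals that of a randomly
// chosen reference pixel A(x,y); all other pixels are changed to white.
public class ExactColourHighlight {

    public static void main(String[] args) throws Exception {
        // Illustrative paths; replace with a real input image and output location.
        BufferedImage image = ImageIO.read(new File("input.jpg"));
        int width = image.getWidth();
        int height = image.getHeight();

        // Randomly select the reference pixel A(x,y) and fetch its packed RGB code.
        Random random = new Random();
        int x = random.nextInt(width);
        int y = random.nextInt(height);
        int pColour = image.getRGB(x, y);

        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int white = 0xFFFFFF;
        for (int w = 0; w < width; w++) {
            for (int h = 0; h < height; h++) {
                int pixelColour = image.getRGB(w, h);
                // Group 1: exact match of the packed RGB code.
                out.setRGB(w, h, pixelColour == pColour ? pixelColour : white);
            }
        }
        ImageIO.write(out, "png", new File("group1_highlight.png"));
    }
}
```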
The algorithm in Listing 1 was used to "highlight" group 1 pixels, which are pixels with the exact RGB code as Ax,y. By 'highlight' we mean that their original colour is retained, while the colour of the rest of the image was changed to white. However, if the randomly selected pixel was white in colour, then the colour of the rest of the image was changed to black, as can be seen from figure 2c. This highlight was to enable us to have a visual clue of the resultant image. The algorithm was further extended to highlight seven more groups of pixels (Listing 2). These are pixels in the image that have RGB colour codes within certain ranges from RGB(Ax,y). These groups were: pixels within a range of 10 (group 2), 20 (group 3), 30 (group 4), 40 (group 5), 50 (group 6), 60 (group 7) and 70 (group 8) from the point of reference.
Load image = ImageIO.read(new File(path));
width = image.getWidth();
height = image.getHeight();
Pick pixel Ax,y where 0 ≤ x < width and 0 ≤ y < height;
Pcolour = image.getRGB(x,y);
for (int x = 0; x < image.getWidth(); x++)
    for (int y = 0; y < image.getHeight(); y++){
        current_Pixelcolour = image.getRGB(x,y);
        if current_Pixelcolour is within range
            Project the original colour;
        else
            Change colour to white;
    }
Save image;
End
Listing 2: Algorithm for colour range highlight.
In addition to saving the image file for a visual presentation of CRH, eight more features, which represent the total number of pixels projected in each group, were captured. The image dataset used for this work was obtained from the Internet as well as personal picture collections. A total of 1,620 images were obtained, which contained 851 CGIs and 769 NIs.

4 Results and discussions
Figure 2(a-d) shows some of the visual outputs of group 1 for both NI (a & b) and CGI (c & d). From figure 2 (a & b), it was observed that in NI, a colour that appears to be the same visually is actually represented by a wide range of RGB codes. This could be largely due to the demosaicking process that NI undergo while being produced, or the lighting conditions when the image was captured. Therefore, picking a random pixel colour and projecting all pixels with exactly the same colour code yielded a visually scanty set of pixels. However, for CGI, the visual results are considerably different. The CGI visual results show that a higher number of pixels are projected for group 1. This exact colour projection sometimes corresponded with a 'shape' in the CGI, as can be seen in figure 2 (c & d). This could be because most CGI are a combination of various shapes, where each shape is "filled" with the same colour and then the colour of some areas "blended".
Figure 2: Results of projected exact colour areas in some images.
For the colour range highlight, although eight groups were initially proposed, it was observed that beyond group 4 the number of projected pixels remained almost constant for both CGI and NI. This can be viewed from the projected pixel count results displayed in Listing 3 (for a natural image) and Listing 4 (for a computer generated image). The listing includes the file chosen, its resolution, the pixel chosen (Ax,y), the pixel colour RGB(Ax,y) and finally the various counts of projected pixels by range. Listing 3 showed that 37 pixels were projected for group 1; 98 pixels for group 2, and so on. This result showed that for NI, there is usually a gradual increase in the number of projected pixels from group 1 to group 4. Listing 4 however showed that 3242 pixels were projected for group 1 and 3488 pixels for groups 2 and beyond. This showed that computer generated images projected an almost constant number of pixels.
File Chosen is C:\...\Natural Images\4Egg.jpg
Image Width: 1918  Image Height: 1077
Chosen Pixel is: 833,392
Java RGB code is: -1920995
the real RGB values are: Alpha: 255, Red: 226, Green: 176, Blue: 29
For range 0 ProjectedCount is 37
For range 10 ProjectedCount is 98
For range 20 ProjectedCount is 246
For range 30 ProjectedCount is 267
For range 40 ProjectedCount is 267
For range 50 ProjectedCount is 267
For range 60 ProjectedCount is 267
For range 70 ProjectedCount is 267
Listing 3: Projected pixel count for a selected natural image.
File Chosen is C:\...\CGI\tamar8.jpg
Image Width: 564  Image Height: 942
Chosen Pixel is: 114,194
Java RGB code is: -13171452
the real RGB values are: Alpha: 255, Red: 55, Green: 5, Blue: 4
For range 0 ProjectedCount is 3242
For range 10 ProjectedCount is 3488
For range 20 ProjectedCount is 3488
For range 30 ProjectedCount is 3488
For range 40 ProjectedCount is 3488
For range 50 ProjectedCount is 3488
For range 60 ProjectedCount is 3488
For range 70 ProjectedCount is 3488
Listing 4: Projected pixel count for a selected computer generated image.
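The per-range counts reported in Listings 3 and 4 can be reproduced with a sketch along the following lines. The paper does not state precisely how "within a range of r" is measured; interpreting it per RGB channel (|ΔR|, |ΔG|, |ΔB| ≤ r) is an assumption made here, and the input path and reference pixel are placeholders.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Sketch of the colour-range count behind Listings 3 and 4: for ranges 0, 10, ..., 70
// around a reference pixel, count how many pixels fall within the range.
// Assumption: "within a range of r" is interpreted per RGB channel.
public class ColourRangeCount {

    static boolean withinRange(int rgbA, int rgbB, int range) {
        int rA = (rgbA >> 16) & 0xFF, gA = (rgbA >> 8) & 0xFF, bA = rgbA & 0xFF;
        int rB = (rgbB >> 16) & 0xFF, gB = (rgbB >> 8) & 0xFF, bB = rgbB & 0xFF;
        return Math.abs(rA - rB) <= range
            && Math.abs(gA - gB) <= range
            && Math.abs(bA - bB) <= range;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage image = ImageIO.read(new File("input.jpg")); // illustrative path
        int x = 100, y = 100;                                      // illustrative reference pixel
        int pColour = image.getRGB(x, y);
        System.out.println("Chosen Pixel is: " + x + "," + y + "  Java RGB code is: " + pColour);

        for (int range = 0; range <= 70; range += 10) {
            long count = 0;
            for (int w = 0; w < image.getWidth(); w++) {
                for (int h = 0; h < image.getHeight(); h++) {
                    if (withinRange(pColour, image.getRGB(w, h), range)) {
                        count++;
                    }
                }
            }
            System.out.println("For range " + range + " ProjectedCount is " + count);
        }
    }
}
```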
The projected pixel counts for groups 1 to 4 were saved, processed and analysed. Using equations 1-4, the average percentages P1, P2, P3 and P4 of projected pixels were calculated for groups 1, 2, 3 and 4, respectively.
P1 = (Σ Cr0 / Σ Cr±30) × 100    Equation (1)
P2 = (Σ Cr±10 / Σ Cr±30) × 100    Equation (2)
P3 = (Σ Cr±20 / Σ Cr±30) × 100    Equation (3)
P4 = (Σ Cr±30 / Σ Cr±30) × 100    Equation (4)
Where:
Cr0 = count of projected pixels for group 1
Cr±10 = count of projected pixels for group 2
Cr±20 = count of projected pixels for group 3
Cr±30 = count of projected pixels for group 4
These equations were then optimised to give the generalized equation 5:
Pi = (CCGi / CCGj) × K    Equation (5)
where Pi is the percentage of projected pixels for a group i, i = 1, 2, 3, 4; j = 4; CCG is the count of projected pixels for group i or j; and K is a constant.
The summary of the analysed data is presented in Table 1. From Table 1 it can be observed that the average percentages of projected pixels for natural images were 31.07%, 82.64%, 90.75% and 95.00% for groups 1, 2, 3 and 4, respectively, while CGI projected an average of 69.79%, 92.87%, 96.87% and 98.60% of any colour code for groups 1, 2, 3 and 4, respectively.
     | NI % projected | CGI % projected
P1   | 31.07          | 69.79
P2   | 82.64          | 92.87
P3   | 90.75          | 96.87
P4   | 95.00          | 98.60
Table 1: Average % pixel colour projection.
This shows that natural images contain a wide range of RGB colour codes for a particular colour that has a similar visual colour presentation [20]. For each image, the values of P1, P2, P3 and P4 were further analysed in order to distinguish between NI and CGI. The analysis showed that an image is classified as CGI if:
AND(P2 - P1 < 60; P3 - P2 < 30; P4 - P3 < 15)
while an image is classified as NI if:
NAND(P2 - P1 < 25; P3 - P2 < 12; P4 - P3 < 6)
Using the above results we achieved the following classification percentages:
                 | CGI   | NI
True positives   | 81.6% | 87.0%
False negatives  | 18.4% | 13.0%
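A small sketch of Equations (1)-(5) and the decision rules above is given below; K = 100 is assumed, as implied by Equations (1)-(4), the counts are taken from Listings 3 and 4, and the AND/NAND conditions are applied exactly as written.

```java
// Sketch of Equation (5) and the classification rule: the percentages P1..P4 are each
// group's projected-pixel count relative to the group-4 count, and the differences
// between consecutive percentages decide CGI vs. NI.
public class CrhClassifier {

    // counts = projected pixels for groups 1..4 (ranges 0, 10, 20, 30), i.e. CCG_1..CCG_4.
    static double[] percentages(long[] counts) {
        double[] p = new double[4];
        for (int i = 0; i < 4; i++) {
            p[i] = 100.0 * counts[i] / counts[3];   // P_i = (CCG_i / CCG_4) * K, K = 100
        }
        return p;
    }

    static boolean isCgi(double[] p) {
        return (p[1] - p[0] < 60) && (p[2] - p[1] < 30) && (p[3] - p[2] < 15);
    }

    static boolean isNi(double[] p) {
        // NAND of the three conditions, as stated in the text.
        return !((p[1] - p[0] < 25) && (p[2] - p[1] < 12) && (p[3] - p[2] < 6));
    }

    public static void main(String[] args) {
        long[] naturalCounts = {37, 98, 246, 267};     // Listing 3 (natural image)
        long[] cgiCounts = {3242, 3488, 3488, 3488};   // Listing 4 (computer generated image)

        double[] pNatural = percentages(naturalCounts);
        double[] pCgi = percentages(cgiCounts);
        System.out.println("Natural image: CGI rule = " + isCgi(pNatural) + ", NI rule = " + isNi(pNatural));
        System.out.println("CGI image:     CGI rule = " + isCgi(pCgi) + ", NI rule = " + isNi(pCgi));
    }
}
```

With the counts above, the natural image satisfies only the NI rule and the computer generated image only the CGI rule, in line with the behaviour described in the text.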
Figure 3: Average number of projected pixels.
Figure 3 shows a graph of the total number of projected pixels for all the computer generated images and natural images in the sample size. This figure shows that, irrespective of the random colour chosen, computer generated images projected an almost "constant" number of pixels across the four groups; this can be seen in figure 3, where the red line for computer generated images is almost a horizontal straight line. The pattern of the blue line for the natural images shows a sharp increase in the number of projected pixels from group 1 to group 2 and then a gradual increase from group 2 to group 4. The figure also showed that CGI projected a greater percentage of their total pixels than natural images, despite the fact that most natural images had a greater number of pixels than the computer generated images. Consequently, the disparity in the percentage of emphasised pixels can be used to differentiate computer generated images from natural images.

5 Conclusion
In this research work, the RGB colour features of some selected pixels in both natural and computer generated digital images were extracted, grouped and analysed. The analysis revealed that there is a disparity in the percentage selected/emphasized for the two groups of images. Consequently, this disparity in the percentage of colours projected, within range 0 to 40 from a point of reference, can be used as a quick method to differentiate computer generated images from natural images.

References
[1] Oracle, "The Java tutorials," 2 February 2012. [Online]. Available: http://www.oracle.com/java-se-7-tutorial-2012-02-28-1536013.html. [Accessed 14 June 2013].
[2] M. K. Johnson, K. Dale, S. Avidan, H. Pfister, W. T. Freeman and W. Matusik (2011). "CG2Real: Improving the Realism of Computer Generated Images using a Large Collection of Photographs," IEEE Transactions on Visualization and Computer Graphics, vol. XVII, no. 9, pp. 1273-1285. https://doi.org/10.1109/tvcg.2010.233
[3] T.-T. Ng, S.-F. Chang, C.-Y. Lin and Q. Sun (2006). "Passive-blind Image Forensics," in Multimedia Security Technologies for Digital Rights, Elsevier, pp. 383-412. https://doi.org/10.1016/b978-012369476-8/50017-8
[4] G. K. Birajdar and V. H. Mankar (2013). "Digital image forgery detection using passive techniques: A survey," Digital Investigation, vol. 10, no. 3, pp. 226-245. https://doi.org/10.1016/j.diin.2013.04.007
[5] G. K. Birajdar and V. H. Mankar (2017). "Computer Graphic and Photographic Image Classification using Local Image Descriptors," Defence Science Journal, vol. 67, no. 6, pp. 654-663. https://doi.org/10.14429/dsj.67.10079
[6] F. Peng, J. Liu and M. Long (2012). "Identification of Natural Images and Computer Generated Graphics Based on Hybrid Features," International Journal of Digital Crime and Forensics, vol. IV, no. 1, pp. 1-16. https://doi.org/10.4018/jdcf.2012010101
[7] A. Swaminathan, M. Wu and K. J. R. Liu (2006). "Component forensics of digital cameras: A non-intrusive approach," in 2006 40th Annual Conference on Information Sciences and Systems. https://doi.org/10.1109/ciss.2006.286646
[8] M. Chandra, S. Pandey and R. Chaudhary (2010). "Digital watermarking technique for protecting digital images," in 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT). https://doi.org/10.1109/iccsit.2010.5565177
[9] A. C. Popescu and H. Farid (2005). "Exposing digital forgeries in color filter array interpolated images," IEEE Transactions on Signal Processing, vol. 53, pp. 3948-3959. https://doi.org/10.1109/tsp.2005.855406
[10] Wang W., Dong J., Tan T. (2009). A Survey of Passive Image Tampering Detection.
In: Ho A.T.S., Shi Y.Q., Kim H.J., Barni M. (eds) Digital Watermarking. IWDW 2009. Lecture Notes in Computer Science, vol 5703. Springer, Berlin, Heidelberg https://doi.org/10.1007/978-3-642-03688-0_27 [11] S. D. Mahalakshmia, K. Vijayalakshmib and S. Priyadharsinia, (2012). "Digital image forgery detection and estimation by exploring basic image manipulations," Digital Investigation, pp. 215-225. https ://doi.org/10.1016/j.diin.2011.06.004 [12] S. Lyu and H. Farid, (2005). "How realistic is photorealistic?," IEEE Transactions on Signal Processing, vol. 53, no. 2, pp. 845-850. https://doi.org/10.1109/tsp.2004.839896 [13] J. Lukas, J. Fridrich and M. Goljan, (2005). "Determining digital image origin using sensor imperfections," in SPIE Electronic Imaging, Image and Video Communication and Processing, San Jose, California. https://doi.org/10.1117/12.587105 [14] H. Farid and S. Lyu, (2003). "Higher-order wavelet statistics and their application to digital forensics," in Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvprw.2003. 10093 N.C. Woods et al. [15] T.-T. Ng, S.-F. Chang, J. Hsu, L. Xie and M.-P. Tsui, (2005). "Physics-motivated features for distinguishing photographic images and computer graphics," in ACM Multimedia, Singapore. https://doi.org/10.1145/1101149.1101192 [16] S. Dehnie, T. Sencar and N. Memon,. (2006). "Digital Image Forensics for Identifying Computer Generated and Digital Camera Images," in IEEE International Conference on image processing. https://doi.org/10.1109/icip.2006.312849 [17] A. E. Dirik, S. Bayram, H. T. Sencar and N. Memon, (2007). "New features to identify computer generated images," in IEEE International Conference on Image Processing 4. https://doi.org/10.1109/icip.2007.4380047 [18] X. Kang, E. Zhang, Y. Chen and Y. Wei, (2011). "Forensic discrimination of computer generated images and photographs using spectral correlations in wavelet domain," Energy Procedia, vol. 13, no. 311, pp. 2174-2182. https://doi.org/10.1016/S1876-6102(14)00454-8 [19] F. Peng, Y. Zhu and M. Long, (2015). "Identification of Natural Images and Computer Generated Graphics using Multi-fractal Differences of PRNU," in ICA3PP 2015: Part II of the 15th International Conference on Algorithms and Architectures for Parallel Processing. https://doi.org/10.1007/978-3-319-27122-4_15 [20] N. C. Woods and C. A. B. Robert, (2017) "A Model for Creating Exact Colour Spectrum for Image Forensic," University of Ibadan Journal of Science and Logics in ICT Research (UIJSLICTR), vol. Volume 1, no. 1, pp. 1-6. https://doi.org/10.31449/inf.v44i2.2385 Informatica 44 (2020) 231-198 183 Data Mining Approach to Effort Modeling on Agile Software Projects Hrvoje Karna, Sven Gotovac and Linda Vickovic University of Split, Poljička cesta 35, 21000, Split, Croatia E-mail: hrvoje.karna@gmail.com, sven.gotovac@fesb.hr, linda.vickovic@fesb.hr Keywords: agile scrum, data mining, effort estimation, k-nearest neighbor, software engineering, project management Received: April 26, 2019 Software production is a complex process. Accurate estimation of the effort required to build the product, regardless of its type and applied methodology, is one of the key problems in the field of software engineering. This study presents the approach to effort estimation on agile software project using local data and data mining techniques, in particular k-nearest neighbor clustering algorithm. 
The applied process is iterative, meaning that in order to build predictive models, sets of data from previously executed project cycles are used. These models are then utilized to generate estimate for the next development cycle. Used data enrichment process, proved to be useful as results of effort prediction indicate decrease in estimation error compared to the estimates produced solely by the estimators. The proposed approach suggests that similar models can be built by other organizations as well, using the local data at hand and this way optimizing the management of the software product development. Povzetek: V prispevku je predstavljen pristop strojnega rudarjenja za modeliranje agilnih programskih projektov. 1 Introduction Accurate estimation of work effort required to build the product is a critical activity in software development industry [1] and it is carried out on most projects [2]. Previously a number of approaches have been proposed to reliably estimate the effort, such as theoretical [3], formal [4], analogy-based estimation [5], just to name a few. Despite all, expert estimation [6] remains the most widely used method of effort estimation. Regardless of its comparative advantages, such as ease of implementation and the validity of the results it produces [7], expert effort estimation can still be improved [8]. Estimation is particularly challenging in large agile projects [9]. One way to achieve this is to use own, locally built, collections of past project data [10], [11]. The emergence of machine learning algorithms and data mining in general, paired with the availability of tools, has led to progress in application of these methods in practice [12]. This paper presents an approach to effort estimation using data mining techniques, particularly k-Nearest Neighbor (KNN) clustering algorithm [13], on local collection of telco project data. The approach uses local data [14], extracted from the tracking system implemented on the project. The process itself is iterative, implemented in a way that at first it uses a collection of data from initial project phase in order to build primary predictive model. Then in the next project phase - an upgrade, this model is being enriched with the data from the recently completed iteration in order to gradually improve its properties, and thus reduce the estimation error. This research builds upon our previous work [15] now being applied to a large agile project and using different approach to predict effort. Instead of project clustering applied in [15], in this paper KNN is used to cluster work items and for each new instance it finds the nearest neighbors and calculates the model predicted effort. The proposed approach itself follows on one hand the iterative nature of agile scrum methodology [16] implemented on the project while at the same time fitting it to the cyclicality of the CRISP-DM process [17]. This proved to be efficient way to improve estimation accuracy and therefore can be suggested as a method by which organizations can improve the process of project management. The remaining part of this paper is organized as follows: Section 2 presents the current state of the research of the areas being discussed in the paper. Section 3 elaborates the design of the study, applied approach and techniques used to model effort estimation. In Section 4 results are presented together with their implication and potential limitations. The concluding section summarizes the findings and gives directions for future work. 
2 Related research Data mining techniques provide a means to analyze and extract patterns from data and through that process produce previously unknown and potentially useful information [18]. They emerged as an interdisciplinary domain with evolution and merging of databases, statistics and machine learning [19]. 232 Informatica 44 (2020) 231-239 H. Karna et al. It can be viewed as a method for discovering knowledge from large sets of data [20]. Data mining consists of a set of techniques applicable for different purposes [21]. Clustering being one among them is particularly useful in prediction [22] and KNN is one of the most widely used algorithms [23]. Research in the field of software development effort estimation is active since the emergence of this industry [24]. During that period this has resulted in the number of approaches intended to estimate the effort required to build the product [25], each with their own advantages and limitations. Up to now, due to its comparative advantages, expert effort estimation remains the most frequently used technique in practice [26]. Paired with modern data analysis techniques it has potential to significantly improve reliability of the estimates [27]. Mining software engineering data raises the interest of researcher for quite some time [28], it also poses specific challenges [29]. It has been applied to different types of data [30], [31] and uses a number of techniques [32]. The application of these techniques is particularly appropriate in software engineering as it is rich in data [33] while, on the other hand, they can be used to optimize the software development process, software itself and support decision making process [34]. Agile development methods emerged from the need to efficiently handle close interaction with the customer, flexibility in requirements definition and the urge to deliver software on time and within the budget [35]. In contrast to sequential, agile development methods propose incremental approach to building of the software product [36]. These practices can also be used to handle the system and team scale issues [37] what is especially important in today's dynamic business environment. Agile scrum executes the project in a sequence of iterations called sprints, where each sprint represents a cycle within which development activities occur [38]. During sprint planning, team members determine sprint goal, prioritize and estimate the effort of work items [39]. 3 Study design This empirical study was performed using local data from a complex telco solution development project executed in large international company. Development of the application was based on Java technology and Oracle DB. Data used for the study refers to the tracking system items and descriptive features of the estimators, as these are the entities used to construct the predictive models. The authors implemented these models before [15], [40], so the selection of predictors was based on their relative importance determined in this, our previous [41] and similar studies [2]. The study exclusively used data required to build predictive models for effort estimation and for this it was sufficient that for example, components are identified as Component_1, Component_2, etc. or that estimators are referred to as Estimator_1, Estimator_2, ..., and so on, with matching attributes taking appropriate values. The average number of estimators per sprint fluctuated around 22, reaching at one point the total of 31. 
The number of estimation items per sprint was between 80 and 110, with a total of 1,732 in Phase 1 and 532 in Phase 2. Total actual effort recorded in Phase 1 was 20,814.25 [h] and in the Phase 2 sprints 5,344.50 [h]. These numbers indicate that the analyzed project belongs to the class of "large" projects [42]. In the sequence of analyzed sprint data, none of the sprints ended up exactly on the estimated value of effort. Both under- and overestimation occurred with relatively the same frequency, see Table 1 (sprints 1-19) and Table 3 (sprints 20-24), yet overestimation was more common in the early project phase while underestimation was more common in the later phase.
Sprint | Estimated effort [h] | Actual effort [h] | Error, absolute (relative) | MMRE | Pred(0.25)
1  | 1,335.00 | 1,297.00 | +38.00 (+2.93%) | 0.652 | 0.660
2  | 1,224.00 | 1,302.00 | -78.00 (-5.99%) | 0.215 | 0.738
3  | 1,294.00 | 1,223.00 | +71.00 (+5.81%) | 0.310 | 0.673
4  | 1,173.00 | 1,171.00 | +2.00 (+0.17%)  | 0.359 | 0.774
5  | 375.00   | 358.00   | +17.00 (+4.75%) | 0.522 | 0.733
6  | 1,328.00 | 1,278.00 | +50.00 (+3.91%) | 0.378 | 0.767
7  | 1,314.00 | 1,289.00 | +25.00 (+1.94%) | 0.301 | 0.663
8  | 1,262.00 | 1,239.00 | +23.00 (+1.86%) | 0.323 | 0.670
9  | 1,056.00 | 1,078.00 | -22.00 (-2.04%) | 0.254 | 0.803
10 | 1,432.50 | 1,424.25 | +8.25 (+0.58%)  | 0.210 | 0.779
11 | 1,120.00 | 1,146.00 | -26.00 (-2.27%) | 0.071 | 0.879
12 | 1,518.00 | 1,479.00 | +39.00 (+2.64%) | 0.383 | 0.780
13 | 1,255.00 | 1,304.00 | -49.00 (-3.76%) | 0.092 | 0.893
14 | 1,089.00 | 1,081.00 | +8.00 (+0.74%)  | 0.063 | 0.925
15 | 991.00   | 975.00   | +16.00 (+1.64%) | 0.226 | 0.861
16 | 1,182.00 | 1,149.00 | +33.00 (+2.87%) | 0.180 | 0.843
17 | 970.00   | 979.00   | -9.00 (-0.92%)  | 0.194 | 0.813
18 | 884.00   | 922.00   | -38.00 (-4.12%) | 0.128 | 0.848
19 | 118.00   | 120.00   | -2.00 (-1.67%)  | 0.026 | 0.923
Table 1: Efforts and estimation error values per sprint for the training set.
The analyzed data covers Phase 1 (initial version) and Phase 2 (upgrade) of the development project. Each phase was implemented in so-called sprints, i.e. development cycles as defined by the agile scrum methodology. Phase 1 consists of 19 sprints, while Phase 2 covers 5 sprints. Each sprint produces a given set of estimation items, i.e. data records. The problem being solved was whether it is possible to predict the effort of the upcoming Phase 2 sprints by using the knowledge from those already completed. Sprints 1 to 19 (S1-S19) were used as the initial database of items for training and testing of the predictive model, while sprints S20 to S24 served for validation, see Figure 1. After building the initial model M1, this model was used to predict the effort of sprint S20. In each following iteration the model was enriched by the data from the last sprint; thus the data set (S1-S19+S20) was used to build model M2 and predict the effort of S21, the data set (S1-S19+S20+S21) was used to build model M3 and predict the effort of S22, etc. This process passed through five iterations that can also be presented as follows:
1st iteration: (S1-S19) → M1 → S20
2nd iteration: (S1-S19+S20) → M2 → S21
3rd iteration: (S1-S19+S20+S21) → M3 → S22
4th iteration: (S1-S19+S20+S21+S22) → M4 → S23
5th iteration: (S1-S19+S20+S21+S22+S23) → M5 → S24
Figure 1: Model building and prediction process used in the study.
Expert estimation relies heavily on intuition, where, based on the received input information, the estimator uses his judgment to come up with the solution [43]. This process can be improved by designing models that support the estimation of effort. The proposed predictive model targets the agile software development environment. It uses a data mining approach that is explained next in more detail.
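Before turning to the entities and the modeling method, the five iterations listed above can be sketched as follows. This is only an illustration under simplifying assumptions: a single numeric feature per work item, an absolute-difference distance and a plain k-nearest-neighbour average stand in for the study's actual feature set and model, and the synthetic data below are not the project data. MRE is computed here as |actual − predicted| / actual, MMRE as its mean per sprint, and Pred(0.25) as the share of items with MRE ≤ 0.25.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the iterative scheme (S1-S19) -> M1 -> S20, (S1-S19+S20) -> M2 -> S21, ...:
// the training set grows by one completed sprint per iteration, a k-nearest-neighbour
// model predicts each new item's effort, and MMRE / Pred(0.25) are computed per sprint.
public class IterativeEffortModel {

    record Item(int sprint, double feature, double actualEffort) {}

    // Predict an item's effort as the mean actual effort of its k nearest training items.
    static double knnPredict(List<Item> training, Item query, int k) {
        return training.stream()
                .sorted(Comparator.comparingDouble((Item t) -> Math.abs(t.feature() - query.feature())))
                .limit(k)
                .mapToDouble(Item::actualEffort)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        List<Item> all = new ArrayList<>();
        // Illustrative synthetic data: sprints 1..24, a handful of items each.
        for (int s = 1; s <= 24; s++) {
            for (int i = 0; i < 5; i++) {
                double feature = i + 1;
                all.add(new Item(s, feature, 4.0 * feature + s % 3));
            }
        }

        List<Item> training = new ArrayList<>(all.stream().filter(it -> it.sprint() <= 19).toList());
        for (int sprint = 20; sprint <= 24; sprint++) {
            final int current = sprint;
            List<Item> validation = all.stream().filter(it -> it.sprint() == current).toList();

            double sumMre = 0.0;
            int within25 = 0;
            for (Item item : validation) {
                double predicted = knnPredict(training, item, 3);
                double mre = Math.abs(item.actualEffort() - predicted) / item.actualEffort();
                sumMre += mre;
                if (mre <= 0.25) within25++;
            }
            System.out.printf("Sprint %d: MMRE = %.3f, Pred(0.25) = %.3f%n",
                    sprint, sumMre / validation.size(), (double) within25 / validation.size());

            training.addAll(validation);   // enrich the model with the completed sprint
        }
    }
}
```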
This is followed by the description of the entities that represent the sources of data and the fields used as predictors of the effort. Finally, the modeling method, determined by the selected tool itself, is described.

3.1 Data mining process
Building the data mining model considered in this study required the definition of the research objective. In this case it was the optimization of the software development process through the application of a machine learning algorithm, in order to provide a way to decrease the effort estimation error and thus allow more efficient management of the project. The data mining process applied in this study uses the de-facto industry standard known as CRISP-DM (CRoss-Industry Standard Process for Data Mining). This is an iterative process structured around six phases:
• Business understanding - identification of the business problem that has to be solved,
• Data understanding - obtaining, exploring and verification of the data that will be used,
• Data preparation - preparation and cleaning of the data before it can be used for modeling,
• Modeling - selection of an appropriate technique, building and assessment of the model,
• Evaluation - evaluation of results and review of the process,
• Deployment - use of the model in order to improve the business.
Understanding of the business and data was established prior to and during the initial prediction iteration: (S1-S19) → M1 → S20. For each subsequent iteration, data preparation followed by the modeling and evaluation phases was executed. The presented model has an academic purpose, i.e. evaluation of the proposed approach, so currently there is no deployment in a real environment. Once the model proves its effectiveness, it is possible to recommend its application in practice.

3.2 Entities and data
The study uses the following entities and related fields as data sources:
• Item: these are the records by which the work is represented and stored in the tracking system implemented on the analyzed project, i.e. tickets. Variables used to represent the work item entity are: Assignment (representing the type of item association to the estimator, taking the form of either "own" or "assigned"), Component (identifying the component within the system that the item relates to, identified as Component_1, Component_2, ...), Area (refers to the area of work with possible values: PM, QM, CM, System, ..., Other), Activity (refers to the type of activity with possible values: Management, Quality, Design, Implementation, Test, ..., Installation, Documentation, etc.), Type (identifies the type of the item according to the applied scrum methodology, being either user story, task, defect, or other) and Priority (or urgency; it indicates the order in which the item should be taken into execution in relation to the other items, described as Prio_1, Prio_2, where Prio_1 refers to the highest priority). As is evident, these are descriptive attributes related to the item at the moment of its creation. Additional fields associated with the item entity, used to record the efforts, are: Estimated Effort, Remaining Effort and Actual Effort. These were populated at the moment of item creation and later updated as the work progressed until its completion.
• Estimator: the estimator is basically the employee engaged on the project, sometimes referred to as a project team member.
In the model the estimator is represented with a set of variables describing his: Role (representing his primary occupation on the project, with potential values: Project Manager, Solution Architect, Software Engineer, Configuration Manager, etc.), Seniority Level (representing the level of seniority, being either Junior, Mid-Level or Senior), Total Experience (representing the total number of years of work experience), Company Experience (representing the number of years of experience within the current company), Number of Projects (representing the number of projects the employee participated in while working for the current company) and Estimation Competence (representing the level of estimation competence, being either Beginner, Intermediate or Advanced). The list of fields used as predictors and target, together with the associated measurement type, is presented in Table 2.
Entity     | Field Name      | Measurement | Role
ITEM       | Assignment      | Flag        | Predictor
ITEM       | Component       | Nominal     | Predictor
ITEM       | Area            | Nominal     | Predictor
ITEM       | Activity        | Nominal     | Predictor
ITEM       | Type            | Nominal     | Predictor
ITEM       | Priority        | Ordinal     | Predictor
ESTIMATOR  | Role            | Nominal     | Predictor
ESTIMATOR  | Seniority Level | Ordinal     | Predictor
...        | ...             | ...         | ...

3.3 k-Nearest Neighbor algorithm
The model uses the k-Nearest Neighbor (KNN) algorithm. The nearest neighbor (NN) rule assigns to an unclassified incoming observation the class of the nearest sample in the set; this is the simplest form of KNN, when k = 1 [44]. KNN is based on measuring the distance between data to decide the final classification output based on their similarity [45]. KNN is an extension of NN and, due to its advantages, has been used for solving classification problems in numerous domains; the algorithm procedure can be presented as follows [46]:
T = {(xi, yi)}, i = 1, ..., N
Let T denote the training set, where xi ∈ Rm is a training vector in the m-dimensional feature space and yi is the corresponding class label. Given a query x', its unknown class y' is assigned in two steps. First, a set of k similarly labelled target neighbors for the query x' is identified. Denote the set ...
Figure 17: Pareto front (large-size), showing TCT, MWT and MWE against the number of the job sequence.
... dominated sets, and the large one resulted in 10 such solutions in the resulting Pareto front ... from the large-sized dataset. Each solution is represented with the individual job completion times and finally the TCT value of the same sequence, followed by MWT and MWE respectively. Each best fitted solution for the large-sized dataset is captured as its Pareto front and is represented in figure 18, with its respective fitness values. The MOABC also yields equally promising optimized solutions as the DABC algorithm. The results reveal that the proposed algorithms are capable of dealing with multiple objectives with little parameter variation with respect to the canonical ABC. It is a straightforward extension of the uni-objective ABC, mixing in the advantages of the local search procedure from the proposed DABC algorithm. We have applied one of the simplest local search procedures, the two-swap() procedure, to optimize the local optima, which definitely helps in reducing the algorithmic complexity. From the result analysis, apart from the completion time, it is seen that most of the time the earliness penalty is larger than the tardiness penalty. Hence, given the required priority levels of all the objectives, a decision maker can easily make a balanced decision by applying a suitable MCDM method.
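A minimal sketch of a two-swap local search of the kind referred to above is shown below. The processing-time matrix and the use of total completion time as the single evaluation criterion are illustrative assumptions; they are not the datasets or the exact neighbourhood acceptance rule used in the proposed algorithms.

```java
import java.util.Arrays;

// Sketch of a simple two-swap local search on a permutation flowshop sequence, in the
// spirit of the two-swap() procedure mentioned above. Only the total completion time
// (TCT) is optimized here, and the processing-time matrix is illustrative.
public class TwoSwapLocalSearch {

    // PROC[j][m] = processing time of job j on machine m.
    static final int[][] PROC = {
            {5, 9, 8}, {9, 3, 10}, {9, 4, 5}, {4, 8, 8}, {3, 5, 6}
    };

    // Total completion time of a job sequence on the permutation flowshop.
    static int totalCompletionTime(int[] sequence) {
        int machines = PROC[0].length;
        int[] completion = new int[machines];
        int tct = 0;
        for (int job : sequence) {
            completion[0] += PROC[job][0];
            for (int m = 1; m < machines; m++) {
                completion[m] = Math.max(completion[m], completion[m - 1]) + PROC[job][m];
            }
            tct += completion[machines - 1];
        }
        return tct;
    }

    // Repeatedly apply the best improving pairwise swap until no swap improves the TCT.
    static int[] twoSwap(int[] sequence) {
        int[] best = sequence.clone();
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestTct = totalCompletionTime(best);
            for (int i = 0; i < best.length - 1; i++) {
                for (int j = i + 1; j < best.length; j++) {
                    int[] candidate = best.clone();
                    int tmp = candidate[i]; candidate[i] = candidate[j]; candidate[j] = tmp;
                    int tct = totalCompletionTime(candidate);
                    if (tct < bestTct) {
                        best = candidate;
                        bestTct = tct;
                        improved = true;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] start = {0, 1, 2, 3, 4};
        int[] result = twoSwap(start);
        System.out.println("Sequence " + Arrays.toString(result)
                + " with TCT = " + totalCompletionTime(result));
    }
}
```

A neighbourhood of this kind can serve as the cheap local refinement step inside a bee-colony iteration, which is the role the text attributes to the two-swap() procedure.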
7 Decision making with chaotic-TOPSIS After generating successful optimized solution set, we cannot avoid for selecting an appropriate one among these during the decision making process. MCDM is a successful tool for decision making with conflicting P Final job sequence & completion time TCT MWT MWE 1 J2 J3 J4 J5 J6 J7 J8 J9 J0 J1 104 8.96 10.27 24 51 62 72 77 83 87 95 98 104 2 J3 J4 J5 J6 J7 J8 J9 J0 J1 J2 97 7.54 10.60 43 54 64 69 75 79 87 90 96 97 3 J4 J5 J6 J7 J8 J9 J0 J1 J2 J3 99 7.19 12.86 36 56 61 67 71 79 82 88 89 99 4 J5 J6 J7 J8 J9 J0 J1 J2 J3 J4 106 9.8 9.47 55 60 66 70 78 81 87 88 98 106 5 J6 J7 J8 J9 J0 J1 J2 J3 J4 J5 101 6.54 21.29 32 39 43 51 57 63 64 80 91 101 6 Jo J1 J5 J3 J4 J2 J6 J7 J8 J9 109 10.39 5.03 43 49 70 80 88 89 93 98 102 109 7 Ji J2 J6 J4 J5 J3 J7 J8 J9 J0 99 7.37 15.01 27 29 50 60 71 81 86 90 97 99 8 J2 J3 J7 J5 J6 J4 J8 J9 J0 J1 107 10.15 8.76 24 51 61 74 79 89 93 100 102 107 9 J4 J5 J9 J7 J8 J6 J0 J1 J2 J3 99 7.19 11.21 36 56 66 71 75 79 82 88 89 99 10 Jo J1 J5 J4 J3 J2 J6 J7 J8 J9 111 11.05 4.29 43 49 70 80 90 91 95 100 104 111 Table 15: Non-dominated job sequence. P Final job sequence & completion time TCT MW T MW E 1 J0 J1 J3 J2 21 2.0 4.36 12 14 18 21 2 J3 J1 J2 J0 20 0.9 5.9 9 11 15 20 3 J1 J3 J0 J2 19 0.72 6.9 5 10 15 19 4 J3 J2 J1 J0 21 1.63 5.27 9 14 16 21 5 J1 J3 J2 J0 20 0.90 6.9 10 15 20 6 J3 J2 J0 J1 21 1.63 4.18 9 14 19 21 7 J1 J2 J3 J0 22 1.54 5.63 5 13 17 22 Table 14: Non-dominated job sequence. 6.3.2.1 Small-size dataset 7 non-dominated solutions emerged from the first dataset and are listed in Table 14. These solutions can be further evaluated by the decision maker to reach at the definite goal. The resulted non-dominated set of table 14 has been depicted to the corresponding Pareto front in figure 16. The 3 objectives fitness values show a clear graphical visualization of the non-dominated set. 6.3.2.2 Large-size dataset Table 15 stores the 10 non-dominated solutions emerged 256 Informatica 44 (2020) 241-262 M. Panda et al. criterion. Various methods show their respective efficiency in this regard. By a comparative survey we have concluded to decide the final optimal solution here with in our problem using TOPSIS method which really seems to be fit .We have summarized some of the recent TOPSIS applications followed by the discussions of our motivation. Li et al. [47] presents a new method based on TOPSIS and response surface method (RSM) for MCDM problems with interval number. Similarly Madi et al. [48] provided a detailed comparison of TOPSIS and Fuzzy-TOPSIS in a systematic and stepwise manner. Sotoudeh-Anvari [49] suggested a stochastic multi-objective optimization model for assigning resource and time in order to search the individuals who are trapped in disaster regions. To reduce the heavy computation of the model, two efficient MCDM techniques, i.e. TOPSIS and COPRAS are employed which tackles the ranking problem. Zavadskas et al. [50] reviewed 105 papers which developed, extended, proposed and presented TOPSIS approach for solving DM problems from 2000 to 2015. Recently Wu et al.[51] proposes an improved methodology for handling ships which uses TOPSIS method to make the final decision. TOPSIS TOPSIS was developed by Hwang and Yoon [52] in the year of 1981 as an alternative to the elimination and choice translating reality (ELECTRE) method. The basic idea of TOPSIS is quite simple and it has been originated from a displaced ideal point from which the selected solution has shortest distance [53-54]. 
It was further refined [52] into a rank-based method by assigning specific orders to the available alternatives. The whole concept is based on two artificial ideal points; that is, the ultimate solution is the one with the shortest distance from the positive ideal solution (PIS) and the longest distance from the negative ideal solution (NIS). Hence a preference order of all alternatives is generated as per their relative closeness to the ideal solutions. As concluded by Kim et al. [55] and from our observations, the basic advantages of TOPSIS are: (i) it is an accepted logic that reflects the rationale of human choice; (ii) a single scalar value accounts for both ideal alternatives together; (iii) it has a simple algorithmic framework and can easily be coded in a spreadsheet; and (iv) it gives a straightforward performance evaluation of all alternatives against the defined criteria, which can be clearly visualized and represented for two or more dimensions. The above advantages make TOPSIS a widely used MCDM technique compared with the other techniques [52]. In fact, it is a utility-based method that evaluates every alternative directly depending on the data available in the decision matrices and weights [56]. Apart from this, a simulation comparison [57] of the TOPSIS method shows that it has the fewest rank reversals among the methods in its category. Thus, TOPSIS is chosen as the backbone of MCDM.
The preliminary issue with the method is the normalized decision matrix operation, where randomness is involved in assigning the criterion weights. Hence a narrow gap arises between the performance measures due to the weighted normalized values of the decision matrix. It can be advantageous to substitute this randomness with a suitable chaotic map. Chaos has properties very similar to randomness, with better statistical and dynamical characteristics. Such dynamic mixing helps enhance the solutions' potential by touching every mode in a multi-objective landscape. Hence the use of a well-suited chaotic map in TOPSIS can definitely help enhance the decision making by generating the preferred randomness in the criterion weights.

Chaotic maps
Simulation of complex phenomena such as numerical analysis, decision making, sampling, heuristic optimization, etc. needs random sequences with a long period and good uniformity [58]. A chaotic map is a deterministic, discrete-time dynamic system that is considered a source of randomness; it is non-periodic, bounded and non-converging [59-60]. The nature of chaotic maps is apparently random and unpredictable, and they have a very sensitive dependence on their initial conditions and parameters [58]. A chaotic map can be represented as:
xk+1 = f(xk), 0 < xk < 1, k = 0, 1, 2, 3, ...    (12)
Different selected chaotic maps that produce chaotic numbers in [0, 1] are listed below in Table 16 [59-60].
Chaotic Map    | Definition
Logistic map   | xn+1 = 4xn(1 − xn)
Circle map     | xn+1 = xn + 1.2 − (0.5/2π) sin(2πxn) mod(1)
Gauss map      | xn+1 = 0 if xn = 0; 1/xn mod(1) otherwise, where 1/xn mod(1) = 1/xn − [1/xn]
Henon map      | xn+1 = 1 − 1.4xn² + 0.3xn−1
Sinusoidal map | xn+1 = sin(πxn)
Sinus map      | xn+1 = 2.3(xn)^(2 sin(πxn))
Tent map       | xn+1 = xn/0.7 if xn < 0.7; (10/3)xn(1 − xn) otherwise
Table 16: Different Chaotic Maps.
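A minimal sketch of drawing criterion weights from the sinusoidal map in Table 16 is given below. The seed value and the normalization of the three sorted numbers so that they sum to one are assumptions made for illustration; the text only states that the chaotic numbers are sorted and assigned to the criteria in priority order.

```java
import java.util.Arrays;

// Sketch of generating sorted criterion weights with the sinusoidal map from Table 16
// (x_{k+1} = sin(pi * x_k)). Sorting in descending order preserves the intended priority
// TCT > MWT > MWE.
public class SinusoidalWeights {

    public static void main(String[] args) {
        double x = 0.7;                 // illustrative seed in (0, 1)
        double[] raw = new double[3];
        for (int i = 0; i < 3; i++) {
            x = Math.sin(Math.PI * x);  // sinusoidal chaotic map iteration
            raw[i] = x;
        }

        // Sort descending so the largest weight goes to the highest-priority criterion.
        double[] weights = Arrays.stream(raw).map(v -> -v).sorted().map(v -> -v).toArray();
        double sum = Arrays.stream(weights).sum();
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= sum;          // normalize to sum to 1 (assumption for illustration)
        }
        System.out.println("Chaotic criterion weights (TCT, MWT, MWE): " + Arrays.toString(weights));
    }
}
```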
Alternative TCT MWT MWE Closeness coefficient Rank A1 19 0.72 5.54 0.1524 10 A2 20 0.9 5.9 0.2130 9 A3 21 2.0 4.36 0.5606 5 A4 21 1.3 5.6 0.3238 8 As 22 1.54 5.27 0.4201 7 A6 22 1.54 5.63 0.4302 6 A7 23 2.36 4.0 0.7036 4 A8 21 2.54 4.0 0.7278 3 A9 21 2.54 4.36 0.7475 2 A10 23 2.9 4.36 0.8476 1 Table 17: Alternatives from DABC-I. Alternative TCT MWT MWE Closeness coefficient Rank A1 19 0.72 6.9 0.2462 7 A2 19 1.27 5.27 0.2515 6 A3 20 0.90 6.9 0.2704 5 A4(A5,As) 21 1.63 5.27 0.3907 3 A7 21 1.63 4.18 0.3604 4 A8 (A9) 22 2.72 3.63 0.6813 2 A10 23 3.18 4.0 0.7755 1 Table 18: Alternatives from DABC-II. Alternative TCT MWT MWE Closeness coefficient Rank A1 20 1.45 4.54 0.179 6 8 A2 21 2.0 4.36 0.396 7 6 A3 21 2.54 4.0 0.668 8 5 A4 21 2.54 4.36 0.687 3 4 A5(As) 21 1.63 4.18 0.198 3 7 A7 22 2.72 3.63 0.756 4 3 A8(A9) 23 2.9 4.3 0.946 1 1 A10 22 2.72 4.36 0.835 6 2 Table 19: Alternatives from DABC-III. Again it is a challenging task to find out a proper and suitable chaotic function to well fit to our decision making problem. Researchers used a number of chaotic sequences to tune various parameters in various meta-heuristic optimization algorithms such as particle swarm optimization[61-62], genetic algorithms[63], harmony search[60], imperialist competitive algorithm [64], ant and bee colony optimization [65, 59], firefly algorithm [62] and simulated annealing [66]. Each research in different direction has shown some promise once the right set of chaotic maps is applied. Gandomi and Yang [67] founds sinusoidal map is the most suitable for the bat algorithm to replace with loudness and pulse rate Informatica 44 (2020) 241-262 257 Altern -ative TCT MWT MWE Closeness coeff Rank A1 92 5.21 14.43 0.2960 10 A2 96 6.11 14.6 0.3667 9 A3 98 7.49 11.94 0.4132 8 A4 100 7.25 12.84 0.4320 7 A5 104 8.68 14.0 0.6617 3 A6 106 8.84 17.8 0.8070 1 A7 106 9.7 12.52 0.6873 2 A8 107 10.17 9.0 0.5862 5 A9 106 9.7 9.11 0.5640 6 A10 107 9.66 11.9 0.6613 4 Table 20: Alternatives from DABC-I. Altern -ative TCT MWT MWE Closeness coefficient Rank A1 94 6.07 14.82 0.2946 9 A2 95 5.3 20.49 0.4250 6 A3 95 5.74 13.64 0.2352 10 A4 95 6.8 13.6 0.3047 8 A5 100 7.5 13.47 0.3839 7 A6 109 7.78 15.82 0.5232 3 A7 105 8.7 10.6 0.4484 4 A8 104 9.17 8.76 0.4466 5 A9 111 11.13 9.45 0.5918 1 A10 107 9.66 11.9 0.5668 2 Table 21: Alternatives from DABC-II. respectively. Similarly Gandomi et al. [61] have experimented twelve different chaotic maps to tune the major parameters of PSO. They revealed sinusoidal map and singer map perform better result in comparison to the rest. Talatahari et al.[64] proposed in a novel chaotic improved imperialist competitive algorithm by investing seven different chaotic maps and sinusoidal and logistic maps are found as the best choices. Also in Gandomi et al. [62] experimentally revealed sinusoidal map and gauss maps are the best performed chao to be adopted for firefly algorithm. Most experimental results proved sinusoidal as a common better performing random generator. By watching the efficiency of sinusoidal map, we have used the same to find out the random numbers in the TOPSIS weight assignment procedure. Again it is important for the decision maker to maintain the priority level of all criterions. To cope up with this we have sorted the random numbers and assigned them to the respective criterions. Decision results To finalize the decision results we have generated a set of three chaotic numbers using sinusoidal map and sorted them to represent different criterion weights. 
With respect to each decision matrix we have allotted the same criterion weight, in a preference order i.e., {0.5, 0.3, 0.2}. Here we have assumed of TCT with highest preference, 258 Informatica 44 (2020) 241-262 M. Panda et al. Altern -ative TCT MWT MW E Closen ess coeffic ient Rank Altern -ative TCT MWT MWE Closenes s coeff Rank A1 21 2.0 4.36 0.7498 1 A2 20 0.9 5.9 0.2380 7 Ai 94 6.07 14.82 0.2946 9 A3 19 0.72 6.9 0.2529 6 A2 95 5.3 20.49 0.4250 6 A4 21 1.63 5.27 0.6696 2 A3 95 5.74 13.64 0.2352 10 A5 20 0.90 6.9 0.3065 5 A4 95 6.8 13.6 0.3047 8 A6 21 1.63 4.18 0.6129 4 A5 100 7.5 13.47 0.3839 7 A7 22 1.54 5.63 0.6554 3 A6 109 7.78 15.82 0.5232 3 Table 23: Alternatives from MOABC (Small-sized). A7 105 8.7 10.6 0.4484 4 Alternative TCT MWT MWE Closeness coefficient Rank As 104 9.17 8.76 0.4466 5 A9 111 11.13 9.45 0.5918 1 A1 104 8.96 10.27 0.1132 6 A10 107 9.66 11.9 0.5668 2 A2 97 7.54 10.60 0.1098 7 Ai 93 5.7 13.6 0.2540 9 A3 99 7.19 12.86 0.1441 4 A2 94 5.4 14.47 0.2760 8 A4 106 9.8 9.47 0.8105 1 A3(A4) 95 5.54 19.31 0.4281 6 A5 101 6.54 21.29 0.2516 2 A5 96 5.64 19.0 0.4291 5 A6 109 10.39 5.03 0.0744 10 A6 99 6.2 19.17 0.4806 3 A7 99 7.37 15.01 0.1750 3 A7 97 7.52 11.19 0.3873 7 A8 107 10.15 8.76 0.1015 8 As 101 8.41 8.43 0.4526 4 A9 106 9.96 9.19 0.6012 1 A9 99 7.19 11.21 0.1192 5 A10 106 9.3 9.9 0.5810 2 A10 111 11.05 4.29 0.0847 9 Table 22: Alternatives from DABC-III. then MWT and lastly MWE. The decision matrices are nothing but various resulted non-dominated sequences of TCT, MWT and MWE. For every individual decision matrix we have generated the closeness coefficient value w.r.t both the ideal solutions and so as the ranks. Firstly we have calculated the ranks of all the alternatives generated from DABC-I, DABC-II and DABC-III for the small-size dataset followed by the large one. Lastly the alternatives from MOABC are evaluated in the same sequence. DABC (Small-sized) Table 17 represents the alternatives generated from DABC-I. 10 alternatives are evaluated with the proposed chao-TOPSIS procedure and the ranks are presented. Alternative A10 is having highest closeness coefficient value than all, hence is chosen as rank 1 alternative for the decision maker. The non-dominated sequences of DABC-II (Table 18) are having some of the repeating sequences; hence they are treated as one single alternative. Alternatives A4, A5, A6 are the same sequences and that of alternatives A8 and A9. These repeating sequences are the result of selecting the proportionately best fitness values from each objective function and application of local search algorithms repeatedly to a small sized data set. Hence altogether we have evaluated 7 sequences and the last alternative A10 is the best ranked. Similarly table 19 contains the resulting sequences of DABC-III. Out of 10 sequences two pairs ((A5=A) and Table 24: Alternatives from MOABC (Large-sized). (A8=A9)) are repeated sequences. Hence 8 sequences are evaluated against the three objectives using chaotic-TOPSIS. The calculation shows, the seventh sequence i.e. A8 ( or A9) is having rank 1. DABC (Large-size Dataset) The large-sized synthetic dataset has again 3 decision matrices from DABC-I, DABC-II and DABC-III to be evaluated. Table 20 contains the decision matrix resulted from DABC-I. The 10 different alternatives (sequences) are having different closeness coefficient values and A6 is the highest ranked alternative. The following decision matrix of Table 21 is the resulted optimized sequence of DABC-II for the large input data. 
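For reference, the closeness-coefficient computation that produces the ranks in these tables can be sketched as follows, using the stated weights {0.5, 0.3, 0.2}. The tiny decision matrix, the vector normalization and the treatment of all three criteria (TCT, MWT, MWE) as cost criteria are illustrative assumptions; this sketch is not intended to reproduce the exact values reported in Tables 17-24.

```java
import java.util.Arrays;

// Sketch of the TOPSIS ranking step applied to the non-dominated sequences: vector
// normalization, weighting, distances to the positive and negative ideal solutions,
// and the closeness coefficient used to rank the alternatives.
public class TopsisRanking {

    public static void main(String[] args) {
        double[][] matrix = {           // rows: alternatives, columns: TCT, MWT, MWE
                {21, 2.00, 4.36},
                {20, 0.90, 5.90},
                {19, 0.72, 6.90}
        };
        double[] weights = {0.5, 0.3, 0.2};

        int n = matrix.length, m = weights.length;
        double[][] v = new double[n][m];

        // Vector normalization followed by weighting.
        for (int j = 0; j < m; j++) {
            double norm = 0;
            for (int i = 0; i < n; i++) norm += matrix[i][j] * matrix[i][j];
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) v[i][j] = weights[j] * matrix[i][j] / norm;
        }

        // For cost criteria the positive ideal is the column minimum, the negative ideal the maximum.
        double[] pis = new double[m], nis = new double[m];
        for (int j = 0; j < m; j++) {
            double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (int i = 0; i < n; i++) {
                min = Math.min(min, v[i][j]);
                max = Math.max(max, v[i][j]);
            }
            pis[j] = min;
            nis[j] = max;
        }

        // Closeness coefficient = dNIS / (dPIS + dNIS); larger means closer to the ideal.
        double[] closeness = new double[n];
        for (int i = 0; i < n; i++) {
            double dPis = 0, dNis = 0;
            for (int j = 0; j < m; j++) {
                dPis += Math.pow(v[i][j] - pis[j], 2);
                dNis += Math.pow(v[i][j] - nis[j], 2);
            }
            closeness[i] = Math.sqrt(dNis) / (Math.sqrt(dPis) + Math.sqrt(dNis));
        }
        System.out.println("Closeness coefficients: " + Arrays.toString(closeness));
    }
}
```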
Each alternative are processed to check the best set of functional values from the calculated closeness coefficient value. Here alternative A9 is found to be superior one. The non-dominated sequence of DABC-III is represented as the decision matrix in Table 22 with 10 alternatives. Two alternatives A3 and A4are having same sequences. Hence altogether 9 different sequences are processed and according to chaotic-TOPSIS, A9 is the best one to be chosen by the decision maker. MOABC Table 23 contains the non-dominated sequence of MOABC for the small sized data set. It is consisting of 7 alternatives and chaotic-TOPSIS valuates A1 as the suitable alternative for the decision maker among all. Multi-Objective Artificial Bee Colony Algorithms ... Informatica 44 (2020) 241-262 259 The non-dominated sequences of MOABC for the large data input is consisting of 10 sequences and are represented in Table 24. After checking the closeness coefficient values A4 is found as the best alternative among all. The use of generating random numbers using different chaotic functions has been one of the remarkable techniques to tune the parameters in various algorithms in many fields, and this has become an active research topic in the recent optimization literature. By watching its advantage, we have introduced the concept of chaotic map to the standard TOPSIS, and have checked for the best alternative among a set of non-dominated solutions. The decision makers will be definitely confident enough to take a right decision among the conflicting ones using the approach. 8 Conclusions and future research The DABC and MOABC algorithms were coded and applied to the multiple instances of dataset ranging from 3 jobs with 3 machines to 10 jobs and 9 machines. In this paper, we considered the MOPFSSP under the multiple (three) criteria. The DABC algorithm is hybridized with a variant of iterated greedy algorithms employing a local search procedure based on insertion (), swap () and destruct- construct () neighborhood structures. In addition, we also presented an extended version of ABC algorithm to the proposed MOABC algorithm employed through a particular local search procedure with reduced complexity. Our proposal is having a significant application of DABC to check the time complexity of different local search procedures. Hence, we are motivated to use simple swap () operation in local search procedure in the MOABC algorithm. The performances of both the proposed algorithms were tested by using different instances of datasets and it has been shown that the performances of both DABC and MOABC algorithms are highly competitive with the best performing existing literature. Also we have extended our work to optimize the non-dominated solutions to a single optimal solution using chaotic-TOPSIS method to derive the optimal decision in the field of MCDM. The proposed approach will definitely help the decision makers to solve various MCDM problems in future. Further the problem of FSSP can be extended with no-wait flowshop, blocking flowshop and no-idle flowshop, etc. Apart from three criteria we may practically have a many objective (MaO) PFSSP, which will obviously increase the number of non-dominated solutions in the search space. We may further work to find other effective ways to make a right decision for the decision makers to reach at a definite goal. 9 Acknowledgment The data applied in this study is consisting of synthetic datasets ranges from small-size to large-sized ones. 10 References [1] K R Baker. 
https://doi.org/10.31449/inf.v44i2.3166 Informatica 44 (2020) 263-268

Research on Resource Allocation and Management of Mobile Edge Computing Network

Rui Zhang and Wenyu Shi
Anhui Xinhua University, Hefei, Anhui 230087, China
E-mail: nw2094@163.com

Keywords: edge network, resource allocation, transmitting power, system utility

Received: May 14, 2020

The popularity of the mobile Internet means that mobile terminal applications need ever more computing resources, and cloud computing allows mobile terminals to handle computation-intensive tasks while keeping their small form factor. However, it is difficult to obtain high-quality, low-latency services because the edge of the mobile Internet is far from the cloud computing center; hence mobile edge computing (MEC) has been proposed. This study introduced computing resource allocation methods based on power iteration and on system utility, applied them to a mobile edge computing network, and carried out simulation experiments in MATLAB. The results showed that, under both resource allocation methods, the network throughput and system utility increased and the average transmission rate decreased as the number of users in the edge network grew; for the same number of access users, the edge network using the system utility based allocation method had higher throughput, average transmission rate and system utility.

Povzetek: Robno računalništvo (edge computing) omogoča boljše delovanje mrež, ker podatke v oblaku prestavi na rob mreže. Prispevek se ukvarja z analizo in izdelavo tovrstnih metod.

1 Introduction

The development of mobile Internet technology has further facilitated people's lives, and the popularity of mobile terminals such as mobile phones and iPads has greatly promoted mobile Internet technology [1]. The emergence of cloud computing [2] greatly reduces the computing and storage requirements placed on the mobile terminal, thus reducing its manufacturing cost. However, even though the mobile Internet has wide coverage, there is only one data center providing cloud computing services. When data are transmitted across the nodes of the mobile Internet, delay is inevitable, and the farther a terminal is from the data center, the more serious the delay becomes. High delay seriously affects the service reliability of many mobile applications.
The emergence of mobile edge computing [3] solves the above problems. Compared with the cloud server in the data center, edge devices are closer to users and have a shorter time delay, and the combination of cloud servers and edge devices can provide more reliable services to users at the edge. Liu et al. [4] proposed a deep learning architecture based on a densely connected network, transplanted it into a mobile edge algorithm, and found through simulation that the algorithm had a clear overall efficiency advantage. Zhang et al. [5] proposed a weight based algorithm and a mobility prediction based heuristic algorithm for users with certain and uncertain movement tracks in order to reduce the network overhead caused by task migration, and found through experiments that the two algorithms could effectively reduce that overhead. To solve the joint optimization problem of task caching and offloading in edge cloud computing, Hao et al. [6] proposed an efficient task caching and offloading algorithm based on an alternating iterative algorithm and found through simulation that it had lower energy consumption. This study introduced a computing resource allocation method based on power iteration and a computing resource allocation method based on system utility, applied them to the mobile edge computing network, and carried out a simulation experiment on the two methods in MATLAB.

2 Mobile edge computing (MEC)

In recent years, the configuration of mobile terminals has improved greatly with the progress of science and technology, mainly reflected in small volume and a large amount of computing. However, in order to remain convenient to carry, a mobile terminal must stay portable; unless there is a breakthrough in existing materials and technologies, its computing capacity is necessarily limited and lower than that of large-scale computing equipment. Faced with today's demanding mobile network applications, mobile terminals with limited computing power and energy find it increasingly difficult to provide good service. The development of cloud computing technology has greatly relieved the computing burden of mobile terminals, but the growing number of terminals accessing the mobile network for cloud computing has increased the network burden and delay. Cloud computing services usually give priority to the service requests of mobile terminals near the data center; terminals at the edge of the network therefore experience network delay. In order to solve the problem that the service quality of edge network cloud computing decreases when too many mobile terminals access it, mobile edge computing was proposed.

The basic structure of an MEC system includes the bottom structure, the functional components and the application layer.

Figure 1: The basic architecture of MEC system.

The bottom structure mainly includes the virtual layer structure and the hardware facilities used for generating virtual computing resources [7]; the functional components in the MEC system act as the interaction interface between internal and external data, helping the system access the mobile network without obstacles; the application layer is the application interface of the system, used for exchanging information with users. The basic architecture of the MEC system after it is connected to a cloud computing service is shown in Figure 1. The MEC system is located between the wireless access network and the mobile core network. The wireless access network contains the MEC servers, which constitute the edge cloud and connect with various kinds of mobile terminals through the base station. The mobile core network is the center of the whole mobile Internet and realizes wide-area transmission of communication data, while the cloud service center is an important part of providing cloud computing. The principle by which the MEC system improves the service quality of cloud computing can be described as lowering tasks that would otherwise be executed in the cloud computing center to the edge server. In traditional cloud computing data interaction, a request is first sent to the base station of the wireless access network, and the base station then forwards the request to the core network for data processing; every user request follows this process, even when requests are identical. Once the number of access users increases, channel resources are wasted and network congestion causes delay [8]. After adding MEC, cloud computing resources are distributed to the MEC server, and when users repeat the same request they can obtain resources directly from the MEC server, which greatly improves the speed of service response.
3 Resource management based on power iteration

For convenience of explanation, it is assumed that the mobile edge network has one base station (BS) and several mobile terminals (MTs), and that the MEC server is located at the BS so that the two can exchange data directly. All MTs in the edge network can be expressed as $N = \{1, 2, 3, \dots, n\}$, all communication channel resources as $M = \{1, 2, 3, \dots, m\}$, and the task to be performed by the $i$-th MT as $X_i = (D_i, c_i, t_i^{\max})$, where $D_i$, $c_i$ and $t_i^{\max}$ are the size of the computation data, the required computing power and the maximum tolerable delay, respectively. If the transmitting power is used to control resource allocation, the resource allocation problem of the edge computing network can be expressed as follows. The objective function is

$$\max \sum_{i=1}^{n}\sum_{m=1}^{M} a_{i,m} B_m \log_2\!\left(1 + \frac{p_{i,m}\, g_{i,m}}{\sigma_m^2 + I_{i,m}}\right) \qquad (1)$$

subject to

$$\sum_{m=1}^{M} a_{i,m} B_m \log_2\!\left(1 + \frac{p_{i,m}\, g_{i,m}}{\sigma_m^2 + I_{i,m}}\right) \ge R_i^{\min}, \quad \mathrm{SNR} = \frac{p_{i,m}\, g_{i,m}}{\sigma_m^2 + I_{i,m}} \ge D, \quad p_{i,m} \in [0, p_i^{\max}], \quad a_{i,m} \in \{0, 1\}, \qquad (2)$$

where $a_{i,m}$ indicates whether the $i$-th MT migrates its task over the $m$-th channel (1 for yes, 0 for no), $B_m$ is the bandwidth of the $m$-th channel, $\sigma_m^2$ is the Gaussian white noise on channel $m$ [9], $I_{i,m}$ is the interference caused to the $i$-th MT by the other MTs in channel $m$, $p_{i,m}$ is the transmitting power of the $i$-th MT in channel $m$, $g_{i,m}$ is the channel gain of the $i$-th MT in channel $m$, $R_i^{\min}$ is the minimum transmission rate of the $i$-th MT needed to guarantee transmission quality, SNR is the signal-to-noise ratio [10], and $D$ is the signal-to-noise ratio threshold.

Equations (1) and (2) show that the allocation scheme is optimal when $p_{i,m}$ is optimal. The optimal $p_{i,m}$ is obtained as follows.

① The iteration parameters, including the iteration counter $t$ and the Lagrange multipliers $\kappa_m$ and $\mu_m$, are initialized.

② Let $t = t + 1$, indicating one iteration, and update the Lagrange multipliers according to

$$\kappa_m(t+1) = \max\!\left[\kappa_m(t) + \alpha_1\Big(\sum_{i} a_{i,m}\, p_{i,m} - p_i^{\max}\Big),\ 0\right], \qquad
\mu_m(t+1) = \max\!\left[\mu_m(t) + \beta_1\Big(\sum_{i} \frac{a_{i,m}\, p_{i,m}\, g_{i,m}}{\sigma_m^2 + I_{i,m}} - D\Big),\ 0\right]. \qquad (3)$$

③ The transmitting power $p_{i,m}$ is calculated from the Lagrange multipliers obtained in the previous update [11]:

$$p_{i,m} = \frac{B_m}{\big[\mu_m - \kappa_m\,(\sigma_m^2 + I_{i,m})\big]\ln 2} - \frac{\sigma_m^2 + I_{i,m}}{g_{i,m}}. \qquad (4)$$

④ The value of $p_{i,m}$ after the update is compared with the value before the update. If the difference is smaller than the set threshold, the iteration stops; otherwise steps ② and ③ are repeated. After several iterations, the optimal $p_{i,m}$ is obtained.
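The dual-update loop in steps ①–④ can be sketched in a few lines of code. The Python fragment below is only an illustration under the reconstructed forms of equations (3) and (4); the array layout, step sizes, initial values and stopping threshold are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

def power_iteration_allocation(a, g, I, B, sigma2, p_max, D,
                               alpha=0.01, beta=0.01, tol=1e-4, max_iter=500):
    """Minimal sketch of the power-iteration resource allocation (steps 1-4).

    a      : (n, M) 0/1 matrix, a[i, m] = 1 if MT i migrates its task over channel m
    g, I   : (n, M) channel gain and interference seen by MT i on channel m
    B      : (M,) channel bandwidths, sigma2 : (M,) noise power per channel
    p_max  : scalar transmit-power budget, D : SNR threshold
    """
    n, M = a.shape
    kappa = np.ones(M)                     # multiplier of the power constraint
    mu = np.ones(M)                        # multiplier of the SNR constraint
    p = np.full((n, M), 1e-3)              # transmit powers to be optimised

    for _ in range(max_iter):
        p_old = p.copy()
        # step 2: sub-gradient update of the Lagrange multipliers (eq. 3)
        kappa = np.maximum(kappa + alpha * (np.sum(a * p, axis=0) - p_max), 0.0)
        snr = a * p * g / (sigma2 + I)
        mu = np.maximum(mu + beta * (np.sum(snr, axis=0) - D), 0.0)
        # step 3: power implied by the multipliers (eq. 4), kept inside [0, p_max]
        denom = (mu - kappa * (sigma2 + I)) * np.log(2)
        denom = np.where(np.abs(denom) < 1e-9, 1e-9, denom)
        p = np.clip(B / denom - (sigma2 + I) / g, 0.0, p_max)
        # step 4: stop once the change in power falls below the tolerance
        if np.max(np.abs(p - p_old)) < tol:
            break
    return p
```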
4 Resource management based on system utility

First, the model structure of the MEC system is, as above, set as multi-user MT and single BS, and the resource allocation of the edge computing network is designed based on the utility function of the system. The allocation can then be expressed as follows. The objective function is

$$\max \sum_{i=1}^{N} \rho_i\, s_i \left( \gamma_i^T\, \frac{T_i^l - T_i^r}{T_i^l} + \gamma_i^E\, \frac{E_i^l - E_i^r}{E_i^l} \right) \qquad (5)$$

subject to

$$s_i \in \{0, 1\}, \quad p_i \in (0, p_i^{\max}], \quad f_i > 0, \quad \sum_{i \in S} f_i \le f^{\max}, \quad \sum_{i} s_i \le M, \qquad (6)$$

where $\rho_i \in [0,1]$ is the priority of the $i$-th MT for receiving MEC service; $s_i$ indicates whether device $i$ selects task migration (1 for migrating to the MEC server, 0 for processing the task locally); $\gamma_i^T$ and $\gamma_i^E \in [0,1]$ are the preferences of device $i$ for improving efficiency and for reducing energy consumption, respectively [12], i.e., the intention of the user to solve problems quickly and to save energy; $T_i^l$ and $T_i^r$ are the delays of processing the task locally and of processing it by migration to the MEC server, the former depending on the CPU performance of the device and the latter on the computation resource $f_i$ allocated by the MEC server and the transmission delay; $E_i^l$ and $E_i^r$ are the energy consumption of processing the task locally and of processing it by migration, the former depending on the power and processing time of the device and the latter on the energy used during data transmission; $p_i$ is the transmitting power of device $i$, which cannot exceed its maximum transmitting power; and $S$ is the set of devices participating in task migration.

The steps for solving objective function (5) are as follows.

① First, the optimal transmission power of each device in the mobile edge network is calculated using the dichotomy (bisection) method [13].

② Each device then determines whether task migration is required. If it is, a task migration request is issued; if not, a NULL message is issued.

③ After the messages are received, the devices are classified according to

$$S_l = \{\, i \mid \rho_i (\gamma_i^T + \gamma_i^E) - \Delta(i) < 0 \,\}, \quad
S_r = \{\, i \mid \rho_i (\gamma_i^T + \gamma_i^E) - \big(\Delta(i) + \Delta(i \mid N - S_l - \{i\})\big) > 0 \,\}, \quad
S_s = N - S_l - S_r, \qquad (7)$$

where $S_l$, $S_r$ and $S_s$ are the set of devices that process tasks locally, the set of devices that migrate tasks and the set of devices still to be allocated, respectively; $\Delta(i)$ and $\Delta(i \mid N - S_l - \{i\})$ are intermediate quantities for calculating the marginal system utility of the migration set after device $i$ is added; $F_i$ is the working frequency of the CPU of mobile device $i$; $a_i$ is the ratio of channel gain to channel noise power during the transmission of device $i$; and $\eta$ is the working efficiency of the transmission power amplifier.

④ It is then determined whether the number of devices in $S_r$ exceeds the total number of channels ($K$) in the edge network.
If it does, the device with the smallest system utility is selected from $S_r$ and moved to $S_s$; this repeats until the number of devices in $S_r$ no longer exceeds $K$. Then $S_r$ is output, and task migration and the allocation of computation resources are performed according to $S_r$.

⑤ If the number of devices in $S_r$ is smaller than the total number of channels in the edge network, the device with the largest system utility is selected from $S_s$ and added to $S_r$, provided the utility of $S_r$ does not shrink after adding the new device, i.e., the system utility of the set does not decrease. The step repeats until the number of devices in $S_r$ reaches the number of channels in the network or $S_r$ cannot be extended further. After the cycle stops, $S_r$ is output, and task migration and the allocation of computation resources are performed according to $S_r$.
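A rough sketch of the selection logic in steps ①–⑤ is given below. It is only an illustration under simplifying assumptions: the per-device term of equation (5) is used directly as the utility, the marginal-utility tests of equation (7) are reduced to sorting by that utility, and the transmit power and the resulting delays and energies are assumed to have been fixed beforehand (for example by the bisection step ①); all names are hypothetical.

```python
def system_utility_allocation(devices, num_channels):
    """Greedy construction of the migration set S_r (sketch of steps 3-5).

    Each device is a dict with keys: priority (rho), pref_time, pref_energy,
    t_local, t_remote, e_local, e_remote -- all assumed to be precomputed.
    """
    def utility(d):
        # per-device term of the system utility in eq. (5)
        return d["priority"] * (
            d["pref_time"] * (d["t_local"] - d["t_remote"]) / d["t_local"]
            + d["pref_energy"] * (d["e_local"] - d["e_remote"]) / d["e_local"]
        )

    # devices that gain nothing from migration process their task locally (S_l)
    s_local = [d for d in devices if utility(d) <= 0]
    candidates = sorted((d for d in devices if utility(d) > 0),
                        key=utility, reverse=True)

    # steps 4-5: keep at most one migrating device per channel, highest utility first
    s_migrate = candidates[:num_channels]          # S_r
    s_pending = candidates[num_channels:]          # remainder stays in S_s
    return s_migrate, s_local + s_pending
```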
5 Simulation experiment

5.1 Experimental environment

The two edge computing network resource allocation methods described above were simulated and analyzed in MATLAB [14]. The experiment was carried out on a laboratory server configured with Windows 7, an i7 processor and 16 GB of memory.

5.2 Experiment setup

A mobile edge computing network area was established in MATLAB; its basic parameters are shown in Table 1. The effective coverage radius of the simulated edge network was 500 m, and the network contained one base station connected to one MEC server. The total channel bandwidth provided by the base station was 30 MHz, the sub-channel bandwidth was 1.5 MHz, and at most 20 sub-channels could be provided. The maximum transmitting power of the MTs held by users in the edge network was set to 25 dBm, and their computing performance was 1.3 GHz. The users' preferences for improving task processing efficiency and for saving device energy were randomly distributed between 0.3 and 0.8. For convenience of simulation, the size of the task data to be processed for each MT was set to 450 kB, and the computing power required to process the task was 1200 Mcycles. The MEC server used for remote processing had a computing performance (computing resources) of 25 GHz.

Table 1: Basic parameters of the simulated mobile edge network.
  Parameter                              Value
  Radius of edge network area            500 m
  Total channel bandwidth                30 MHz
  Sub-channel bandwidth                  1.5 MHz
  Channel gain                           128.1 + 37.5 lg(r)
  Channel interference noise             -175 dBm/Hz
  Maximum transmitting power of MT       25 dBm
  Task size                              450 kB
  Resources required to process tasks    1200 Mcycles
  CPU performance of MT                  1.3 GHz
  User preferences (γ^T, γ^E)            0.3–0.8
  MEC server performance                 25 GHz

The simulation network was set up as described above. Grouping experiments were then carried out according to the number of users in the edge network: seven groups in total, with 5, 10, 15, 20, 25, 30 and 35 users, respectively. The two resource management methods were used in each group of simulation experiments. The evaluation indicators were network throughput, average transmission rate and system utility. Network throughput [15] refers to the amount of data successfully transmitted per unit time, while system utility reflects the effective utilization of the network's computing resources.

5.3 Experimental results

Changes of throughput in the edge computing network with the increasing number of access users under the two resource management methods are shown in Figure 2.

Figure 2: Changes of network throughput with the number of users under the two resource management methods.

As Figure 2 shows, the network throughput under both resource management methods rose as the number of access users in the edge network increased, and the rise flattened once the number of users exceeded 20. Overall, for the same number of access users, the throughput of the edge network based on power iteration was lower than that of the system utility based network, i.e., the throughput performance of the system utility based method was better.

Changes of the average transmission rate of the edge network under the two resource allocation methods with the increasing number of access users are shown in Figure 3.

Figure 3: Changes of average transmission rate with the number of users under the two resource management methods.

In the edge network using the power iteration based allocation method, the average transmission rate was 0.45 MB/s for 5 users, 0.40 MB/s for 10 users, 0.39 MB/s for 15 users, 0.32 MB/s for 20 users, 0.30 MB/s for 25 users, 0.27 MB/s for 30 users, and 0.25 MB/s for 35 users. In the edge network using the system utility based allocation method, the average transmission rate was 0.73 MB/s for 5 users, 0.65 MB/s for 10 users, 0.56 MB/s for 15 users, 0.53 MB/s for 20 users, 0.51 MB/s for 25 users, 0.46 MB/s for 30 users, and 0.41 MB/s for 35 users. As Figure 3 shows, the average transmission rate under both resource allocation methods decreased as the number of access users increased, because additional users occupied more sub-channels and the interference between transmitted signals strengthened. Moreover, for the same number of access users, the average transmission rate under the system utility based allocation method was higher.

Changes of the system utility of the edge network under the power iteration and system utility based methods with the increasing number of access users are shown in Figure 4.

Figure 4: The system utility of the edge network changes with the number of users under the two resource management methods.
As shown in Figure 4, when the number of access users was 5, the network system utility of the former was 1.9, and the system utility of the latter was 2.0; when the number of access users was 10, the system utility of the former was 2.8, and the system utility of the latter was 3.0; when the number of access users was 15, the system utility of the former was 3.7, and the system utility of the latter was 4.0; when the number of access users was 20, the system utility of the former was 4.5, and the system utility of the latter was 4.9; when the number of access users was 25, the system utility of the former was 5.0, and the system utility of the latter was 5.5; when the number of access users was 30, the system utility of the former was 5.6, and the system utility of the latter was 6.0; when the number of access users was 35, the system utility of the former was 5.0, and the system utility of the latter was 6.3. It was seen from Figure 4 that the system utility of the edge network under the two resource allocation methods increased with the increase of the number of access users in the edge network, and the increase amplitude decreased gradually; under the same number of access users, the edge network under the system utility based allocation method had higher system utility. 6 Conclusion This study introduced computing resource allocation methods based on power iteration and system utility, applied them to the mobile edge computing network, and carried out the simulation experiment on the two methods in MATLAB software. The experimental results are as follows: (1) with the increase of users in the edge network, the network throughput under the two computing resource allocation methods showed an increasing tendency, and the edge network under the system utility based allocation method had higher throughput; (2) the average transmission rate of the edge network decreased with the increase of the number of access users, and the edge network under the system utility based allocation method had a higher average transmission rate; (3) with the increase of the number of users in the edge network, the system utility of the edge network under the two methods of computing resource allocation was on the rise, and the edge network under the system utility based allocation method had higher system utility. 7 References [1] Al-Shuwaili A, Simeone O (2016). Energy-Efficient Resource Allocation for Mobile Edge Computing-Based Augmented Reality Applications. IEEE Wireless Communication Letters, PP(99). https://doi.org/10.1109/LWC.2017.2696539 [2] Ahmed E, Rehmani MH (2016). Mobile Edge Computing: Opportunities, solutions, and challenges. Future Generation Computer Systems, 70. https://doi.org/10.1016/j.future.2016.09.015 [3] Paymard P, Mokari N (2019). Resource allocation in PD-NOMA-based mobile edge computing system: Multiuser and multitask priority. Transactions on Emerging Telecommunications Technologies, (1), pp. e3631. https://doi.org/10.1002/ett.3631 268 Informatica 44 (2020) 263-268 R. Zhang et al. [4] Liu ZK, Yang XQ, Shen JX (2019). Optimization of multitask parallel mobile edge computing strategy based on deep learning architecture. Design Automation for Embedded Systems, (4). https://doi.org/10.1007/s10617-019-09222-5 [5] Zhang F, Liu G, Zhao B, Fu X, Yahyapour R (2018). Reducing the network overhead of user mobility-induced virtual machine migration in mobile edge computing. Software Practice and Experience, (3). 
https://doi.org/10.1002/spe.2642 [6] Hao Y, Chen M, Hu L, Hossain MS, Ghoneim A (2018). Energy Efficient Task Caching and Offloading for Mobile Edge Computing. IEEE Access, 6(99), pp. 11365-11373. https://doi.org/10.1109/ACCESS.2018.2805798 [7] Pham QV, Le LB, Chung SH (2019). Mobile Edge Computing with Wireless Backhaul: Joint Task Offloading and Resource Allocation. IEEE Access, PP(99), pp. 1-1. https://doi.org/10.1109/access.2018.2883692 [8] Ma LL, Yi SH, Carter N, Li Q (2018). Efficient Live Migration of Edge Services Leveraging Container Layered Storage. IEEE Transactions on Mobile Computing, PP(99), pp. 1-1. https://doi.org/10.1109/TMC.2018.2871842 [9] Farris I, Taleb T, Flinck H (2018). Providing ultrashort latency to user-centric 5G applications at the mobile network edge. Transactions on Emerging Telecommunications Technologies, 29. https://doi.org/10.1002/ett.3169 [10] Shahzadi S, Iqbal M, Dagiuklas T, Qayyum ZU (2017). Multi-Access Edge Computing: Open issues, Challenges and Future Perspective. Journal of Cloud Computing Advances Systems & Applications, 6(1), pp. 30. https://doi.org/10.1186/s13677-017-0097-9 [11] An N, Yoon S, Ha T, Kim Y, Lim H (2018). Seamless Virtualized Controller Migration for Drone Applications. IEEE Internet Computing, PP(99), pp. 1-1. https://doi.org/10.1109/MIC.2018.2884670 [12] Zeng DZ, Gu L, Pan SL, Cai JJ, Guo S (2019). Resource Management at the Network Edge: A Deep Reinforcement Learning Approach. IEEE Network, 33(3), pp. 26-33. https://doi.org/10.1109/MNET.2019.1800386 [13] Wang Z, Zhao ZW, Min GY (2018). User mobility aware task assignment for Mobile Edge Computing. Future Generation Computer Systems, 85. https://doi.org/10.1016/j.future.2018.02.014 [14] Fang WW, Ding S, Li YY (2019). OKRA: optimal task and resource allocation for energy minimization in mobile edge computing systems. Wireless Networks, 25(5). https://doi.org/10.1007/s11276-019-02000-y [15] Yang X, Chen ZY, Li KK (2018). Communication-Constrained Mobile Edge Computing Systems for Wireless Virtual Reality: Scheduling and Tradeoff. IEEEAccess, 6, pp. 16665-16677. https://doi.org/10.1109/ACCESS.2018.2817288 https://doi.org/1031449/inf.v44i2.3195 Informatica 44 (2020) 269-273 269 Research on the Detection of Network Intrusion Prevention with SVM Based Optimization Algorithm Debing Wang Anhui Vocational & Technical College of Industry & Trade, Huainan, Anhui 232007, China E-mail: bdaie2@163.com Guangyu Xu Anhui University of Science & Technology, Huainan, Anhui 232001, China Keywords: support vector machine, intrusion prevention, intrusion detection, whale optimization algorithm Received: June 6, 2020 Support vector machine (SVM) has a good application in intrusion detection, but its performance needs to be further improved. This study mainly analyzed the SVM optimization algorithm. The principle of SVM was introduced firstly, then SVM was improved using the improved whale optimization algorithm (WOA), the improved WOA (IWOA)-SVMbased intrusion detection method was analyzed, andfinally experiments were carried out on KDD CUP99 to verify the effectiveness of the algorithm. The results showed that the IWAO-SVM algorithm was more accurate in attack detection; compared with SVM, PSO-SVM and ant colony optimization (ACO)-SVM algorithms, the performance of the IWAO-SVM algorithm was better, the detection rate was 99.89%, the precision ratio was 99.92%, the accuracy rate was 99.86%, and the detection time was 192 s, showing that it had high precision in intrusion detection. 
The experimental results verify the reliability of the IWAO-SVM algorithm, and it can be promoted and applied in the detection of network intrusion prevention. Povzetek: Algoritem SVM je bil prilagojen za iskanje napadov v omrežjih. 1 Introduction With the development of technology and the further popularization of computer, the use of network has become more extensive [1], which not only changes the way people study and work, but also creates great values for economic development. However, the network security problem is becoming more and more prominent [2], means of intrusion attack is becoming more complex and diverse [3], which means greater and stronger harms, and the difficulty of intrusion prevention is becoming higher. In order to deal with all kinds of network intrusion, more and more methods have been applied in intrusion detection. Li et al. [4] studied relevance vector machine (RVM), determined the parameters of RVM using the cloud particle swarm optimization algorithm (CPSO), and verified its high accuracy through experiments. Sangeetha et al. [5] designed a method based on application layer signature. If the signature did not match the rule base, the system would generate an alarm. The method could effectively reduce the false alarm rate and improve the accuracy. Kannan et al. [6] designed an enhanced C4.5 for intrusion detection in hybrid virtual cloud environment and verified the effectiveness of the method through the data set and feeding. Geng et al. [7] designed an intrusion detection algorithm based on rough set and Bayes and combining with weighted average and found through experiments that the resource consumption of the method was low and it was easy to realize and had higher efficiency. This study optimized support vector machine (SVM), applied it to the detection of network intrusion, carried out an experiment on the data set, and compared the performance of different SVM optimization algorithms to verify the effectiveness of the designed optimization algorithm, which provides some theoretical bases for its further application in the actual network and offers more ideas for the design of intrusion detection methods. 2 Network intrusion prevention detection Network intrusion refers to the behavior of trying to access or destroy a system without authorization to make it unavailable [8]. Detection of network intrusion is to analyze the key information collected from the inside and outside of the computer, such as security log, etc. [9], find out the characteristics that may generate attacks [10], and give responses such as alarm and network outage [11], and its flow is shown in Figure 1. Firstly, multiple monitoring points are set in the network to collect data such as system log, firewall log, software information and intrusion information as much as possible and comprehensively to ensure the detection effect. Then, the collected data are normalized to reduce the detection error, and the processed data are analyzed Figure 1: The detection process. 270 Informática 44 (2020) 269-273 D.B. Wang et al. using detection methods to obtain the detection results. Finally, the system makes response to defend according to the detection results. 3 Detection method combined with SVM optimization algorithm 3.1 SVM algorithm SVM is a machine learning algorithm [12], which has advantages of strong generalization ability, learning ability and applicability. 
Its classification idea is that the two classes lie on either side of a separating hyperplane with as large a margin as possible (Figure 2). Given a dataset $(x_1, y_1), (x_2, y_2), \dots, (x_l, y_l)$, $i = 1, \dots, l$, $x_i \in R^n$, $y_i \in \{+1, -1\}$, the classification hyperplane can be written as $y = wx + b$, where $w$ is the weight vector and $b$ is the threshold value. To find the optimal classification plane, the problem can be written as

$$\min \ \frac{\|w\|^2}{2} \quad \text{s.t.} \quad y_i(w x_i + b) \ge 1. \qquad (1)$$

In order to improve the modeling speed, slack variables $\xi_i$ are introduced:

$$\min \ \frac{\|w\|^2}{2} + C\sum_{i=1}^{l}\xi_i \quad \text{s.t.} \quad y_i(w x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0, \qquad (2)$$

where $C$ is the penalty factor. The Lagrange method is then used to transform the problem into its dual form:

$$\max \ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{l}\alpha_i \alpha_j y_i y_j\, k(x_i, x_j), \qquad (3)$$

where $\alpha_i$ is a Lagrange multiplier and $k(x_i, x_j)$ is a kernel function; the constraint is $\sum_{i=1}^{l}\alpha_i y_i = 0$, $0 \le \alpha_i \le C$. The final classification function can be written as

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{l}\alpha_i y_i\, k(x, x_i) + b\right). \qquad (4)$$

The kernel function used in this study is the RBF kernel:

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2r^2}\right), \qquad (5)$$

where $r$ is a kernel parameter.

3.2 SVM optimization algorithm

In SVM, the penalty factor $C$ and the kernel parameter $r$ have a great impact on the final classification performance. In order to obtain optimal values of $C$ and $r$, the whale optimization algorithm (WOA) [13] was used in this study to optimize SVM. WOA is an optimization algorithm that simulates the hunting behavior of whales; it is easy to operate and implement, but it converges slowly, so an inertia weight $\omega$ was introduced to obtain an improved WOA (IWOA). Suppose the population size of whales is $N$ and the position of the $i$-th whale in the $d$-dimensional search space is $X_i = (x_i^1, x_i^2, \dots, x_i^d)$, $i = 1, 2, \dots, N$; the position of the prey corresponds to the optimal solution of the problem. When surrounding the prey, a whale updates its position as

$$X(t+1) = \omega X^*(t) - A\,\big|C X^*(t) - X(t)\big|, \qquad (6)$$

where $t$ is the iteration number, $\omega = \omega_{\max} - (\omega_{\max} - \omega_{\min})\,t/t_{\max}$ is the inertia weight, $A$ and $C$ are coefficient vectors with $A = 2 a r_1 - a$, $C = 2 r_2$, $a = 2 - 2t/t_{\max}$, and $r_1$ and $r_2$ are random quantities in $[0, 1]$. The hunting strategy of whales is called bubble-net feeding [14], in which bubbles are generated along a spiral path to surround the prey. This process can be expressed as

$$X(t+1) = D'\, e^{bz}\cos(2\pi z) + \omega X^*(t), \qquad (7)$$

where $D' = |X^*(t) - X(t)|$, $b$ is a constant defining the spiral shape, and $z$ is a random number in $[-1, 1]$. In addition to bubble-net feeding, whales also conduct a random search, which can be expressed as

$$X(t+1) = \omega X_{rand}(t) - A\,\big|C X_{rand}(t) - X(t)\big|, \qquad (8)$$

where $X_{rand}$ is a randomly selected whale position.
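To make the parameter search concrete, the following Python fragment sketches how an IWOA-style loop could tune $C$ and the kernel parameter for an RBF-kernel SVM. It is a minimal sketch under assumptions: scikit-learn's SVC is used with cross-validated accuracy as the fitness, sklearn's gamma stands in for the paper's kernel parameter $r$, only the encircling update of equation (6) is applied (the spiral and random-search branches are omitted for brevity), and the search bounds and names are illustrative rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def iwoa_svm_search(X, y, n_whales=20, t_max=50, w_max=0.9, w_min=0.3,
                    c_bounds=(0.1, 100.0), r_bounds=(0.01, 10.0), seed=0):
    """Tune (C, gamma) of an RBF-SVM with a simplified IWOA loop (encircling update only)."""
    rng = np.random.default_rng(seed)
    lo = np.array([c_bounds[0], r_bounds[0]])
    hi = np.array([c_bounds[1], r_bounds[1]])
    pop = rng.uniform(lo, hi, size=(n_whales, 2))      # whale positions = (C, gamma)

    def fitness(pos):
        # cross-validated accuracy of the SVM parameterised by this whale
        clf = SVC(C=pos[0], gamma=pos[1], kernel="rbf")
        return cross_val_score(clf, X, y, cv=3).mean()

    scores = np.array([fitness(p) for p in pop])
    best_idx = int(np.argmax(scores))
    best, best_score = pop[best_idx].copy(), float(scores[best_idx])

    for t in range(1, t_max + 1):
        w = w_max - (w_max - w_min) * t / t_max        # inertia weight of the IWOA
        a = 2.0 - 2.0 * t / t_max
        for i in range(n_whales):
            r1, r2 = rng.random(2), rng.random(2)
            A, C_vec = 2.0 * a * r1 - a, 2.0 * r2
            # encircling-prey position update (eq. 6), clipped to the search bounds
            pop[i] = np.clip(w * best - A * np.abs(C_vec * best - pop[i]), lo, hi)
            score = fitness(pop[i])
            if score > best_score:
                best, best_score = pop[i].copy(), score
    return best, best_score   # best = (C, gamma) used to train the final SVM
```

The population size, iteration count and inertia-weight range above follow the settings reported in Section 4.1 (20 whales, 50 iterations, ω between 0.3 and 0.9).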
3.3 IWOA-SVM intrusion detection algorithm

After optimization with IWOA, the flow of the IWOA-SVM algorithm is shown in Figure 3.

Figure 3: The flow of the intrusion detection algorithm.

The specific steps of the algorithm are as follows. The collected sample data set is preprocessed, the parameters of IWOA are set, and the SVM parameters $C$ and $r$ that need optimization are taken as whale individuals. The population is initialized, and the fitness value of each individual is calculated to obtain the individual and population optima. The positions are then updated by IWOA to obtain new solutions until the termination conditions are met; the optimal values of $C$ and $r$ are obtained and used as the parameters of SVM. The SVM model is established and, after training, tested with the testing samples. Finally, the system responds according to the test results.

4 Experimental results

4.1 Experimental environment and data set

The experiment was carried out on a Linux operating system with an Intel Core i7 CPU@2.40 GHz, 8 GB of memory, and the Python language. The size of the IWOA population was 20, $t_{\max}$ was 50, $\omega_{\min}$ was 0.3, and $\omega_{\max}$ was 0.9. The experimental data set was KDD CUP99, which contains Probe, DOS, U2R and R2L attacks in addition to Normal traffic. As KDD CUP99 is too large, only a part of the data was randomly selected for this study: 3500 normal and 8260 attack records in the training set, and 1500 normal and 3540 attack records in the testing set, as shown in Table 1.

Table 1: Experimental data set.
  Category   Training set   Testing set
  Normal     3500           1500
  Probe      2100           900
  DOS        4900           2100
  U2R        700            300
  R2L        560            240

4.2 Evaluation index

The detection algorithm was evaluated using the confusion matrix shown in Table 2.

Table 2: Confusion matrix.
                        Detection result
  Actual condition      Attack data    Normal data
  Attack data           A              B
  Normal data           C              D

In Table 2, A denotes attack data correctly judged as attack data; B denotes attack data misjudged as normal data; C denotes normal data misjudged as attack data; and D denotes normal data correctly judged as normal data. The evaluation indexes are:
(1) Detection rate = A / (A + B)
(2) Precision ratio = A / (A + C)
(3) Accuracy = (A + D) / (A + B + C + D)
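The three indexes defined above map directly onto the confusion-matrix counts. The short Python fragment below is a minimal sketch of that arithmetic; the example call uses the counts that appear for the IWOA-SVM column of Table 4 in the next subsection (3536, 4, 3, 1497) and reproduces the reported 99.89%, 99.92% and 99.86%.

```python
def detection_metrics(A, B, C, D):
    """Compute the indexes of Section 4.2 from confusion-matrix counts.

    A: attacks detected as attacks       B: attacks missed (judged normal)
    C: normal traffic flagged as attack  D: normal traffic judged normal
    """
    detection_rate = A / (A + B)           # share of attacks that were caught
    precision = A / (A + C)                # share of alarms that were real attacks
    accuracy = (A + D) / (A + B + C + D)   # share of all samples judged correctly
    return detection_rate, precision, accuracy

# IWOA-SVM counts from Table 4: 3536 caught attacks, 4 missed, 3 false alarms, 1497 true normals
print(detection_metrics(3536, 4, 3, 1497))   # approx. (0.9989, 0.9992, 0.9986)
```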
4.3 Experimental results

In order to verify the detection effect of the IWOA-SVM algorithm, it was compared with the SVM, particle swarm optimization SVM (PSO-SVM) [15] and ant colony optimization SVM (ACO-SVM) [16] algorithms. The confusion matrix of the IWOA-SVM algorithm is shown in Table 3, and the comparison between the different algorithms is shown in Table 4; the four numbers separated by slashes in Table 4 are the results of the SVM, PSO-SVM, ACO-SVM and IWOA-SVM algorithms, respectively.

Table 3: Confusion matrix results of the IWOA-SVM algorithm.
  Actual \ Detected   Normal   Probe   DOS    U2R   R2L
  Normal              1497     1       2      0     0
  Probe               0        899     1      0     0
  DOS                 0        1       2198   1     0
  U2R                 0        0       1      299   0
  R2L                 0        0       0      0     240

Table 4: Comparison results of different algorithms (SVM / PSO-SVM / ACO-SVM / IWOA-SVM).
  Actual condition       Detected as attack          Detected as normal
  Attack data (3540)     3320 / 3478 / 3486 / 3536   220 / 62 / 54 / 4
  Normal data (1500)     134 / 38 / 12 / 3           1366 / 1462 / 1488 / 1497

According to the data in Table 4, the detection rate of each algorithm was calculated, and the results are shown in Figure 4.

Figure 4: Comparison of the performance between different algorithms.

According to Figure 4, the detection rates of the PSO, ACO and IWOA optimized SVM algorithms were 6.21%, 8.71% and 14.88% higher than that of SVM, respectively, so the detection rate of the IWOA-SVM algorithm improved significantly. The precision ratios of the four algorithms were all over 90%, with the IWOA-SVM algorithm the highest at 99.92%. In terms of accuracy, the PSO and ACO optimizations improved the accuracy of the SVM algorithm, but not as much as IWOA; the accuracy of the IWOA-SVM algorithm was 15.51% higher than that of the SVM algorithm.

The detection time of the different algorithms was also compared, and the results are shown in Table 5.

Table 5: Comparison of detection time.
  Algorithm          SVM   PSO-SVM   ACO-SVM   IWOA-SVM
  Detection time/s   189   197       198       192

Table 5 shows that the time complexity of the optimized SVM algorithms increased compared with the plain SVM algorithm: the detection time of the PSO-SVM algorithm increased by 4.23%, that of the ACO-SVM algorithm by 4.76%, and that of the IWOA-SVM algorithm by only 1.59%, which is 2.54% lower than the PSO-SVM algorithm and 3.03% lower than the ACO-SVM algorithm. The optimization algorithm designed in this study therefore not only has a clear advantage in detection rate but also performs well in detection time, i.e., it can provide better service for network intrusion detection.

5 Discussion

Detecting intrusion attacks effectively is very important for network protection and control [17]. In network intrusion detection, clustering algorithms [18], the Apriori algorithm, decision trees [19], Q-learning, neural networks [20] and hidden Markov models [21] are widely applied. This study mainly analyzed SVM. As a common classification and prediction algorithm, SVM is applied in many fields, such as face recognition [22], risk assessment [23], electricity price prediction [24] and image classification [25]. In order to improve the effectiveness of SVM in intrusion detection, it was optimized with the IWOA algorithm in this study and verified on the KDD CUP99 data set. Table 3 shows that the IWOA-SVM algorithm classified intrusion attacks very accurately, with only seven records wrongly classified. Table 4 and Figure 4 show that the IWOA-SVM algorithm had better detection performance, with a detection rate of 99.89%, which is 14.88%, 8.18% and 5.68% higher than the other three algorithms, respectively; the precision ratio improved by 7.10%, 6.43% and 3.93%, respectively, and the accuracy increased by 13.41%, 10.64% and 6.86%, respectively, which verifies the effectiveness of IWOA in SVM optimization and the good precision of the IWOA-SVM algorithm in intrusion detection. Finally, the comparison of detection time showed that the proposed method also had a good time advantage over the other optimization algorithms, being only 1.59% slower than the SVM algorithm. Although some achievements have been made, there are still shortcomings to be addressed in future work: (1) the detection effect of the SVM algorithm with different kernel functions should be compared; (2) the performance of more optimization algorithms applied to SVM should be compared; (3) the performance of the IWOA-SVM algorithm in practical applications should be studied.

6 Conclusion

Aiming at the detection of network intrusion prevention, this study analyzed the optimization of SVM, designed an improved WOA algorithm, and compared it with other optimization algorithms on the data set.
The results suggested that: (1) the IWAO-SVM algorithm could detect intrusion attacks accurately; (2) the detection rate of the IWAO-SVM algorithm was 99.89%, the precision ratio was 99.92%, and the accuracy rate was 99.86%, which were all higher than the other excellent algorithms; (3) the detection time of the IWAO algorithm was 192s , only 1.59% longer than the SVM algorithm. Research on the Detection of Network Intrusion Prevention ... 7 References [1] Elekar KS (2015). Combination of data mining techniques for intrusion detection system. International Conference on Computer. IEEE. [2] Shah AA, Khiyal MSH, Awan MD (2015). Analysis of Machine Learning Techniques for Intrusion Detection System: A Review. International Journal of Computer Applications, 119(3), pp. 19-29. [3] Keegan N, Ji S Y, Chaudhary A, Concolato C, Yu B, Jeong DH (2016). A survey of cloud-based network intrusion detection analysis. Human-centric Computing and Information Sciences, 6(1), pp. 19. [4] Li GD, Hu JP, Xia KW (2015). Intrusion detection using relevance vector machine based on cloud particle swarm optimization. Control & Decision, 30(4), pp. 698-702. [5] Sangeetha S, Devi BG, Ramya R, Dharani MK, Sathya P (2015). Signature Based Semantic Intrusion Detection System on Cloud. Advances in Intelligent Systems and Computing, 339, pp. 657-666. [6] Kannan A, Venkatesan KG, Stagkopoulou A, Li S (2015). A Novel Cloud Intrusion Detection System Using Feature Selection and Classification. International Journal of Intelligent Information Technologies, 11(4), pp. 1-15. [7] Geng X, Li Q, Ye D, Wu Z, Jiang Y (2017). Intrusion detection algorithm based on rough weightily averaged one-dependence estimators. Journal of Nanjing University of Science & Technology, 41(4), pp. 420-427. [8] Milliken M, Bi Y, Galway L, Hawe GI (2015). Ensemble learning utilising feature pairings for intrusion detection. World Congress on Internet Security. IEEE. [9] Ghosh P, Mandal AK, Kumar R (2015). An Efficient Cloud Network Intrusion Detection System. Advances in Intelligent Systems & Computing, 339, pp. 91-99. [10] Jinny SV, Kumari JJ (2015). Encrusted CRF in Intrusion Detection System. Advances in Intelligent Systems & Computing, 325, pp. 605-613. [11] Tedesco G, Aickelin U (2016). Adaptive Alert Throttling for Intrusion Detection Systems. Social Science Electronic Publishing, 730, pp. 194-201. [12] Abdiansah A, Wardoyo R (2015). Time complexity analysis of support vector machines (SVM) in LibSVM. International Journal of Computer Applications, 128(3), pp. 975-8887. [13] Aljarah I, Faris H, Mirjalili S (2016). Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Computing, 22(1), pp. 1-15. [14] Friedlaender A, Weinrich M, Bocconcelli A, et al (2011). Underwater components of humpback whale bubble-net feeding behaviour. Behaviour, 148(5), pp. 575-602. [15] Wang L, Dong C, Hu J, Li G (2015). Network Intrusion Detection Using Support Vector Machine Based on Particle Swarm Optimization. Plant Biotechnology Reports, 4(3), pp. 237-242. Informatica 44 (2020) 269-273 273 [16] Zan P, Ai YT, Zhao J, Shao Y (2014). A Prediction Model of Rectum's Perceptive Function Reconstruction Based on SVM Optimized by ACO. 461, pp. 121-128. [17] Deng S, Zhou A, Yue D, Hu B, Zhu L (2017). Distributed intrusion detection based on hybrid gene expression programming and cloud computing in cyber physical power system. IET Control Theory and Applications, 11(11), pp. 1822-1829. [18] Chahal JK, Kaur A (2016). 
A Hybrid Approach based on Classification and Clustering for Intrusion Detection System. International Journal of Mathematical Sciences & Computing, 2(4), pp. 3440. [19] Modinat M, Abimbola A, Abdullateef B, Opeyemi A (2015). Gain Ratio and Decision Tree Classifier for Intrusion Detection. International Journal of Computer Applications, 126(1), pp. 975-8887. [20] Gautam SK, Om H (2016). Computational Neural Network Regression Model for Host based Intrusion Detection System. Perspectives in Science, 8(C), pp. 93-95. [21] Sharma SK, Manoria M (2015). Intrusion Detection using Hidden Markov Model. International Journal of Computer Applications, 115(4), pp. 35-38. [22] Prakash N, Singh Y (2015). Fuzzy Support Vector Machines for Face Recognition: A Review. Maropoulos P G, 131(3), pp. 24-26. [23] Bui DT, Tuan TA, Klempe H, Pradhan B, Revhaug I (2016). Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides, 13(2), pp. 361-378. [24] Shrivastava NA, Khosravi A, Panigrahi BK (2015). Prediction Interval Estimation of Electricity Prices Using PSO-Tuned Support Vector Machines. Industrial Informatics, IEEE Transactions on, 11(2), pp. 322-331. [25] Tan K, Zhang J, Du Q, Wang X (2015). GPU Parallel Implementation of Support Vector Machines for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing, 8(10), pp. 1-10. 274 Informática 44 (2020) 269-273 D.B. Wang et al. https ://doi.org/10.31449/inf.v44i2.3166 Informatica 44 (2020) 275-268 263 A Web Server to Store the Modeled Behavior Data and Zone Information of the Multidisciplinary Product Model in the CAD Systems Yatish Bathla Doctoral School of Applied Informatics and Applied Mathematics, Obuda University, Budapest E-mail: yatish.bathla@phd.uni-obuda.hu Sândor Szénâsi John von Neumann Faculty of Informatics, Obuda University, Budapest E-mail: szenasi.sandor@nik.uni-obuda.hu Student paper Keywords: web server, multidisciplinary product modeling, rflp structure, cad systems, information retrieval, information storage Received: January 23, 2019 This work focuses on Human Computer Interaction (HCI) for multidisciplinary product modeling. Requirement Functional Logical Physical (RFLP) structure has emerged as one of the prominent approaches for modeling the multidisciplinary products. To simplify the HCI of an RFLP structured product model, Information Content (IC) provides effective communication and interaction. It controls the RFLP level by the Multilevel Abstraction based Self-Adaptive Definition (MAAD) structure. However, it needs an application to represent the modeled behavior data and zone information of a multidisciplinary product model. Further, the IC application requires an interface to interact with the Computer-Aided Design (CAD) based multidisciplinary product model application and exchange the information between the database of the servers. As per the knowledge of authors, no work has been done yet on the HCI of the IC. Therefore, this paper proposes a Content Web server, which is used to store the modeled behavior data and zone information of the multidisciplinary product model and represented by the IC web application. Then, the Content database is created to store the Layer Info-Chunk (LiC) entities' information of the multidisciplinary product model. 
Finally, communication between the Content Server and the CAD server is done to represent the IC application interface in the multidisciplinary product application. The Apache Tomcat server, PostgreSQL database, and RESTful web service are used to explain the operations. Povzetek: Pristop HCI temelji na izkoriščanju prednosti človeških možganov in računalniške umetne inteligence. Na ta način so avtorji prispevka izboljšali multidisciplinarno modeliranje oblektov. 1 Introduction A good Human Computer Interaction (HCI) interface in the Computer-Aided Design (CAD) systems deals smartly with the relationship between industrial designers and computer software and hardware, studies the design of man-machine interface model efficiently, the smart design of the virtual interface, multi-user, and multi-sensory interface, and provide a good technical foundation for industrial design [1]. CAD systems simplify the engineering tasks in collecting, using, creating and sharing information, but interface designed without consideration of usability often results in unsatisfied experiences and limited outcomes [2]. Classical Product Models (CPM) in the CAD systems [3] allow product development firms to meet their goals more efficiently. It improves product development time, product quality, productivity and reduces manufacturing as well as product costs. Also, the Requirement Functional Logi- cal Physical (RFLP) structure [4] is applied from the system engineering and offers to handle the multidisciplinary product model as a system. Product assembly is done in the specification tree (white square) of the RFLP structure as shown in Fig. 1. Here, Dassault Syst6m's CATIA 3DEX-PERIENCE [5] is using the RFLP structure for multidisciplinary product modeling. The authors have considered this CAD software for explaining the proposed concepts. There is plenty of research done for improving the HCI of the CAD systems. Some of the appreciated work are as follow: - A webized interactive CAD review system [12] that uses Super Multi-View (SMV) autostereoscopic displays renders the content through a web browser and handles user interactions via JavaScript. But it is an expensive technology and limited to the CPM. - A VR (Virtual Reality)-CAD server [13] that embeds 276 Informatica 44 (2020) 275-283 Bathla et al. a commercial CAD engine for loading and modifying native CAD data in a CVE (Collaborative Virtual Environments). It is a distributed architecture that allows collaborative modifications on native CAD data from heterogeneous VR platforms. It is based on the management the CAD product data and need improvement in terms of visualizing and representation of 3D product data. The complexity increases in the case of a multidisciplinary product model as it requires the coordination of huge amounts of model information of the multiple disciplines. Indeed, Information Content (IC) [9] handles the multi-disciplinary product model indirectly to record and apply the content of modeled information efficiently [10], which further drives the RFLP level by the Multilevel Abstraction based self Adaptive Definition (MAAD) structure [4]. However, there is a need for Information Content based Web application to represent and store the behavior modeled data [11] from the Process plane [14] and Community Zone information [15] of a multidisciplinary product model. 
As a solution, this research work proposes a Content Web server that consists of:
- an IC based web application to represent the modeled behavior data and Community zone information of a multidisciplinary product model;
- a Content database to store the entities of the modeled behavior data of a multidisciplinary product model.
Figure 1: Multidisciplinary Product Model using the RFLP Structure
For the smart interaction with a multidisciplinary product model through the Information Content (IC), an interface is introduced in the multidisciplinary product application through the IC web application. The Apache HTTP Server [16] hosts the IC based web application, with the PostgreSQL database [17] to store the multidisciplinary product model data and a RESTful web service [18] to exchange information between the Content and the CAD system web server. For the IC web application, the objects collect the Functional and Logical layer information from the Info-Chunk [28] entities of the RFLP structure. These objects communicate with the objects of the MAAD structure and collect the modeled behavior data of a multidisciplinary product model. The retrieval of data follows the Process plane of the IC. The objects are based on Object-Oriented Programming (OOP) [8] concepts, which are used frequently in software engineering [29]. The zone information and the extracted modeled behavior data of a multidisciplinary product are displayed by the IC web application.
This paper starts with the preliminary research, where the RFLP structured product model, the IC, the MAAD structure and the Info-Chunk entities are discussed. Then, human interaction with the IC application and the multidisciplinary application is outlined with the introduction of the Content server. Next, the Content server is explained, with an emphasis on its operations and the Content database; the PostgreSQL database is used for the explanation. Finally, communication between the Content and the CAD system web server is elaborated; the RESTful web service is used for the explanation.
2 Background
Product modeling is a prominent field. Plenty of companies, such as Dassault Systèmes [19], Autodesk [20], Robert McNeel [21] and Pixologic [22], invest a lot of money in this market. The feature driven CPM (Classical Product Model) [7] is most commonly used for discipline specific product modeling. The CPM is limited to the physical level. Handling a complex product model is a challenging task due to the involvement of a large number of engineering objects and their relationships. However, product modeling is not limited to the physical layer. The separated, or only slightly integrated, mechanical engineering modeling increasingly demanded multidisciplinary integration [30]. Modeling of a multidisciplinary product must provide means for the integration of discipline specific models into a model with a unified structure. Higher abstraction is realized by using the RFLP structure based product model [4]. It is commonly used for multidisciplinary modeling, as it models the product as a system, and it is compliant with the IEEE 1220 standard. This structure has four layers: the Requirement layer for the requirements against the product, the Functional layer for the functions that fulfill the requirements, the Logical layer for the product-wide logical connections, and the Physical layer for the representations of physically existing objects. It accommodates product behavior definitions on its Functional and Logical levels.
In the RFLP structure of Dassault Systèmes' CATIA 3DEXPERIENCE software, Dymola [6] is used to analyze the dynamic logical behavior of a product, and Modelica [7] is used for logical and physical modeling of the technical system. Modelica is a multi-domain modeling language for component-oriented modeling of complex systems and is based on the OOP concepts [30].
Information Content (IC) [9] assists effective communication in multidisciplinary product modeling. It drives the RFLP level through the MAAD structure. The MAAD modeling [31] methods and model structures are introduced as generalized means for the support of higher level abstraction based generation of RFLP elements. The MAAD modeling was based on knowledge representation, contextual change propagation, and extended feature definition capabilities for advanced modeling systems. In the IC, the intent is defined by the human to control the definition of engineering objects of a product model [32]. In the Engineering objectives layer, the Process plane [14] is used to store the processes performed on a multidisciplinary product model. Also, Community zones [15] are used by the IC to organize the complex product model entities and their relationships, as shown in Fig. 2. Here, the product model space is divided into community zones based on the discipline, specification or configuration. In this figure, the multidisciplinary product model is divided into the Internal and External communities based on the configuration. The information of the Process plane and the Community zones is shown on the representation plane of the IC.
Figure 2: Community Zones in the RFLP Structure
Layer Info-Chunk (LiC) [28][38] entities were introduced in the Functional and Logical layers of the RFLP structure for effective communication with the IC. They control the behavior data activities of the RFLP structure. The Logical layer Info-Chunk (LiCL) entity stores the information of the Logical layer, and the Component Info-Chunk (CiC) entity stores the information of a Logical component. Further, the Functional layer Info-Chunk (LiCF) entity stores the main function information of the Functional layer, and the Sub-Function Info-Chunk (SFiC) entity stores the sub-function information of a function. Considering the above mentioned concepts as a base, the authors propose effective Human Computer Interaction (HCI) for multidisciplinary product modeling by using the IC application.
3 User interaction and multidisciplinary product application
In this research work, the multidisciplinary product model is handled and controlled through the IC application. The IC application is a web based application in the JSP (JavaServer Pages) format [23]. It resides on a web server called the Content server, which is explained in the next section. A multidisciplinary product model application using the RFLP structure is an application in the 3DXML format [24]. The 3DEXPERIENCE [25] CAD software requires the ENOVIA [26] Product Lifecycle Management (PLM) system in the backend, which allows the data to be stored in one central location and therefore accessed from anywhere. ENOVIA V6 uses the Microsoft SQL Server 2008 R2 Enterprise platform for database management [39]. Therefore, in this research work, the CAD Product server refers to the Microsoft SQL Server [27]. The IC, the MAAD structure and the LiC entities in the RFLP structure communicate through Info-Chunk objects, which are based on the Object-Oriented Programming (OOP) principles [8]. The advantage of the IC application is a simpler user interface and an efficient organization of the objects retrieved from the product model. There are two scenarios to be considered:
- The user interacts with the IC application to access the multidisciplinary product application, as shown in Fig. 3. The IC drives the RFLP structure through the MAAD structure. Every application has its own web server for resource management and its own database to locate the information. The database of the CAD product server is isolated from the database of the Content server.
Figure 3: User Interaction from the Information Content application
- The user interacts with the multidisciplinary product application to access the IC application through a separate plane, as shown in Fig. 4. There is an interface between the two applications. The database of the CAD product server retrieves the process and zone partition information from the database of the Content server using web services.
Figure 4: User Interaction from the Multidisciplinary product application
4 Content web server
The Content Web server is the Apache HTTP Server [33] that is used to store and display the data of the Information Content (IC), as shown in Fig. 5. The Tomcat Servlet [34] is used for the Information Content web application. The Enterprise Management Agent (EMA) is the integral software component responsible for managing and maintaining the IC based web application. It also allows monitoring the CAD Product database through management plug-ins and connectors. The Process partition consists of the outcome of the Process plane of the IC, the product model after a certain set of processes has been applied, and the files that describe the location of the outcomes of the Process plane and the product model. Similarly, the Zone partition consists of the outcome of the Community zone of the IC, the product model after it has been divided into zones, and the files that describe the location of the outcomes of the Community zone and the product model. The outcomes are the graphs obtained from the Process plane and the Community zones. The authors store the graphs in the PNG format [35], the product model application in Dassault Systèmes' 3DEXPERIENCE file format (3DXML) [36], the XML file format [37] for the data interchange, and the SCN file format for the 3D product model (assembly model or part model) management.
Figure 5: Content Server
The Content database is created by using PostgreSQL. It stores the data of the Information Content application while handling the modeled behavior data and zone information of the multidisciplinary product application. An Entity Relationship (ER) [40] diagram is used for the physical data modeling, as shown in Fig. 6. It is required at the schema level for creating the database. There are nine tables created based on the concept of the LiC entities of the RFLP structure. During the product modeling using the RFLP structure, a set of information is transferred from the Requirement layer to the Physical layer. The behaviors of a product model are represented in the Functional and Logical layers of the RFLP structure. The LiCF table is used to store the attributes of the Functional layer and the LiCL table is used to store the attributes of the Logical layer of the LiC (Layer Info-Chunk) entity of the RFLP structure.
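For illustration, the two layer tables could be declared in PostgreSQL along the following lines. This is only a minimal sketch: the column names and types are assumptions made for this example and are not taken from the paper's Content ER diagram, which additionally relies on the user-defined data types discussed next.

-- Illustrative sketch only; placeholder columns, not the authors' schema.
CREATE TABLE LiCF (
    LiCF_ID      INT PRIMARY KEY,        -- identifier of the Functional layer entity
    FunctionName VARCHAR(255),           -- main function stored by the LiCF entity
    Description  TEXT
);

CREATE TABLE LiCL (
    LiCL_ID      INT PRIMARY KEY,        -- identifier of the Logical layer entity
    LiCF_ID      INT REFERENCES LiCF,    -- a LiCL entity may refer to one LiCF entity
    LogicalName  VARCHAR(255),
    Description  TEXT
);

A real Content database would replace these placeholder columns with the attributes specified in the Content ER diagram.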
In these tables, some of the data types are built-in while others are user-defined. In the case of the LiCL table:
- LiCLConnector is the enumerated data type that stores the inner and connector values of the LiCL entity.
- LiCF, CiC, and LiCLDataModel are composite data types, whose attributes and data types are specified in the Content ER diagram.
- In the CiC table, CiCConnector is the enumerated data type that stores the inner and stream values of the CiC entity.
- In the CiC table, SFiC and CiCDataModel are composite data types, whose attributes and data types are specified in the Content ER diagram.
- In the LiCLDataModel table, LiCLSituation and LiCLProcess are user-defined composite data types.
Figure 6: Content Entity Relationship Diagram.
The CiC table is used to store the attributes of the CiC (Component Info-Chunk) entity present in a LiCL entity. The LiCLDataModel table is used to store the attributes of the detailed description of the Physical layer of the RFLP structure. The LiCLProcess table is used to store the attributes of the Process plane of the IC. The LiCLSituation table is used to store the attributes of a situation in the Logical layer of the RFLP structure. Here, the LiCLGeometry composite data type is used to store the information of a part model or an assembly model in a situation. In the case of the LiCF table:
- LiCFLink is the enumerated data type that stores the inner and connector values of the LiCF entity.
- SFiC and ReqInfoChunk are composite data types, whose attributes and data types are specified in the Content ER diagram.
- In the SFiC table, SubFunctionLink is the enumerated data type that stores the inner and stream values of the SFiC entity.
- In the SFiC table, Element is the composite data type, whose attributes and data types are specified in the Content ER diagram.
- In the ReqInfoChunk table, the attributes and data types are specified in the Content ER diagram.
For reference, the LiCLConnector, LiCLDataModel, and LiCLProcess definitions are demonstrated using PostgreSQL SQL statements, as shown below. Here, the new data types are created using the CREATE statement.

CREATE TYPE LiCLConnector AS ENUM ('inner', 'Stream');

CREATE TYPE LiCLDataModel AS (
    LiCLD_ID      INT,
    PO_Contextual VARCHAR(255),
    PO_Connected  VARCHAR(255),
    PO_Output     VARCHAR(100),
    PO_Input      VARCHAR(100),
    Process       LiCLProcess,
    Situation     LiCLSituation
);

CREATE TYPE LiCLProcess AS (
    LiCLP_ID         INT,
    Process_Analysis BOOLEAN,
    Process_Effect   BOOLEAN,
    Process_Optimize BOOLEAN,
    Value_Analysis   TEXT[],
    Value_Effect     TEXT[],
    Value_Optimize   TEXT[]
);

Figure 7: Communication between Content server and CAD server.
The modeled behavior data of a multidisciplinary product is stored in the entities based on the entities' relationships. The entities are populated by the IC application.
- In the context of the Functional layer, one LiCF entity may have many SFiC entities, and one or many LiCF entities may have one ReqInfoChunk entity. Further, one SFiC entity may have one or many Element entities.
- In the context of the Logical layer, one LiCL entity may have one LiCF entity and many CiC entities. Also, one or many LiCL entities may have one LiCLDataModel entity. Further, one and only one LiCLDataModel entity may have one or many LiCLProcess and LiCLSituation entities. Here, one or many LiCLSituation entities may have one LiCLGeometry entity. Also, one CiC entity may have one CiCDataModel entity.
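To make the use of these composite types concrete, the following sketch extends the illustrative LiCL table from the earlier example with a LiCLDataModel column and inserts one row in the way the IC application might populate it. All table names, column names and sample values here are assumptions, and the sketch presumes that the remaining user-defined types (for example LiCLSituation) are created analogously to the listing above.

-- Illustrative only; not the authors' schema.
ALTER TABLE LiCL ADD COLUMN DataModel LiCLDataModel;

INSERT INTO LiCF (LiCF_ID, FunctionName) VALUES (1, 'main function');

INSERT INTO LiCL (LiCL_ID, LiCF_ID, LogicalName, DataModel)
VALUES (
    1, 1, 'logical connection of the product',
    -- composite LiCLDataModel value, with a nested LiCLProcess value
    ROW(10, 'contextual object', 'connected object', 'output.png', 'input.xml',
        ROW(100, TRUE, FALSE, FALSE,
            ARRAY['analysis graph'], ARRAY[]::TEXT[], ARRAY[]::TEXT[])::LiCLProcess,
        NULL)::LiCLDataModel
);

In a fully normalized design, the one-to-many relationships listed above would instead be expressed as foreign keys from the SFiC, CiC and LiCLSituation tables to their parent tables.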
4.1 Operations
A human expert handles the multidisciplinary application through the IC application. To model the behavior data, the Process plane from the Engineering objectives layer of the IC interacts with the Info-Chunk objects of the Product Behaviors level of the MAAD structure, which further drives the Info-Chunk objects of the Functional and Logical layers of the RFLP structure. Here, the Process plane of the IC communicates with the LiC entities of the RFLP structure using the Info-Chunk objects to retrieve the modeled behavior data of a multidisciplinary product model. The data is stored in the Process partition. Also, the product model is divided into community zones based on the discipline, and the outcome is stored in the Zone partition. Then, a human interacts with the results stored in the partitions through the representation plane of the interactive IC application. The outcome can be static or dynamic and is represented as graphs, images or animations.
4.2 Communication between content server and CAD product server
The CAD server pulls the process partition and the zone partition from the Content server when replaying through the IC interface in the multidisciplinary web application, as shown in Fig. 7. The Content server partition information is saved in the CAD server cache and is automatically deleted almost immediately after a replay. The EM-EMA link handles the publishing of the configuration between the framework and the Content server. The RESTful Web API link handles the passing of the modeled behavior data and zone partition details from the Postgres job queue to the Local Contact DB, from where they are moved on to the Central Contact DB by the ETL SQL process of the CAD server. The advantage of this API is that there is no need to install additional software or libraries, and it provides a great deal of flexibility. The Content server handles the retrieval of .XML, .PNG, .3DXML and .SCN content to the IC Webtop application [41] interface of the multidisciplinary application for replay. The process and zone partition details are taken from the Central Contact DB and converted to the .3DXML format for the multidisciplinary application, and then they are deleted.
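The data movement in this communication can be pictured as a plain SQL staging step. The sketch below is conceptual: the staging table and column names are assumptions introduced only to illustrate how the partition details are copied centrally for a replay and then removed again, as described above.

-- Conceptual ETL step on the CAD server side (all names are assumed).
INSERT INTO central_contact_partition (product_id, partition_kind, payload)
SELECT product_id, partition_kind, payload
FROM   local_contact_partition;

-- The local job-queue copy acts only as a cache and is cleared right away.
DELETE FROM local_contact_partition;

-- After the replay, the central copy is converted to 3DXML for the
-- multidisciplinary application and then deleted as well.
DELETE FROM central_contact_partition;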
5 Conclusion
This research work proposes the Content server to store the zone and modeled behavior information of a multidisciplinary product model. The work starts with the Human Computer Interaction (HCI) of a multidisciplinary product model, where the model is handled directly by the Information Content (IC) web application or through an interface in the multidisciplinary product application. The operations and processes of the IC web application are stored in the Content server. Then, the server is explained in brief, where the data are stored in the Zone partition and the Process partition based on the communication between the IC and the RFLP structure; this is done by the Info-Chunk objects, and the data are stored in the Content database. Finally, the communication between the Content server and the CAD product server is explained, where the zone partition and process partition information is pushed temporarily to the CAD product server so that the IC Webtop application in the main application can handle the multidisciplinary product model. As Modelica and the Info-Chunk objects are based on the OOP concepts, the RFLP structure and the IC could be compatible with each other and exchange information easily. This research work is an effort to provide efficient user interaction with a multidisciplinary product model through the Information Content.
Acknowledgement
This study is sponsored by the Doctoral School of Applied Informatics and Applied Mathematics, Óbuda University, Budapest, Hungary, and the Tempus Foundation. The first author gratefully acknowledges his supervisor Dr. Horváth László for the guidance in writing this paper.
References
[1] Z. Liang, Z. Jian, Z. Li-Nan and L. Nan. The Application of Human-Computer Interaction Idea in Computer Aided Industrial Design. International Conference on Computer Network, Electronic and Automation (ICCNEA), 160-164, 2017. https://doi.org/10.1109/iccnea.2017.71
[2] Li, Yujiang, Mikael Hedlind, and Torsten Kjellberg. Usability evaluation of CAD CAM: State of the art. Procedia CIRP, 36, 205-210, 2015. https://doi.org/10.1016/j.procir.2015.01.053
[3] Bloom, R. Getting started with CAD/CAM. Materials & Design, 17(4): 223-224, 1996. https://doi.org/10.1016/S0261-3069(97)88930-3
[4] L. Horváth and I. J. Rudas. Systems engineering methods for multidisciplinary product definition. IEEE 12th International Symposium on Intelligent Systems and Informatics (SISY), 293-298, 2014. https://doi.org/10.1109/sisy.2014.6923604
[5] Barfield, W. The law of virtual reality and increasingly smart virtual avatars. Research Handbook on the Law of Virtual and Augmented Reality, 2-4, 2018. https://doi.org/10.4337/9781786438591.00008
[6] Dempsey, M. Dymola for Multi-Engineering Modelling and Simulation. IEEE Vehicle Power and Propulsion Conference, 1-6, 2006. https://doi.org/10.1109/vppc.2006.364294
[7] Peter Fritzson. Principles of Object-Oriented Modeling and Simulation with Modelica 3.3: A Cyber-Physical Approach, Wiley-IEEE Press, John Wiley & Sons Inc, 2015. https://doi.org/10.1002/9781118989166
[8] Baesens, B., Backiel, A., & Broucke, S. vanden (Eds.). Beginning Java Programming, Wrox, 2012. https://doi.org/10.1002/9781119209416
[9] L. Horváth and I. J. Rudas. Towards the Information Content-driven Product Model. Proceedings of the IEEE International Conference on System of Systems Engineering, 1-6, 2008. https://doi.org/10.1109/sysose.2008.4724183
[10] Laszlo Horvath, Imre J. Rudas. Bringing up product model to thinking of engineer. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1355-1360, 2008. https://doi.org/10.1109/icsmc.2008.4811474
[11] L. Horváth and I. J. Rudas. Elevated level design intent and behavior driven feature definition for product modeling. 39th Annual Conference of the IEEE Industrial Electronics Society, 1(2), 4374-4379, 2013. https://doi.org/10.1109/iecon.2013.6699839
[12] Seo, Daeil, Yongjae Lee and Byounghyun Yoo. Webizing Interactive CAD Review System Using Super Multiview Autostereoscopic Displays. HCII Posters, Part II, CCIS 714, 62-67, 2017. https://doi.org/10.1007/978-3-319-58753-0_10
[13] Okuya, Yujiro, Nicolas Ladeveze, Olivier Gladin, Cédric Fleury and Patrick Bourdot. Distributed Architecture for Remote Collaborative Modification of Parametric CAD Data. IEEE Fourth VR International Workshop on Collaborative Virtual Environments (3DCVE), 1-4, 2018. https://doi.org/10.1109/3dcve.2018.8637112
[14] Yatish Bathla. Different types of process involved in the information content product model. Proceedings of the IEEE 14th International Symposium on Intelligent Systems and Informatics (SISY), 99-104, 2016. https://doi.org/10.1109/sisy.2016.7601478
[15] Yatish Bathla.
Structured organization of Engineering Objects in the information content of PLM system. Proceedings of the IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI), 473-478, 2016. https://doi.org/10.1109/saci.2016.7507424
[16] Fielding, R.T., Kaiser, G. The Apache HTTP Server Project, IEEE Internet Computing, 1(4): 88-90, 1997. https://doi.org/10.1109/4236.612229
[17] Neil Matthew, Richard Stones. Introduction to PostgreSQL. Apress, 2005. https://doi.org/10.1007/978-1-4302-0018-5
[18] Li Li and Wu Chou. Design Patterns for RESTful Communication Web Services. IEEE International Conference on Web Services, 2010. https://doi.org/10.1109/icws.2010.101
[19] Smolek, P., Heinzl, B., Ecker, H., & Breitenecker, F. Exploring the Possibilities of Co-Simulation with CATIA V6 Dynamic Behavior Modeling, SNE Simulation Notes Europe, 23(3-4), 2013. https://doi.org/10.11128/sne.23.sn.10205
[20] Sha Liu. Sustainable Building Design Optimization Using Building Information Modeling. ICCREM, 2015. https://doi.org/10.1061/9780784479377.038
[21] Chen, H., Lowe, A.A., de Almeida, F.R., Wong, M., Fleetham, J.A., Wang, B. Three-dimensional computer-assisted study model analysis of long-term oral-appliance wear. Part 1: Methodology. American Journal of Orthodontics & Dentofacial Orthopedics, 134(3): 393-407, 2008. https://doi.org/10.1016/j.ajodo.2006.10.030
[22] Tim Vernon. ZBrush. Journal of Visual Communication in Medicine, 34(1): 31-35, 2011. https://doi.org/10.3109/17453054.2011.548735
[23] Lennart Jorelid. J2EE FrontEnd Technologies: A Programmer's Guide to Servlets, JavaServer Pages, and Enterprise JavaBeans. Springer, 2002. https://doi.org/10.1007/978-1-4302-1148-8
[24] Jing Chen, Jiawei Li, Mo Li. Progressive Visualization of Complex 3D Models Over the Internet. Transactions in GIS, 20(6): 887-902, 2016. https://doi.org/10.1111/tgis.12185
[25] Adam Suydam, Jason Pyles. Lockheed Martin Conceptual Design Modeling in the Dassault Systemes 3DEXPERIENCE Platform, AIAA Scitech Forum, 2020. https://doi.org/10.2514/6.2020-1391
[26] Ling-Long Lin, Yun-Tao Song, Yu-Xiang Tang, Qing-Qing Du, Yi-Peng Gong. Implementation and application study on 3D collaborative design for CFETR based on ENOVIA VPM. Fusion Engineering and Design, 100, 198-203, 2015. https://doi.org/10.1016/j.fusengdes.2015.05.072
[27] Buffington, J. Microsoft SQL Server. Data Protection for Virtual Data Centers, 2011. https://doi.org/10.1002/9781118255766.ch8
[28] Yatish Bathla. Conceptual Models of Information Content for Product Modeling. Acta Polytechnica Hungarica, XV(2): 169-188, 2018. https://doi.org/10.12700/aph.15.1.2018.2.9
[29] H. F. Krikorian. Introduction to object-oriented systems engineering, Part 1. IT Professional, 5(2), 38-42, 2003. https://doi.org/10.1109/MITP.2003.1191791
[30] L. Horváth. New method for enhanced driving of entity generation in RFLP structured product model. 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), 541-546, 2017. https://doi.org/10.1109/iciea.2017.8282903
[31] L. Horváth and I. J. Rudas. Multilevel Abstraction Based Self Control Method for Industrial PLM Model. IEEE International Conference on Industrial Technology, 695-700, 2014. https://doi.org/10.1109/icit.2014.6894915
[32] Laszlo Horvath. New methods on the way to intelligent modeling in computer integrated engineering. Proceedings of the 36th Annual Conference on IEEE Industrial Electronics Society (IECON), 1359-1364, 2010.
https://doi.org/10.1109/iecon.2010.5675486
[33] Fielding, R.T., Kaiser, G. The Apache HTTP Server Project. IEEE Internet Computing, 1(4): 88-90, 1997. https://doi.org/10.1109/4236.612229
[34] Aleksa Vukotic, James Goodwill. Apache Tomcat 7, Springer Nature Switzerland, Apress, 2011. https://doi.org/10.1007/978-1-4302-3724-2
[35] Randers-Pehrson, G., Boutell, T. PNG (Portable Network Graphics) Specification, Version 1.0. PNG Dev. Gr., 1999. https://doi.org/10.17487/rfc2083
[36] Roberto Riascos, Laurent Levy, Josip Stjepandic, Arnulf Fröhlich. Digital Mock-up. Concurrent Engineering in the 21st Century: Foundations, Developments and Challenges, 355-388, 2015. https://doi.org/10.1007/978-3-319-13776-6_13
[37] Clarke, K.S. Extensible markup language (XML), in: Understanding Information Retrieval Systems: Management, Types, and Standards, 2011. https://doi.org/10.1201/b11499-52
[38] Yatish Bathla. Info-Chunk driven RFLP Structure based Product Model for Multidisciplinary Cyber Physical Systems. IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), pp. 000327-000332, 2018. https://doi.org/10.1109/sisy.2018.8524653
[39] He Youquan, Zhang Wei, Xie Jianfang, Wang Jian, Qiu Hanxing. Integrated Application of PLM Based ENOVIA Platform in Domestic Manufacturing Industry. International Conference on Information Management, Innovation Management and Industrial Engineering, November 2011. https://doi.org/10.1109/iciii.2011.202
[40] Richard Earp, Sikha Bagui. Database Design Using Entity-Relationship Diagrams. Foundations of Database Design, June 2003. https://doi.org/10.1201/9780203486054
[41] Heller, D., Krenzelok, L., Orr, J. Webtop: Realities in designing a web-application platform. Proceedings of the 2003 Conference on Designing for User Experiences (DUX '03), 2003. https://doi.org/10.1145/997078.997115
https://doi.org/10.31449/inf.v44i2.3166 Informatica 44 (2020) 285-286
Artificial Intelligence Methods for Modelling Tremor Mechanisms
Vida Groznik
University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
E-mail: vida.groznik@famnit.upr.si
Thesis summary
Keywords: tremor, attribute visualisation, digital spirography
Received: May 28, 2020
The paper summarises a Doctoral Thesis in which we focus on two main goals: (1) building models for differentiation between the three most common tremors: Parkinsonian, essential and mixed-type tremor, and (2) the development of a novel method for attribute visualisation on series.
Povzetek: Članek povzema vsebino doktorske disertacije, v kateri se osredotočamo na dva glavna cilja: (1) gradnja modelov za razločevanje med tremi najpogostejšimi tremorji: parkinsonskim, esencialnim in mešanim tipom tremorja ter (2) razvoj nove metode za vizualizacijo atributov na vrstah.
1 Introduction
Tremor is an involuntary movement of the body and is one of the most common movement disorders. It is primarily associated with various diseases of the nervous system, including Parkinson's disease. Since there are more than 20 different types of tremors, differentiation between them is important from the treatment point of view.
Spirography is a diagnostic method where the subject's task is to draw a left-twisted spiral while the doctor observes the process of drawing (speed, hesitation, etc.) and the final drawing. With the development of tablets, digitalised spirography emerged, making it possible to store the course of spiral drawing and to analyse the acquired time series. In order to increase confidence in such a system, we would need to provide an explanation of the results to doctors. One option is to visualise the anomalies and results on the drawn spirals. These visualisations must make sense to physicians and, above all, they must be consistent with their medical knowledge of the domain. This paper summarises a Doctoral Thesis [1] which tries to address the need for automatic differentiation of tremors and for visualisation of the decisions such a system would give.
2 Diagnostic models for tremor differentiation
In the thesis, we focus on differentiation between three of the most common tremors: Parkinsonian, essential and mixed type of tremor. For the purpose of building the diagnostic models, we used digitalised spirography to collect the data needed. The first diagnostic model distinguishes between the three tremors based on clinical examination data, family history and digital spirography. The process of building a model was carried out using the argument-based machine learning technique, which enabled us to build a decision model through the process of knowledge elicitation from the domain expert (a neurologist). The obtained model consists of thirteen rules that are medically sensible. The process of knowledge elicitation itself contributed to the higher classification accuracy of the final model in comparison with the initial one [2, 5]. In the first diagnostic model, attributes derived from the spirography were included in more than half of the rules. This motivated us to build a model based solely on the digital spirography data. For the purpose of constructing an understandable model, we first built several attributes representing domain medical knowledge. We built more than 500 different attributes, which were used in a logistic regression to construct the final diagnostic model. The model is able to distinguish subjects with tremors from those without tremors with 90% classification accuracy. The final diagnostic model is built into the freely available ParkinsonCheck mobile application [6].
3 Method for attribute visualisation
During the process of attribute construction, we wanted to know what our attributes were detecting. Thus, we developed a method for attribute visualisation on series. The method not only helped us with attribute construction, but it is also useful for the visual interpretation of the diagnostic model's decisions. The visualisation method, and consequently the decision model, were evaluated with the help of three independent neurology experts. The results show that both the diagnostic model and the visualisation are meaningful and cover medical knowledge of the domain. Different visualisation approaches and their benefits have been published in several peer-reviewed publications [3, 4, 7].
4 Conclusion
The Thesis [1] describes the development of different diagnostic models for digitalised spirography systems. The emphasis is given to the elicitation of the expert's knowledge and to including that knowledge in the built-in attributes.
To increase physicians' confidence in such systems, a novel method for attribute visualisation has been proposed. The results were published in several peer-reviewed publications [2, 3, 4, 5, 6, 7].
References
[1] Groznik, V. (2018) Metode umetne inteligence za modeliranje mehanizmov tremorjev: doktorska disertacija. Ljubljana. 139 pages, illustrated. http://eprints.fri.uni-lj.si/4134/.
[2] Groznik, V., Guid, M., Sadikov, A., Možina, M., Georgiev, D., Kragelj, V., Ribaric, S., Pirtošek, Z., and Bratko, I. (2013) Elicitation of neurological knowledge with argument-based machine learning. Artificial Intelligence in Medicine, 57:133-144. https://doi.org/10.1016/j.artmed.2012.08.003.
[3] Groznik, V., Sadikov, A., Možina, M., Žabkar, J., Georgiev, D., and Bratko, I. (2014) Attribute visualisation for computer-aided diagnosis: a case study. In ICHI2014: proceedings, 2014 IEEE International Conference on Healthcare Informatics, IEEE Computer Society, Conference Publishing Services, pp. 294-299. https://doi.org/10.1109/ICHI.2014.47.
[4] Groznik, V., Možina, M., Žabkar, J., Georgiev, D., Bratko, I., and Sadikov, A. (2015) Development, debugging, and assessment of ParkinsonCheck attributes through visualisation. In A. Briassouli, J. Benois-Pineau, A. Hauptmann, editors, Health monitoring and personalized feedback using multimedia data, Springer, pp. 47-71. https://doi.org/10.1007/978-3-319-17963-6_4.
[5] Guid, M., Možina, M., Groznik, V., Georgiev, D., Sadikov, A., Pirtošek, Z., and Bratko, I. (2012) ABML knowledge refinement loop: a case study. In L. Chen, et al., editors, Foundations of intelligent systems, vol. 7661 of Lecture notes in computer science, Lecture notes in artificial intelligence, Springer, pp. 41-50. https://doi.org/10.1007/978-3-642-34624-8_5.
[6] Sadikov, A., Groznik, V., Zabkar, J., Mozina, M., Georgiev, D., Pirtosek, Z., and Bratko, I. (2014) ParkinsonCheck smart phone app. In T. Schaub, G. Friedrich, B. O'Sullivan, editors, ECAI2014: proceedings, Frontiers in artificial intelligence and applications (Print), vol. 263, IOS Press, pp. 1213-1214. https://doi.org/10.3233/978-1-61499-419-0-1213.
[7] Sadikov, A., Groznik, V., Mozina, M., Zabkar, J., Nyholm, D., Memedi, M., Bratko, I., and Georgiev, D. (2017) Feasibility of spirography features for objective assessment of motor function in Parkinson's disease. Artificial Intelligence in Medicine, vol. 66, pp. 54-62. https://doi.org/10.1016/j.artmed.2017.03.011.
Informatica 44 (2020) 287-288
Call for papers for Special Issue "Artificial Intelligence and Ambient Intelligence"
A special issue of Electronics (ISSN 2079-9292) belongs to the section "Artificial Intelligence". Deadline for manuscript submissions: December 31, 2020.
Special issue information
Dear Colleagues, Ambient intelligence (AmI) and artificial intelligence (AI) both rely on AI methods applied to computing devices. Furthermore, their power stems from the same advancement of electronics, sensors, and software development. Yet, AmI is not just an AI application serving humans, and by its definition it is even more aligned to interactions with humans. Be it a smart home or an autonomous car, it is essential for the AmI system to be familiar with the user's desired performance as well as the current state of mind. As such, the human-system relation is of predominant importance, and it represents one of the most fruitful, but often neglected, fields of research for AI.
Strategically, the hardware (HW) and software (SW) exponential development was essential not only for AI and AmI, but for the overall human civilization as well. Information society laws such as Moore's Law or Keck's Law describe the progress of electronic computing and data transfer and storage devices. While several limitations of specific HW characteristics (e.g., the speed of the processor chip) have already been met, it is not clear which are the next promising technology fields to enable further HW progress. Is it that we are facing a slow but steady decline following the fast exponential growth? Are there major possibilities of improvements by connecting SW, AI, and AmI methods directly to the chips? Should we integrate the flexibility of SW with the speed of electronic HW and vastly improve the cognitive and computing powers? Will AmI benefit more through this progress, since is it intrinsically devoted to connecting devices and humans? Which one will bring the most benefit to the human society and which one will first achieve superintelligence - general AI or general AmI? Answers to these and related questions represent the core of this Special Issue. In order to cope with the abovementioned obstacles, original studies and either viewpoints, theoretical analyses or modeling methods can be developed and proposed to foster further progress. Specific topics We invite researchers to contribute original research articles as well as review articles to present their proposals, views, and studies in the field of AmI and AI in relation to the overall HW, SW, and human civilization progress. Submissions can focus on the research concept or applied research in topics including, but not limited to, the following: • Applications in COVID-19: patient monitoring, patient rehabilitation, brain-computer interfaces assisted living and caring, fall detection, elderly care, interventions for psychological crises • Applications in smart homes and smart buildings • Mobile/wearable intelligence • Robotics applied to smart environments • Applications of combined pervasive / ubiquitous computing with AI • Use of mobile, wireless, visual, and multi-modal sensor networks in intelligent systems • Intelligent handling of privacy, security and trust Manuscript submission information Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website. Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access monthly journal published by MDPI. Please visit the Instructions for Authors page before submitting a manuscript. 
The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs) (APC: CHF 1500 from 1 July 2020 onwards). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions. 288 Informatica 44 (2020) 241-262 M. Panda et al. Keywords Ambient intelligence Artificial intelligence Superintelligence Multi-agent systems Ambient assisted living Computational intelligence methods Pervasive Computing Mobile computing Ubiquitous computing Self-adaptation, self-organization, and self-supervised learning Intelligent interfaces (user-friendly man-machine interface) AI-assisted medical diagnosis Mobile sensing applications Autonomous and social robots IoT and cyber-physical systems Special issue editors Prof. Dr. Matjaz Gams Website Guest Editor Jozef Stefan Institute, Ljubljana, Slovenia Interests: artificial intelligence; intelligent systems; ambient intelligence; information society; machine learning Dr. Martin Gjoreski Website Guest Editor Jozef Stefan Institute, Ljubljana, Slovenia Interests: affective computing; machine learning; deep learning; sensor; mobile computing; mobile health For further reading, please follow the link to the Special Issue Website at: https://www.mdpi.com/si/electronics/AIs_electronics Electronics (ISSN 2079-9292; http://www.mdpi.com /journal/electronics) is a journal published by MDPI, Basel, Switzerland. Electronics maintains rigorous peer-review and a rapid publication process. All articles are published with a CC BY 4.0 license. For more information on the CC BY license, please see: http://creativecommons.org. Informatica 44 (2020) 289-289 289 JOŽEF STEFAN INSTITUTE Jožef Stefan (1835-1893) was one of the most prominent physicists of the 19th century. Born to Slovene parents, he obtained his Ph.D. at Vienna University, where he was later Director of the Physics Institute, Vice-President of the Vienna Academy of Sciences and a member of several scientific institutions in Europe. Stefan explored many areas in hydrodynamics, optics, acoustics, electricity, magnetism and the kinetic theory of gases. Among other things, he originated the law that the total radiation from a black body is proportional to the 4th power of its absolute temperature, known as the Stefan-Boltzmann law. The Jožef Stefan Institute (JSI) is the leading independent scientific research institution in Slovenia, covering a broad spectrum of fundamental and applied research in the fields of physics, chemistry and biochemistry, electronics and information science, nuclear science technology, energy research and environmental science. The Jožef Stefan Institute (JSI) is a research organisation for pure and applied research in the natural sciences and technology. Both are closely interconnected in research departments composed of different task teams. Emphasis in basic research is given to the development and education of young scientists, while applied research and development serve for the transfer of advanced knowledge, contributing to the development of the national economy and society in general. At present the Institute, with a total of about 900 staff, has 700 researchers, about 250 of whom are postgraduates, around 500 of whom have doctorates (Ph.D.), and around 200 of whom have permanent professorships or temporary teaching assignments at the Universities. 
In view of its activities and status, the JSI plays the role of a national institute, complementing the role of the universities and bridging the gap between basic science and applications. Research at the JSI includes the following major fields: physics; chemistry; electronics, informatics and computer sciences; biochemistry; ecology; reactor technology; applied mathematics. Most of the activities are more or less closely connected to information sciences, in particular computer sciences, artificial intelligence, language and speech technologies, computer-aided design, computer architectures, biocybernetics and robotics, computer automation and control, professional electronics, digital communications and networks, and applied mathematics. The Institute is located in Ljubljana, the capital of the independent state of Slovenia (or S9nia). The capital today is considered a crossroad between East, West and Mediter- ranean Europe, offering excellent productive capabilities and solid business opportunities, with strong international connections. Ljubljana is connected to important centers such as Prague, Budapest, Vienna, Zagreb, Milan, Rome, Monaco, Nice, Bern and Munich, all within a radius of 600 km. From the Jožef Stefan Institute, the Technology park "Ljubljana" has been proposed as part of the national strategy for technological development to foster synergies between research and industry, to promote joint ventures between university bodies, research institutes and innovative industry, to act as an incubator for high-tech initiatives and to accelerate the development cycle of innovative products. Part of the Institute was reorganized into several hightech units supported by and connected within the Technology park at the Jožef Stefan Institute, established as the beginning of a regional Technology park "Ljubljana". The project was developed at a particularly historical moment, characterized by the process of state reorganisation, privatisation and private initiative. The national Technology Park is a shareholding company hosting an independent venture-capital institution. The promoters and operational entities of the project are the Republic of Slovenia, Ministry of Higher Education, Science and Technology and the Jožef Stefan Institute. The framework of the operation also includes the University of Ljubljana, the National Institute of Chemistry, the Institute for Electronics and Vacuum Technology and the Institute for Materials and Construction Research among others. In addition, the project is supported by the Ministry of the Economy, the National Chamber of Economy and the City of Ljubljana. Jožef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia Tel.:+386 1 4773 900, Fax.:+386 1 251 93 85 WWW: http://www.ijs.si E-mail: matjaz.gams@ijs.si Public relations: Polona Strnad Informática 44 (2020) INFORMATICA AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS INVITATION, COOPERATION Submissions and Refereeing Please register as an author and submit a manuscript at: http://www.informatica.si. At least two referees outside the author's country will examine it, and they are invited to make as many remarks as possible from typing errors to global philosophical disagreements. The chosen editor will send the author the obtained reviews. If the paper is accepted, the editor will also send an email to the managing editor. The executive board will inform the author that the paper has been accepted, and the author will send the paper to the managing editor. 
The paper will be published within one year of receipt of email with the text in Informatica MS Word format or Informatica LTeX format and figures in .eps format. Style and examples of papers can be obtained from http://www.informatica.si. Opinions, news, calls for conferences, calls for papers, etc. should be sent directly to the managing editor. SUBSCRIPTION Please, complete the order form and send it to Dr. Drago Torkar, Informatica, Institut Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: drago.torkar@ijs.si Since 1977, Informatica has been a major Slovenian scientific journal of computing and informatics, including telecommunications, automation and other related areas. In its 16th year (more than twentysix years ago) it became truly international, although it still remains connected to Central Europe. The basic aim of Informatica is to impose intellectual values (science, engineering) in a distributed organisation. Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations. Editing and refereeing are distributed. Each editor can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the Refereeing Board. Informatica web edition is free of charge and accessible at http://www.informatica.si. Informatica print edition is free of charge for major scientific, educational and governmental institutions. Others should subscribe. Informatica WWW: http://www.informatica.si/ Referees from 2008 on: A. Abraham, S. Abraham, R. Accornero, A. Adhikari, R. Ahmad, G. Alvarez, N. Anciaux, R. Arora, I. Awan, J. Azimi, C. Badica, Z. Balogh, S. Banerjee, G. Barbier, A. Baruzzo, B. Batagelj, T. Beaubouef, N. Beaulieu, M. ter Beek, P. Bellavista, K. Bilal, S. Bishop, J. Bodlaj, M. Bohanec, D. Bolme, Z. Bonikowski, B. Boškovic, M. Botta, P. Brazdil, J. Brest, J. Brichau, A. Brodnik, D. Brown, I. Bruha, M. Bruynooghe, W. Buntine, D.D. Burdescu, J. Buys, X. Cai, Y. Cai, J.C. Cano, T. Cao, J.-V. Capella-Hernändez, N. Carver, M. Cavazza, R. Ceylan, A. Chebotko, I. Chekalov, J. Chen, L.-M. Cheng, G. Chiola, Y.-C. Chiou, I. Chorbev, S.R. Choudhary, S.S.M. Chow, K.R. Chowdhury, V. Christlein, W. Chu, L. Chung, M. Ciglaric, J.-N. Colin, V. Cortellessa, J. Cui, P. Cui, Z. Cui, D. Cutting, A. Cuzzocrea, V. Cvjetkovic, J. Cypryjanski, L. Cehovin, D. Cerepnalkoski, I. Cosic, G. Daniele, G. Danoy, M. Dash, S. Datt, A. Datta, M.-Y. Day, F. Debili, C.J. Debono, J. Dedic, P. Degano, A. Dekdouk, H. Demirel, B. Demoen, S. Dendamrongvit, T. Deng, A. Derezinska, J. Dezert, G. Dias, I. Dimitrovski, S. Dobrišek, Q. Dou, J. Doumen, E. Dovgan, B. Dragovich, D. Drajic, O. Drbohlav, M. Drole, J. Dujmovic, O. Ebers, J. Eder, S. Elaluf-Calderwood, E. Engström, U. 
riza Erturk, A. Farago, C. Fei, L. Feng, Y.X. Feng, B. Filipic, I. Fister, I. Fister Jr., D. Fišer, A. Flores, V.A. Fomichov, S. Forli, A. Freitas, J. Fridrich, S. Friedman, C. Fu, X. Fu, T. Fujimoto, G. Fung, S. Gabrielli, D. Galindo, A. Gambarara, M. Gams, M. Ganzha, J. Garbajosa, R. Gennari, G. Georgeson, N. Gligoric, S. Goel, G.H. Gonnet, D.S. Goodsell, S. Gordillo, J. Gore, M. Grcar, M. Grgurovic, D. Grosse, Z.-H. Guan, D. Gubiani, M. Guid, C. Guo, B. Gupta, M. Gusev, M. Hahsler, Z. Haiping, A. Hameed, C. Hamzagebi, Q.-L. Han, H. Hanping, T. Härder, J.N. Hatzopoulos, S. Hazelhurst, K. Hempstalk, J.M.G. Hidalgo, J. Hodgson, M. Holbl, M.P. Hong, G. Howells, M. Hu, J. Hyvärinen, D. Ienco, B. Ionescu, R. Irfan, N. Jaisankar, D. Jakobovic, K. Jassem, I. Jawhar, Y. Jia, T. Jin, I. Jureta, D. Juricic, S. K, S. Kalajdziski, Y. Kalantidis, B. Kaluža, D. Kanellopoulos, R. Kapoor, D. Karapetyan, A. Kassler, D.S. Katz, A. Kaveh, S.U. Khan, M. Khattak, V. Khomenko, E.S. Khorasani, I. Kitanovski, D. Kocev, J. Kocijan, J. Kollär, A. Kontostathis, P. Korošec, A. Koschmider, D. Košir, J. Kovac, A. Krajnc, M. Krevs, J. Krogstie, P. Krsek, M. Kubat, M. Kukar, A. Kulis, A.P.S. Kumar, H. Kwašnicka, W.K. Lai, C.-S. Laih, K.-Y. Lam, N. Landwehr, J. Lanir, A. Lavrov, M. Layouni, G. Leban, A. Lee, Y.-C. Lee, U. Legat, A. Leonardis, G. Li, G.-Z. Li, J. Li, X. Li, X. Li, Y. Li, Y. Li, S. Lian, L. Liao, C. Lim, J.-C. Lin, H. Liu, J. Liu, P. Liu, X. Liu, X. Liu, F. Logist, S. Loskovska, H. Lu, Z. Lu, X. Luo, M. Luštrek, I.V. Lyustig, S.A. Madani, M. Mahoney, S.U.R. Malik, Y. Marinakis, D. Marincic, J. Marques-Silva, A. Martin, D. Marwede, M. Matijaševic, T. Matsui, L. McMillan, A. McPherson, A. McPherson, Z. Meng, M.C. Mihaescu, V. Milea, N. Min-Allah, E. Minisci, V. Mišic, A.-H. Mogos, P. Mohapatra, D.D. Monica, A. Montanari, A. Moroni, J. Mosegaard, M. Moškon, L. de M. Mourelle, H. Moustafa, M. Možina, M. Mrak, Y. Mu, J. Mula, D. Nagamalai, M. Di Natale, A. Navarra, P. Navrat, N. Nedjah, R. Nejabati, W. Ng, Z. Ni, E.S. Nielsen, O. Nouali, F. Novak, B. Novikov, P. Nurmi, D. Obrul, B. Oliboni, X. Pan, M. Pancur, W. Pang, G. Papa, M. Paprzycki, M. Paralic, B.-K. Park, P. Patel, T.B. Pedersen, Z. Peng, R.G. Pensa, J. Perš, D. Petcu, B. Petelin, M. Petkovšek, D. Pevec, M. Piculin, R. Piltaver, E. Pirogova, V. Podpecan, M. Polo, V. Pomponiu, E. Popescu, D. Poshyvanyk, B. Potočnik, R.J. Povinelli, S.R.M. Prasanna, K. Pripužic, G. Puppis, H. Qian, Y. Qian, L. Qiao, C. Qin, J. Que, J.-J. Quisquater, C. Rafe, S. Rahimi, V. Rajkovic, D. Rakovic, J. Ramaekers, J. Ramon, R. Ravnik, Y. Reddy, W. Reimche, H. Rezankova, D. Rispoli, B. Ristevski, B. Robic, J.A. Rodriguez-Aguilar, P. Rohatgi, W. Rossak, I. Rožanc, J. Rupnik, S.B. Sadkhan, K. Saeed, M. Saeki, K.S.M. Sahari, C. Sakharwade, E. Sakkopoulos, P. Sala, M.H. Samadzadeh, J.S. Sandhu, P. Scaglioso, V. Schau, W. Schempp, J. Seberry, A. Senanayake, M. Senobari, T.C. Seong, S. Shamala, c. shi, Z. Shi, L. Shiguo, N. Shilov, Z.-E.H. Slimane, F. Smith, H. Sneed, P. Sokolowski, T. Song, A. Soppera, A. Sorniotti, M. Stajdohar, L. Stanescu, D. Strnad, X. Sun, L. Šajn, R. Šenkerik, M.R. Šikonja, J. Šilc, I. Škrjanc, T. Štajner, B. Šter, V. Štruc, H. Takizawa, C. Talcott, N. Tomasev, D. Torkar, S. Torrente, M. Trampuš, C. Tranoris, K. Trojacanec, M. Tschierschke, F. De Turck, J. Twycross, N. Tziritas, W. Vanhoof, P. Vateekul, L.A. Vese, A. Visconti, B. Vlaovic, V. Vojisavljevic, M. Vozalis, P. Vracar, V. Vranic, C.-H. Wang, H. Wang, H. Wang, H. Wang, S. Wang, X.-F. Wang, X. Wang, Y. 
Wang, A. Wasilewska, S. Wenzel, V. Wickramasinghe, J. Wong, S. Wrobel, K. Wrona, B. Wu, L. Xiang, Y. Xiang, D. Xiao, F. Xie, L. Xie, Z. Xing, H. Yang, X. Yang, N.Y. Yen, C. Yong-Sheng, J.J. You, G. Yu, X. Zabulis, A. Zainal, A. Zamuda, M. Zand, Z. Zhang, Z. Zhao, D. Zheng, J. Zheng, X. Zheng, Z.-H. Zhou, F. Zhuang, A. Zimmermann, M.J. Zuo, B. Zupan, M. Zuqiang, B. Žalik, J. Žižka, Informática An International Journal of Computing and Informatics Web edition of Informatica may be accessed at: http://www.informatica.si. Subscription Information Informatica (ISSN 0350-5596) is published four times a year in Spring, Summer, Autumn, and Winter (4 issues per year) by the Slovene Society Informatika, Litostrojska cesta 54, 1000 Ljubljana, Slovenia. The subscription rate for 2020 (Volume 44) is - 60 EUR for institutions, -30 EUR for individuals, and - 15 EUR for students Claims for missing issues will be honored free of charge within six months after the publication date of the issue. Typesetting: Borut Žnidar, borut.znidar@gmail.com. Printing: ABO grafika d.o.o., Ob železnici 16, 1000 Ljubljana. Orders may be placed by email (drago.torkar@ijs.si), telephone (+386 1 477 3900) or fax (+386 1 251 93 85). The payment should be made to our bank account no.: 02083-0013014662 at NLB d.d., 1520 Ljubljana, Trg republike 2, Slovenija, IBAN no.: SI56020830013014662, SWIFT Code: LJBASI2X. Informatica is published by Slovene Society Informatika (president Niko Schlamberger) in cooperation with the following societies (and contact persons): Slovene Society for Pattern Recognition (Vitomir Struc) Slovenian Artificial Intelligence Society (Saso Dzeroski) Cognitive Science Society (Olga Markic) Slovenian Society of Mathematicians, Physicists and Astronomers (Dragan Mihailovic) Automatic Control Society of Slovenia (Giovanni Godena) Slovenian Association of Technical and Natural Sciences / Engineering Academy of Slovenia (Mark Plesko) ACM Slovenia (Nikolaj Zimic) Informatica is financially supported by the Slovenian research agency from the Call for co-financing of scientific periodical publications. Informatica is surveyed by: ACM Digital Library, Citeseer, COBISS, Compendex, Computer & Information Systems Abstracts, Computer Database, Computer Science Index, Current Mathematical Publications, DBLP Computer Science Bibliography, Directory of Open Access Journals, InfoTrac OneFile, Inspec, Linguistic and Language Behaviour Abstracts, Mathematical Reviews, MatSciNet, MatSci on SilverPlatter, Scopus, Zentralblatt Math Volume 44 Number 2 June 2020 ISSN 0350-5596 Informática An International Journal of Computing and Informatics Introduction to Special Issue "SoICT 2019" Privacy Preserving Visual Log Service with Temporal Interval Query using Interval Tree-based Searchable Symmetric Encryption Cycle Time Enhancement by Simulated Annealing for a Practical Assembly Line Balancing Problem End of Special Issue / Start of normal papers_ H.T.T. Binh, I. Ide 113 V.-A. Pham, D.-H. Hoang, 115 H.-H. Chung-Nguyen, M.-K. Tran, M.-T. Tran H.M. Dinh, D.V. Nguyen, 127 L.V. Truong, T.P. Do, T.T. Phan, N.D. Nguyen T. Pisanski, M. Pisanski, 139 J. Pisanski M. Gruber, J. Matousek, 147 Z. Hanzlicek, D. Tihelka i J.C.D. Vera, G.M.N. Ortiz, 167 C. Molina, M.A. Vila M.A. Nemmich, F. Debbat, 183 M. Slimane M. Sahu, D.P. Mohapatra 199 N.C. Woods, C.A. Robert 225 H. Karna, S. Gotovac, 231 L. Vickovic M. Panda, S. Dehuri, 241 A.K. Jagadev R. Zhang, W. Shi 263 D. Wang, G. Xu 269 Y. Bathla, S. 
Szenasi 275 A Novel Method for Determining Research Groups from Co-authorship Network and Scientific Fields of Authors Dialogue Act-Based Expressive Speech Synthesis in Limited Domain for the Czech Language Knowledge Redundancy Approach to Reduce Size in Association Rules Hybrid Bees Approach based on Improved Search Sites Selection by Firefly Algorithm for Solving Complex Continuous Functions Computing Dynamic Slices of Feature-Oriented Programs with Aspect-Oriented Extensions Colour-Range Histogram technique for Automatic Image Source Detection Data Mining Approach to Effort Modeling On Agile Software Projects Multi-Objective Artificial Bee Colony Algorithms and Chaotic-TOPSIS Method for Solving Flowshop Scheduling Problem and Decision Making Research on Resource Allocation and Management of Mobile Edge Computing Network Research on the Detection of Network Intrusion Prevention With Svm Based Optimization Algorithm A Web Server to Store the Modeled Behavior Data and Zone Information of the Multidisciplinary Product Model in the CAD Systems Artificial Intelligence Methods for Modelling Tremor Mechanisms Call for Special Issue of Electronics V. Groznik M. Gams 285 287 Informática 44 (2020) Number 2, pp. 113-289