Volume 44, Number 3, September 2020, ISSN 0350-5596
Informatica: An International Journal of Computing and Informatics
https://doi.org/10.31449/inf.v44i2.3166          Informatica 44 (2020) 291-302

Reminder of the First Paper on Transfer Learning in Neural Networks, 1976

Stevo Bozinovski
South Carolina State University, Orangeburg, SC, USA
E-mail: sbozinovski@scsu.edu

Overview paper

Keywords: transfer learning, neural networks

Received: June 10, 2019

This paper describes work on transfer learning in neural networks carried out in the 1970s and early 1980s, which produced its first publication in 1976. In the contemporary research on transfer learning there is a belief that the pioneering work on transfer learning took place in the early 1990s; this paper updates that knowledge, pointing out that transfer learning research started more than a decade earlier. The paper reviews the pioneering 1970s research and addresses important issues relevant to the current transfer learning research. It gives a mathematical model and a geometric interpretation of transfer learning, and a measure of transfer learning indicating positive, negative, and no transfer learning. It presents an experimental investigation of the mentioned types of transfer learning, and it gives an application of transfer learning in pattern recognition using datasets of images.

Povzetek: Ta članek opisuje delo na področju prenosa učenja v nevronskih omrežjih, opravljeno v sedemdesetih in zgodnjih osemdesetih letih prejšnjega stoletja, ki je prvo publikacijo izdalo leta 1976. V sodobni raziskavi o transfernem učenju obstaja prepričanje, da je pionirsko delo na področju transfernega učenja potekalo v začetku devetdesetih let; ta članek to znanje posodablja in poudarja, da so se raziskave o transfernem učenju začele 15 let prej. Ta članek pregleduje raziskave in obravnava pomembna vprašanja za sedanje raziskave o transfernem učenju. Daje matematični model in geometrijsko razlago transfernega učenja. Daje merilo transfernega učenja, vključno s pozitivnim, negativnim in tabula rasa prenosnim učenjem. Predstavlja eksperimentalno raziskovanje omenjenih vrst transfernega učenja. Uporablja prenosno učenje pri prepoznavanju nabora podatkov.

1 Introduction

Transfer learning is a machine learning method where a learning model developed for a first learning task is reused as the starting point for a learning model in a second learning task (Tan et al. 2018). It is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem (Wikipedia > Transfer Learning, October 2020). Often the previous learning is referred to as the source and the next learning as the target (Pratt 1993, Pan and Yang 2010, Weiss et al. 2016).
Basically, it means using a pre-trained neural network (trained for Task1) to achieve a shorter training time (positive transfer learning) when learning Task2. Transfer learning is an emphasized way of learning in contemporary multistage neural networks named deep neural networks (e.g., Goodfellow et al. 2016). According to (Wikipedia > Transfer Learning, October 2020), the earliest work on transfer in machine learning is attributed to Lorien Pratt (1993). That work points out earlier work on the subject (Pratt et al. 1991). After 1993, as pointed out in Pan and Yang (2010), the fundamental motivation for transfer learning in the field of machine learning was discussed at the NIPS-95 workshop on "Learning to Learn" (Baxter et al. 1995).

In the context described above, this paper reports on explicit work on transfer learning which took place fifteen years before the Pratt et al. (1991) work. That research, reviewed here, started in 1972, producing some unpublished reports (Bozinovski 1972, 1974) and a published report in 1976 (Bozinovski and Fulgosi, 1976) which explicitly addressed the transfer learning concept in its title. Research continued after that, and reports were given in (Bozinovski et al. 1977, Bozinovski 1978, 1981, 1985a, 1985b, 1995). That initial research on transfer learning is important to the current effort in transfer learning because, in addition to presenting the initial concept of transfer learning in neural networks, it describes an early approach to defining a measure of transfer learning, which is of interest to current efforts in transfer learning (Tan et al. 2018). The review presented here, in addition to the mathematical treatment of transfer learning, describes the experimental investigation of transfer learning which took place during 1976-1981. This paper also gives an application of transfer learning, in obtaining shorter training sequences when learning a dataset of images representing letters.

In the sequel the paper first reviews the neural network used in the early research on transfer learning, during 1972-1981. Then it gives a mathematical model of supervised learning in which it explicitly introduces transfer learning. Then it gives a geometric model of transfer learning, including positive, negative, and no transfer learning. Then, in Section 5, it defines a mathematical index, a measure of transfer learning. In Section 6 the paper discusses the search for a solution of the pattern classification problem in case of negative transfer learning. In Section 7 the paper discusses the multi-class, multi-template problem of transfer learning. Section 8 shows results of the experimental investigation of transfer learning. It first shows experiments with a small set of low-resolution images representing letters, demonstrating experimentally the effects of tabula rasa, positive, and negative transfer. The paper then extends to an application of transfer learning in learning a dataset of three sets, each containing 26 images representing letters. Section 9 reviews the related work by other authors which appeared after 1986, influenced by the renewed interest in neural networks due to the book of Rumelhart, McClelland, and the PDP Group (1986), including the work of Pratt et al. (1991) and Pratt (1993). The paper ends with a discussion and conclusion section.

2 The neural network

The neural network used in our study (Bozinovski 1972, 1974, 1995) is shown in Figure 1.
Figure 1: A 5-layer neural network used in supervised learning for pattern recognition in the research described here (Bozinovski 1974, 1995).

The network contains 5 computational stages (layers). The first one, M, is the sensor layer, with sensors arranged according to need, for example as a matrix retina. Sensors are binary, giving values 0 or 1.

The second layer, Z, is a feature extraction layer. A feature is a pattern which is used as input in the recognition of a higher-level pattern. Examples of features might be "horizontal line", "circle", "upper left corner", or a rather complex feature. What is important is that the feature is the first stage in recognizing a pattern, which is a set of features. One way of defining a feature is to pre-wire all sensors in a horizontal line and to create an output from the Z layer with the interpretation "horizontal line". The other way is to add a Z-element with trainable weights and produce an output with the interpretation "horizontal line". The number of outputs from the Z-layer is often larger than the number of input sensors. For example, each sensor can be considered a feature, plus some needed features such as "middle horizontal line", "left corner" or "square".

The outputs of the layer Z are inputs to the third layer, the A-layer. It contains A-elements, or associative units, as named originally by Rosenblatt (1958, 1962) and used in early neural learning research (e.g. Glushkov, 1967). We will use that term, but we will also use the term associative weights. A weight represents the relevance of a feature in creating the concept of a pattern. The A-elements are divided into subsets A1, A2, ..., An, each subset having inputs from the feature layer Z. The subsets are associated with a cognitive concept, a class to which input patterns are classified. If there are n possible classes and Ns possible features, then the A-elements can be represented by values wij, i = 1,..,n; j = 1,..,Ns. They are in general real numbers. Each class of A-elements represents a concept, a cognitive class, that will be learned in the pattern classification process. For example, if the task is to classify images, then one set of A-elements will be devoted to recognizing the image "E", another to recognizing the image "F", etc.

The next layer, S, consists of elements that perform some computation over the subsets of A-elements representing cognitive classes. An S-element si computes some function y(wij) over the elements wij, i = 1,..,n; j = 1,..,Ns. Most often these elements compute a weighted and thresholded sum yi = Σj wij xj − ρi, where ρi is named the threshold of the element si. Further in the text we will use the θ-notation for the threshold, i.e., ρi = θi. There are n S-elements in this layer, s1, s2, ..., sn. A subset of A-elements and the corresponding S-element is named a neuron of the neural network.

The next layer, D, is an arbiter layer, which chooses one S-element out of the n alternative S-elements. The usual way is to compute a maximum function. This layer can be composed of a set of neurons which have a common threshold. Such an Isothreshold Neural Network (e.g. Bozinovski 1985a) has a common threshold value equal to the maximal value of the individual neuron thresholds. Such a network provides a mechanism for computing the maximal value in neural networks. In addition, the maximal value might be normalized to 1, and the maximum computing network can be viewed as computing a fuzzy union if the input values are also normalized between 0 and 1. The output of this layer is an integer from 0 to n.
For example, the output d = 2 means that the observed pattern belongs to class 2 out of the considered n classes. The output d = 0 means that the classification is undecided; possibly there are two S-elements computing the same largest value, so there is no single maximal value.

The next layer, E, is the output interface layer. It activates some device that is controlled by this neural network. For example, if d = 2 is computed, then this layer may activate a speech device telling the sound representation of class 2.

The neural network presented in Figure 1 was the one with which we started our research in neural networks. The first task we considered was distinguishing a horizontal vs a vertical line on a matrix retina (Bozinovski 1972). That is not reviewed here. This paper is focused on modeling transfer learning.

3 Mathematical modeling of transfer learning in a neural network

For the purpose of presenting the concept of transfer learning, here we use a simplified version of the 5-layer network of Figure 1. Let the layer M consist of m synapses or sensors. Let the layer Z not compute any additional features besides the sensor inputs, so it just represents connections from sensors to A-elements. Let each subset of A-elements have the same connections to the sensors as the other subsets of A-elements. The A-elements will be named synaptic weights, such that the weight wis represents the s-th synapse in the i-th class of A-elements. Then the S-element si computes the function yi = Σs wis xs − θi. Let the layer D be represented by a maximum selector function: di = 1 if yi = maxk{yk}, otherwise di = 0. Another way of denoting a maximum selector is d = indmax{yi}, where indmax{ } returns the index of the maximal element in the considered set. In the literature this function is usually written as d = argmax{ }, but we use our original notation (Bozinovski and Fulgosi, 1976).

3.1 An approach toward modeling supervised learning in neural networks

The principal learning concept of the neural network approach toward machine learning is the concept of (synaptic) weights (e.g. Rumelhart et al. 1986, Goodfellow et al. 2016). In pattern classification with neural networks the principal representation spaces are the pattern feature space and the weights space. However, it should be noted that while in artificial neural nets synaptic weights are observable, in real biological systems they are not. So it is interesting to use a representation of the supervised learning problem which does not deal with synaptic weights as the primary representation concept. Here we describe such a weights-free representation, which we call the teaching space (Bozinovski 1981, 1985b).

Let us note that in supervised learning there is a system named the teacher, which has a reference model of the knowledge to be transferred to the other system, named the learner or student. The teaching space approach is based on the following notation. Let x be a body of knowledge to be learned by the student. For example, x might be a visual pattern to be classified into a class. The supervised learning procedure (training) contains both teaching trials (where the teacher presents the knowledge about x) and examination trials (where the student presents its knowledge about x). After the training is completed there will be many exploitation trials, where the learner will show its knowledge in an application.
Let !(x) denote a teaching (or advising) trial, representing the teacher's reference model knowledge about x. Let ?(x) denote a test (or examination) trial, representing the current learner's knowledge about x. Then the goal of the teaching process becomes

?(x) = !(x) for all considered x.   (1)

The learner we use is a maximum selector classifier (Figure 1). For each input pattern x in a test trial, the learner computes n alternatives, i.e., computes n functions y1(x),..,yn(x), and chooses the one with maximal value. If there is no single maximal value the learner gives a special answer meaning "undecided", for example the value 0.

Let us define a set X = {x1,..,xN} of N objects (patterns) to be classified into n classes C1,..,Ck,..,Cq,..,Cn, where N > n. Let, by the teacher's reference model, the i-th pattern belong to the k-th class and the j-th pattern belong to the q-th class. That can be written as

!(xi) = Ck;  i = 1, ..., N;  k = 1,...,n;   (2.1)
!(xj) = Cq;  j = 1, ..., N;  q = 1,...,n;  j ≠ i, q ≠ k.   (2.2)

In an examination trial the maximum value is computed, which means that the correct classification is achieved if the following pair of inequalities holds:

?(xi) = !(xi) = Ck  ⟺  yk(xi) > yq(xi)   (3.1)
?(xj) = !(xj) = Cq  ⟺  yq(xj) > yk(xj)   (3.2)

Further, we assume that the patterns are represented as feature vectors x1,..,xN and that the weight vectors are represented as w1,..,wn, where wk is associated with the class Ck. The learning process is governed by a consequence-driven teaching process with an error correction learning rule:

if ?(xi) is different from (!(xi) = Ck) then correct wk toward xi:  wk = wk + c·xi   (4)

where c is a constant. In words, if the classifier erroneously classifies the pattern xi in a test trial, a teaching trial is introduced in which the pattern xi is added to the weight wk, lecturing that xi belongs to Ck. Here c is a learning rate, which is a constant, and we use the value c = 1.

3.2 Introducing transfer learning

Consider the neural network from Figure 1 which has the capability to classify N patterns into n classes, N > n. Consider the simplest task: two patterns xi and xj to be classified into two classes Ck and Cq. The problem is stated by relations (3). However, let us emphasize that k and q are arbitrary in the set {1,...,n | k ≠ q}, and also i and j are arbitrary in the set {1,...,N | i ≠ j}.

Now we introduce transfer learning. Let us assume the considered neural network has been subject to a learning task which we call the first learning task. After that first learning task the neural network learner is now subject to a second learning task. The second learning task will be carried out by a supervised learning (or teaching) process represented by a teaching sequence L. The teaching sequence contains both teaching and test (examination) trials. However, the memory of the learner is updated only during the teaching trials. The test trials demonstrate the knowledge already stored in the memory of the neural network learner.

Let yk(xi) be the output of the Sk element of the neural network at the completion of the first learning task. It is the initial knowledge demonstrated by this neural network before the second learning task. We emphasize that with the notation yk0(xi) := yk(xi), where the superscript 0 points out that it is the initial knowledge for the second learning task. So the output yk0(xi) manifests the transfer learning from the first teaching task about the concept class k, before the second teaching task with the teaching sequence L is applied.
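To make the preceding notation concrete, the following minimal sketch (not from the original paper; written in Python, with illustrative function names) implements the simplified maximum-selector classifier and the error correction rule (4); the initial weight matrix W0 passed to the teaching routine plays the role of the knowledge transferred from the first learning task.

import numpy as np

def classify(x, W, theta):
    # S-layer: yi = wi.x - theta_i; D-layer: index of the maximum, 0 meaning "undecided"
    y = W @ x - theta
    winners = np.flatnonzero(y == y.max())
    return winners[0] + 1 if len(winners) == 1 else 0   # classes numbered 1..n

def teach(X, labels, W0, theta, c=1.0, max_epochs=100):
    # Error-correction rule (4): if ?(xi) differs from !(xi) = Ck, then wk := wk + c*xi.
    # W0 encodes the transfer from the first learning task (all-zero rows = tabula rasa).
    W = np.array(W0, dtype=float)
    teaching_trials = []                       # the teaching trials of the sequence L
    for _ in range(max_epochs):
        updated = False
        for x, k in zip(X, labels):            # k is the class assigned by the teacher (1..n)
            if classify(x, W, theta) != k:     # failed test trial -> teaching trial
                W[k - 1] += c * np.asarray(x, dtype=float)
                teaching_trials.append(k)
                updated = True
        if not updated:                        # an epoch of test trials only: goal (1) reached
            break
    return W, teaching_trials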
Let yk(xi/L) be the output of the Sk element representing class k when shown pattern xi after the learning in the second task with the teaching sequence L. The second learning task will then be modeled with the following outputs from the elements Sk and Sq:

yk(xi/L) = yk0(xi) + aii pi   (5.1)
yq(xj/L) = yq0(xj) + ajj pj   (5.2)

where pi is the number of appearances of pattern xi in a teaching trial of the teaching sequence L, i.e., the number of applications of the learning rule (4), and aij is the inner product between patterns, aij = xiTxj. So, in order for a correct pattern classification to be achieved in the second task, by a neural network with a maximum selector as in Figure 1, it is necessary and sufficient that the following system of inequalities holds:

yk(xi/L) > yq(xi/L)   (6.1)
yq(xj/L) > yk(xj/L)   (6.2)

which leads to

aii pi − aij pj > −yk0(xi) + yq0(xi)
−aji pi + ajj pj > yk0(xj) − yq0(xj)

That reasoning leads to the following theorem.

Theorem 1. (Transfer learning in case of learning arbitrary two patterns from a set of patterns) Let xi and xj be arbitrary patterns from a set X = {x1,..,xi,..,xj,..,xN} of N patterns, which a maximum selecting neural classifier should learn to classify into two given classes Ck and Cq respectively, from a set C = {C1,..,Ck,..,Cq,..,Cn} of n classes. Let aij = xiTxj. Let the lecture (teaching trial) !(xi) = Ck be presented pi times, and let !(xj) = Cq be presented pj times in the teaching sequence L. Then the problem of correct classification learning is equivalent to the problem of finding pi and pj which satisfy the pair of inequalities

aii pi − aij pj > τqk(xi)   (7.1)
−aji pi + ajj pj > τkq(xj)   (7.2)

which in compact form can be written as

Ap > T   (8)

where

A = | aii   −aij |,   p = | pi |   (9)
    | −aji   ajj |        | pj |

and

T = | τqk(xi) | = | yq0(xi) − yk0(xi) |   (10)
    | τkq(xj) |   | yk0(xj) − yq0(xj) |

is named the transfer learning vector.

Before we present the proof of Theorem 1 we will give an interpretation of the variables which appear in the theorem. First we point out that the left side of the inequalities (8) contains a matrix of all inner products between patterns. The inner product aij between two patterns xi and xj shows how much their features overlap. It can be viewed as a covariance, a manifestation of pattern similarity. We denote that matrix A = [aij] and name it the matrix of mutual similarity between patterns. Note that this matrix is invariant to the teaching process; it simply describes the relations between the given patterns. The vector p = (pi pj)T shows how many times the patterns were shown in a teaching trial in the teaching sequence L. It is the training vector of the second learning task. The right side of the inequalities contains the variables due to transfer learning from a learning task prior to the considered task of training using the curriculum L. It contains differences of outputs of S-elements for each pattern shown in the teaching process, i.e., τqk(xi) = yq0(xi) − yk0(xi) for the shown pattern xi, and τkq(xj) = yk0(xj) − yq0(xj) for the shown pattern xj.

So the left side of the matrix inequalities, Ap, contains all controllable and observable parameters of the teaching process. If the patterns are known, the matrix A is known. The teaching sequence L is the one being looked for, and after it is found, the vector p will be known. However, the right side of the inequalities, the vector T, which represents transfer learning, is in the general case not known. Teaching of a biological brain does not assume that the initial values of the weights are known. Often the task is to teach a learner regardless of the transfer learning.
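As an illustration of Theorem 1 (a sketch under the stated assumptions, not code from the original work), the matrix A, the training vector p, and the transfer learning vector T can be computed directly from the two patterns, the initial outputs, and the chosen presentation counts, and the condition Ap > T checked componentwise:

import numpy as np

def theorem1_holds(xi, xj, yk0_xi, yq0_xi, yk0_xj, yq0_xj, pi, pj):
    # Check the pair of inequalities (7), i.e. Ap > T, for two patterns xi, xj.
    # yk0_* and yq0_* are the initial outputs of the S-elements for classes Ck and Cq,
    # i.e., the transfer from the first learning task.
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    A = np.array([[xi @ xi, -(xi @ xj)],
                  [-(xj @ xi), xj @ xj]])        # matrix of mutual similarity (9)
    p = np.array([pi, pj], dtype=float)          # training vector of the second task
    T = np.array([yq0_xi - yk0_xi,               # transfer learning vector (10)
                  yk0_xj - yq0_xj])
    return bool(np.all(A @ p > T))

With tabula rasa initial conditions all components of T are zero, and any integer point strictly inside the cone of Figure 3 satisfies the condition.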
However, because of the unknown transfer learning, the teaching process might converge in a longer time.

The proof of the Theorem can be expressed using a reasoning flow diagram as shown in Figure 2. The equations and inequalities used have already been described in the text.

Figure 2: Proof of Theorem 1 in a reasoning flow representation.

Note that if all thresholds in the network are equal, then the transfer learning can be expressed as

τkq(xj) = (wk0 − wq0)xj   (11)

3.3 Modeling positive and negative transfer learning

In this section we will address formally the following questions. Given a neural network that has been subject to a learning Task1, is it possible to find a teaching sequence L which will solve the teaching Task2 regardless of the transfer learning from Task1? Case of positive transfer of learning: is it possible that Task1 helps achieve a shorter sequence L in Task2 than when starting from no previous transfer of learning? Case of negative transfer of learning: is it possible that Task1 will produce a longer sequence L in Task2 than when starting from no previous transfer of learning?

In order to answer those questions we will further elaborate on the inequalities (7). We repeat them here for clarity and renumber them (12) for keeping the sequence:

aii pi − aij pj > −yk0(xi) + yq0(xi)   (12.1)
−aji pi + ajj pj > yk0(xj) − yq0(xj)   (12.2)

The inequalities (12) can be rewritten to see explicitly how pj depends on pi. To see that, we move the terms with pi to the right side of the inequalities (12) and we obtain the following system of inequalities:

−aij pj > −aii pi − yk0(xi) + yq0(xi)   (13.1)
ajj pj > aji pi + yk0(xj) − yq0(xj)   (13.2)

Now we multiply inequality (13.1) by −1, which changes the inequality sign from > to <. We obtain the following system of inequalities:

aij pj < aii pi + yk0(xi) − yq0(xi)   (14.1)
ajj pj > aji pi + yk0(xj) − yq0(xj)   (14.2)

from which

pj < (aii/aij) pi + (yk0(xi) − yq0(xi))/aij   (15.1)
pj > (aji/ajj) pi + (yk0(xj) − yq0(xj))/ajj   (15.2)

and finally

pj < (aii/aij) pi + τkq(xi)/aij   (16.1)
pj > (aji/ajj) pi + τkq(xj)/ajj   (16.2)

These inequalities can be observed geometrically as in Figure 3.

Figure 3: Geometric interpretation of Theorem 1.

Note that because aij = xiTxj, the coefficient aii/aij ≥ 1 and the coefficient aji/ajj ≤ 1. Because xi ≠ xj, those coefficients are never both equal to 1 at the same time. Because the coefficients aii/aij and aji/ajj are the slopes of the boundaries of the solution region, and because the patterns are different, xi ≠ xj, the angle φ in Figure 3 always exists, and solution points (pi, pj) inside the shaded region defined by the angle φ always exist. So we can formulate the following statement.

Theorem 2. It is always possible to choose a teaching sequence L which contains patterns xi and xj (xi ≠ xj), such that after training with L the learner is able to correctly classify the patterns regardless of the transfer of learning from a previous learning task.

The proof is given in the previous reasoning using equations (12)-(16).
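Theorem 2 guarantees that a solution point exists whenever xi ≠ xj; a minimal sketch (our own illustration, not the original procedure) finds the shortest training vector by brute-force search over the integer points, using the inequalities (12) directly so that no division by aij is needed:

def minimal_training_vector(aii, ajj, aij, tau_i, tau_j, max_total=1000):
    # Smallest (pi, pj), ordered by pi + pj, satisfying the pair (12.1)-(12.2):
    #   aii*pi - aij*pj > -tau_i   and   -aij*pi + ajj*pj > tau_j,
    # where tau_i = yk0(xi) - yq0(xi) and tau_j = yk0(xj) - yq0(xj) encode the transfer.
    for total in range(max_total + 1):
        for pi in range(total + 1):
            pj = total - pi
            if aii * pi - aij * pj > -tau_i and -aij * pi + ajj * pj > tau_j:
                return pi, pj
    return None  # not reached for xi != xj and finite transfer values, by Theorem 2

# Tabula rasa (tau_i = tau_j = 0) with overlapping but distinct patterns:
# minimal_training_vector(aii=20, ajj=15, aij=10, tau_i=0, tau_j=0) returns (1, 1).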
The teaching space in which we observe transfer training is an integer space. The components pi and pj are non-negative integers. Figure 3 shows that only the integer points are solutions for the correct classification of xi and xj.

4 Geometric interpretation of transfer learning: positive, negative, and tabula rasa

From Figure 3 we can see that the solution region of the teaching process is a convex cone defined by two parameters: 1) the position of the coordinate origin relative to the vertex of the cone, and 2) the angle of the convex cone. The orientation of the cone is always such that most of it lies within the first quadrant, although the vertex may be in any quadrant. We call this a positive convex cone. The angle of the convex cone is determined solely by the inner products between patterns. The angle represents the similarity between patterns in the sense of overlapping features.

Transfer learning is geometrically represented by the position of the origin of the coordinate space (pi, pj) relative to the convex cone. That is illustrated in Figure 4.

Figure 4: A geometric interpretation of transfer learning. The plane (pi, pj), the convex cone, and various coordinate origins representing transfer learning from a previous learning task.

As Figure 4 shows, if the vertex of the convex cone is at the coordinate origin (coordinate system T0), then there is no transfer learning. The learner starts from tabula rasa initial conditions. This means that the memory values are all equal, for example all zero. However, they are not necessarily zero; they need only be all equal (homogeneous initial conditions). In this condition a learning process must take place for both patterns (or lessons) xi and xj in order for the learner to correctly recognize those patterns.

If the coordinate origin is inside the solution region (coordinate system T4), the learner has positive transfer learning. There is no need for additional learning. The previous learning is enough for the correct recognition of the patterns.

If the coordinate origin is in the region symmetrically opposite to the solution region (the negative convex cone), it is an example of negative transfer learning. Coordinate system T-4 is such a case. Both patterns xi and xj have been previously, in Task1, classified into classes which are incorrect according to the new Task2. So the new learning process must include both patterns. The learning process will be longer than in the case of the tabula rasa condition.

If the coordinate origin is in the area outside the positive and negative convex cones (examples are the T2 and T-2 coordinate systems), then there are situations in which for one pattern there is positive transfer learning and for the other negative.

5 Defining an index of transfer learning in a neural network

Based on the geometric interpretation of transfer learning in Figure 4 we will now define an index of transfer learning, a numerical representation of transfer learning. A measure of negative transfer as well as a transferability measure are emphasized in contemporary transfer learning research (Tan et al. 2018). The index which we discuss here was proposed in (Bozinovski and Fulgosi 1976).

The mathematical measure of transfer learning was introduced using the following reasoning. Observe the intercepts that the boundary lines in Figure 4 make with the ordinate pj. For the T0 coordinate system both lines have intercept 0.
For coordinate system T4, one intercept is positive and the other is negative. For coordinate system T2 both intercepts are positive, and for T-2 both intercepts are negative. Note that Figure 3 above also shows a case with both intercepts positive. So we will observe only the signs of the intercepts, positive, negative, or zero, and we will define an index of transfer learning. Note that the intercepts are defined as τkq(xi)/aij and τkq(xj)/ajj, and consequently their signs are defined as sign(τkq(xi)/aij) and sign(τkq(xj)/ajj), where sign( ) is a function that gives 1 for a positive, 0 for a zero, and -1 for a negative argument. Now we can define an index, a measure of transfer learning, on the basis of the signs of the intercepts of the boundary lines for patterns xi and xj:

TL(τkq(xi), τkq(xj)) = 3·sign(τkq(xi)) − sign(τkq(xj))   (17)

According to this index, if both signs are positive then TL = 2. That corresponds to the coordinate system T2 in Fig. 4. If both are negative, then TL = -2, which corresponds to the coordinate system T-2 in Fig. 4. If both are zero, then TL = 0, which corresponds to the coordinate system T0 in Fig. 4. If sign(τkq(xi)) = 1 and sign(τkq(xj)) = -1, then TL = 4, which corresponds to the coordinate system T4 in Fig. 4. Note that the index TL takes all the integer values in the interval [-4,+4]. Figure 5 shows the TL values and their geometric interpretation.

Figure 5: A geometric interpretation of the index TL, a numerical index of transfer learning. It shows the values of the regions of the (pi, pj) plane where a learner finds itself after the first learning task Task1, when facing the second learning task Task2.

The introduced index of transfer learning shows the position of the coordinate origin in the plane (pi, pj) relative to the vertex of the cone inside which lies a solution of the pattern recognition problem. It shows where in the (pi, pj) plane the starting point is for learning Task2 by a learner with transfer learning from the previous Task1. From Fig. 5 we can give the following interpretations of the transfer learning index TL:

If TL = 4 the learner correctly classifies both patterns, without need for additional learning. It is a positive transfer learning from a previous Task1.

If TL = +1 or +3 the learner recognizes one pattern but is undecided about the other. The coordinate origin lies on a boundary line of the inequalities. For example, if TL = 3 the coordinate origin lies on the right boundary line of the positive convex cone. In such a case, if the convex cone angle is not too small, then only one presentation of the pattern xj in a teaching trial is enough for the learner to correctly classify both patterns.

If TL = 0 the learner is undecided about both patterns. There is no transfer from previous learning; the learner is in the tabula rasa condition.

If TL = -4 the learner incorrectly classifies both patterns. It is an example of negative transfer learning.

If TL = +2 the learner correctly classifies one pattern but incorrectly the other one. In this case there is transfer learning, positive for one pattern but negative for the other one.

The considered index of transfer learning (17) can be normalized to a value between -1 and 1 if the right side of equation (17) is divided by 4.
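The index (17) is a direct function of the two signs; the following sketch (our transcription, with assumed function names) computes it together with the normalized variant. Since the denominators aij and ajj are positive for overlapping patterns, the signs of the intercepts equal the signs of the τ values themselves.

def sign(v):
    return (v > 0) - (v < 0)          # 1, 0 or -1

def transfer_learning_index(tau_i, tau_j):
    # Index (17): TL = 3*sign(tau_kq(xi)) - sign(tau_kq(xj)), an integer in [-4, +4].
    # E.g. tau_i > 0 and tau_j < 0 give TL = 4 (positive transfer for both patterns).
    return 3 * sign(tau_i) - sign(tau_j)

def transfer_learning_index_normalized(tau_i, tau_j):
    # The same index scaled to the interval [-1, 1].
    return transfer_learning_index(tau_i, tau_j) / 4.0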
6 Search for a learning solution in case of negative transfer learning

To further illustrate the learning process including transfer learning, we will consider the search for a learning solution in the case of negative transfer learning. Figure 6 shows such an illustration.

Figure 6: Some learning trajectories in the teaching space of Task2, due to transfer learning from a Task1 (Bozinovski 1981).

First let us note that the orientation of the solution convex cone in the space does not depend on the transfer learning. The orientation of the solution cone depends solely on the considered patterns and their mutual position on the medium on which they are shown. If the patterns are digital images on a binary retina, then their mutual overlapping aij = aji and their self-overlappings aii and ajj will define the solution region. As an example, imagine the image patterns E, T, and F on a retina of 7x5 binary sensors.

The considered learning Task2 in Figure 6 can have different coordinate origins, due to transfer learning. Consequently, a learning process will have a different trajectory in the Task2 teaching space, depending on the transfer learning from Task1. It can be seen from Figure 6 that due to a negative transfer learning it is possible that a teaching sequence L never finds a pattern classification solution, as is the case with the learning trajectory starting from initial condition A. The other cases of negative transfer learning can be compensated with a carefully chosen teaching sequence L, as shown with the teaching sequences B, C, and D. In the case of initial condition B, it is enough that only the pattern xi is shown several times until a solution point is found. In the case of initial condition C both patterns must be shown for correct classification. In the case of initial condition D, it is shown that a teaching sequence containing an equal number of xi and xj will eventually reach the solution region. However, one can observe that also a sequence containing only xj would eventually reach the solution region.

7 Multi-class, multi-template task

Pattern classification usually assumes that several template patterns for each class are included in the teaching process. In the test task (or in the exploitation task) there might be patterns that were not shown as template patterns. In this section we discuss two topics: first, how the model given by Theorem 1 applies in the case of several templates per class, and second, how transfer learning is represented in the synaptic weights of an artificial neural network. In contrast to natural neural networks, where weights are not observable, in artificial neural networks it is usually assumed that the synaptic weights are observable.

Consider a task in which three patterns are to be classified into two classes: x1, x2 ∈ C1, x3 ∈ C2. The two neurons associated with the two classes have weight vectors w1 and w2, and thresholds θ1 and θ2, respectively. The maximum selector layer for each presented pattern computes the following inequalities:

(x1 ∈ C1): w1x1 − θ1 > w2x1 − θ2
(x2 ∈ C1): w1x2 − θ1 > w2x2 − θ2
(x3 ∈ C2): w2x3 − θ2 > w1x3 − θ1   (18)

In the case of transfer learning, where the weights have initial values wi0 (i = 1, 2), we have

(x1/L): (w10 + p1x1 + p2x2)x1 − θ1 > (w20 + p3x3)x1 − θ2
(x2/L): (w10 + p1x1 + p2x2)x2 − θ1 > (w20 + p3x3)x2 − θ2
(x3/L): (w20 + p3x3)x3 − θ2 > (w10 + p1x1 + p2x2)x3 − θ1   (19)

After rearrangement, and introducing wkq0 = wk0 − wq0 and θkq = θk − θq, where k, q ∈ {1, 2} and k ≠ q, we obtain a matrix representation of the classification problem which includes the transfer weights:
(x1/L):  |  a11    a21   −a31 | | p1 |     | w210   0      0    | | x1 |   | θ12 |
(x2/L):  |  a12    a22   −a32 | | p2 |  >  | 0      w210   0    | | x2 | + | θ12 |
(x3/L):  | −a13   −a23    a33 | | p3 |     | 0      0      w120 | | x3 |   | θ21 |   (20)

The shaded areas in (20) are the diagonal sub-matrices of the classes. Each class sub-matrix has a number of rows (and columns) equal to the number of templates for that class. In the case of the inequalities (20), the first class contains two templates and the second contains one template pattern. From this case study we can generalize the transfer learning model for the multi-class and multi-template per class case as

(X/L): Ap > W0X + Θ   (21)

where X = {x1,..,xN} is the set of patterns which should be learned in the second task with the curriculum sequence L. Note that the mathematical model of transfer learning (21) divides the relation into a teacher's side (the left side) and a learner's side (the right side). At the teacher's side are the similarity matrix A and the distribution vector p showing how many times each pattern appeared in a teaching trial of the curriculum L. The matrix A shows that what matters in the teaching process are not the patterns themselves but rather their correlations, inner products, which can be interpreted as similarities. At the learner's side, W0 represents the differences of the initial conditions of the memory due to transfer learning, X is the vector of template vectors, a matrix containing the patterns to be classified, and Θ represents the differences between the thresholds of the neurons representing the classes. Note that the matrix W0 contains blocks showing which template is assigned to which class.

As pointed out before, the space p = (p1,..,pN) is an integer space. Dealing with neural network learning is actually an integer programming problem. We are interested in the most efficient training, and we are looking for a training sequence L of minimal length. So we look for a criterion

p1 + p2 + ... + pN = min   (22)

Such a criterion observes the appearance of patterns only in teaching trials. If we are interested in the minimal sequence that includes test trials, then the optimality criterion is

(p1 + q1) + (p2 + q2) + ... + (pN + qN) = min   (23)

where qi is the number of appearances of the pattern xi in a test trial, which does not change the memory of the learner but affects the length of the training sequence L.
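A sketch of how relation (21) can be checked numerically for any number of classes and templates (an illustration with assumed variable names, generalizing the two-class check of (18)-(20) by testing every template against every rival class):

import numpy as np

def correctly_classified_after_L(X, classes, p, W0, theta, c=1.0):
    # True if, after the teaching distribution p of the curriculum L, every template wins
    # against every rival class, i.e. the multi-class, multi-template conditions hold.
    #   X       : list of template vectors
    #   classes : classes[i] is the class of template X[i]
    #   p       : p[i] is the number of teaching trials of X[i] in L
    #   W0      : dict class -> initial weight vector (the transfer from Task1)
    #   theta   : dict class -> threshold
    W = {k: np.asarray(W0[k], dtype=float).copy() for k in W0}
    for x, k, count in zip(X, classes, p):              # accumulate the teaching trials of L
        W[k] = W[k] + c * count * np.asarray(x, dtype=float)
    for x, k in zip(X, classes):                        # maximum-selector check per template
        yk = W[k] @ x - theta[k]
        if any(q != k and yk <= W[q] @ x - theta[q] for q in W):
            return False
    return True

For the three-pattern, two-class example of (18)-(20), classes = [1, 1, 2] and p = (p1, p2, p3); a minimal-length curriculum in the sense of (22) is a p with the smallest sum for which this check returns True.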
8 Experimental investigation on transfer learning

The experimental investigation on transfer learning was carried out in the period 1976-1981. The initial experiments were with a dataset containing images of the letters A, B, E, F, and T taken from the terminal IBM29 card puncher. Those experiments were carried out on the IBM 1130 computer. Later experiments were carried out with two datasets. One dataset contained 40 images, consisting of 26 letters, 10 numbers, and 4 special symbols from the terminal IBM29. The other dataset can be described as the Computer Terminals dataset, consisting of 3x26 = 78 images, taken from three computer terminals: the IBM29 card puncher, the VR14 video screen, and the VT50 video screen. These experiments were carried out on a VAX/VMS computer. Figure 7 shows the Computer Terminals dataset. As can be seen, the letters of the three terminals are mostly identical on an image with resolution 7x5, with differences in the letters A, B, D, G, J, M, N, O, V, and W.

Figure 7: The dataset Computer Terminals (IBM29, VR14, VT50) used in the experimental investigation.

8.1 An experiment in tabula rasa condition, showing the influence of pattern similarity

Here we show an experiment in tabula rasa learning, to see the influence of similarity (overlapping pattern features) on the learning process. Consider the patterns E, T, F, shown in Fig. 7. They are the same for all considered terminals. Figure 8 shows the search through the (pE, pT, pF) space that the learning process performs.

Figure 8: Learning trajectory in the case of a tabula rasa learner, learning the similar patterns E and F, together with the pattern T (Bozinovski 1981, 1985b).

As Fig. 8 shows, the problem is the distinction between the patterns F and E. The convex cone angle is narrow, and it is possible that in some search steps the cone does not contain an integer point. The search for an integer solution is what makes it necessary to repeat the images E and F several times until they are distinguished by the learner. This experiment emphasizes the problem of feature overlapping and the problem of one image being included in another image. To emphasize the image-subimage relation, a measure of similarity between patterns was introduced in (Bozinovski and Fulgosi 1976). The following index

SL(xi, xj) = xiTxj / min{xiTxi, xjTxj}   (24)

has values between 0 and 1. If SL = 0 the solution convex cone covers the entire first quadrant of the teaching space (pi, pj). If 0 < SL < 1, the convex cone includes the line pj = pi. If SL = 1, one of the cone boundaries is the line pj = pi. Such a measure is used to predict the length of the teaching sequence L, and with that the efficiency of the training.

8.2 Experimental investigation in positive and negative transfer learning

The experiments shown here were carried out during 1976-1978 on an IBM 1130 computer. Table 1 shows the results of the transfer learning experiments, which show both positive and negative transfer (Bozinovski et al. 1977, Bozinovski 1978).

Table 1: Experiments in transfer learning. Cases of tabula rasa, negative, and positive transfer learning.

  Task 1 images | Task 2 images | Task 2 teaching sequence
  No transfer learning (tabula rasa):
     -        | A, B      | AB
     -        | A, B, T   | ABT
     -        | E, F      | EFFEFEFEF
     -        | E, F, T   | EFFEFEFEFTTT
  Negative transfer learning:
     E, F     | A, B      | ABABABAB
  Positive transfer learning:
     A, B, T  | E, F, T   | EFT

In presenting the results of the experiments with transfer learning, here we introduce the notation LD2/D1, meaning the training sequence of Task2, trained with a set of patterns D2, after the Task1 in which the learner was trained with a set of patterns D1. For tabula rasa training we use the notation LD2/0.

Experiment with no transfer learning. As can be seen from the presented experiments, learning the patterns E, T, and F with no transfer learning needs the teaching sequence LETF/0 = EFFEFEFEFTTT. The length of the sequence is due to the similarity between E and F.

Experiment showing positive transfer learning. If the neural network is previously exposed to a Task1 where it learned to recognize A, B, and T, and after that is exposed to a Task2 to learn E, T, and F, then the teaching sequence for Task2 is LETF/ABT = EFT. The teaching sequence for learning E, T, F in this case is shorter than in the case of tabula rasa. That is experimental evidence of positive transfer learning.

Experiment showing negative transfer learning. If the neural network is previously exposed to a Task1 to learn E and F, and after that in Task2 to learn A and B, the teaching sequence for learning A and B is LAB/EF = ABABABAB. It is longer than in the case of learning A and B in the tabula rasa condition, LAB/0 = AB. That is experimental evidence of negative transfer learning.
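The similarity index (24) of Section 8.1, used to anticipate results of the kind shown above, is a one-line computation; a sketch assuming the patterns are given as flat binary vectors (the 7x5 letter bitmaps themselves are not reproduced here):

import numpy as np

def similarity_index(xi, xj):
    # Index (24): SL(xi, xj) = xi.xj / min(xi.xi, xj.xj), with values in [0, 1].
    # SL = 1 when one pattern is a sub-image of the other (as F is of E on the 7x5 retina),
    # which is the hard case producing the narrow solution cone of Figure 8.
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    return float(xi @ xj) / min(float(xi @ xi), float(xj @ xj))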
8.3 Application of transfer learning

Here we show the results of experiments carried out during 1980-1981 on a VAX/VMS computer (Bozinovski 1981). The experiments consider a real application: learning to recognize letters from computer terminals. Consider the dataset Computer Terminals from Figure 7. The question we would like to answer experimentally is: if in Task1 we teach a learner to recognize the letters from the terminal VR14, how much faster will the learner be able to learn, in Task2, to recognize the letters from the terminal IBM29, compared to learning from the tabula rasa condition?

In these experiments we used the following teaching strategy (Bozinovski 1981), named the perceptron teaching strategy:

Procedure PerceptronTeachingStrategy
  iteration: teachflag := 0; i := 0; n := 26;
  while i < n do
    i := i + 1;
    grade := test(xi);
    if grade = "incorrect" then teach(xi); teachflag := 1;
  endwhile;
  if teachflag = 1 goto iteration;
end.

This strategy performs test trials on all n = 26 images, and only when needed, a teaching trial is applied for a particular image. After such an iteration (or epoch), another iteration takes place, and so on, until no teaching trial appears in an iteration (teachflag = 0). That means there were only test trials in the last iteration and the learner now recognizes all the patterns correctly. Using this strategy on the set of letters IBM29, in the case of tabula rasa, gives the 9 iterations shown in Figure 9.

T* = ABCDEFGHIJKLMNOPQRSTUVWXYZ
     CEFGHIJKLMNOPQRSTUVWXYZ
     ABCEFGJLOPQRSUWZ
     ACEEFHIJKLMNPRSTUVXY
     BEFGHJKLMOPQRUWZ
     ACJPRTVWXY
     BDEFHIKLMNPQVZ
     FOU
     CEGJLS

Figure 9: Teaching sequence of learning the set of letters IBM29 with no transfer learning.

With T* we denote the solution teaching sequence in which only the teaching trials appear. With |T*| we denote its length, in trials. With C* we denote the teaching sequence containing both teaching and test trials, and with |C*| its length. For the experiment in Fig. 9 we obtained |T*|IBM29/0 = 135 and |C*|IBM29/0 = 395.

If before learning the set IBM29 in Task2 the set VR14 was learned in Task1, then in Task2 the teaching process completes in 4 iterations, with the teaching sequence shown in Fig. 10.

Figure 10: The teaching sequence in the case when the set IBM29 is learned, provided that the set VR14 was learned previously.

In the experiment shown in Fig. 10 we obtained |T*|IBM29/VR14 = 48 and |C*|IBM29/VR14 = 178. The experiments shown in Figures 9 and 10 show an application of positive transfer learning. We obtained a shorter training sequence, |C*|IBM29/VR14 = 178 < |C*|IBM29/0 = 395. The teaching time is 178/395 = 0.45 of the tabula rasa teaching time, and the speed of learning increases 1/0.45 = 2.2 times.

When we carried out an experiment of learning the set VT50 after previously learning the set VR14, the result was |T*|VT50/0 = 207, |T*|VT50/VR14 = 43, and |C*|VT50/VR14 = 199 < |C*|VT50/0 = 545. The transfer learning teaching time is 199/545 = 0.36 of the tabula rasa teaching time, and the speed of learning increases 1/0.36 = 2.8 times. This application shows the reason for the use of transfer learning.
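The perceptron teaching strategy above translates directly into runnable form; the following is our Python rendering (a sketch: test and teach stand for the test-trial and teaching-trial procedures of the learner), which also records the sequences T* and C*:

def perceptron_teaching_strategy(patterns, test, teach):
    # Repeat epochs of test trials over all patterns, teaching only the misclassified ones,
    # until an epoch contains no teaching trial. Returns the teaching trials T* and all trials C*.
    t_star, c_star = [], []
    while True:
        teachflag = False
        for x in patterns:
            c_star.append(('test', x))          # every test presentation counts toward C*
            if test(x) == "incorrect":
                teach(x)                        # teaching trial: the learner's memory is updated
                t_star.append(x)
                c_star.append(('teach', x))     # ...and so does the teaching presentation
                teachflag = True
        if not teachflag:                       # last epoch contained only test trials: done
            return t_star, c_star

This bookkeeping is consistent with the reported tabula rasa IBM29 run: nine epochs with teaching trials plus one final clean epoch give 10 x 26 = 260 test trials, and 260 + 135 = 395 = |C*|IBM29/0.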
If you have knowledge of a dataset classification stored in a neural network in Task1, then transfer that knowledge to a different task which learns the classification of a similar dataset. The training time will be shorter.

Here in this application subsection we also give the result of learning a dataset IBM29(40) of 40 images, defined as IBM29(40) = IBM29 ∪ {+, -, =, /} ∪ {0,1,...,9}, starting from the tabula rasa condition. The result we obtained is: 10 iterations, |T*|IBM29(40)/0 = 204 and |C*|IBM29(40)/0 = 604. This is an example of a 1981 machine learning experiment with 40 patterns (Bozinovski 1981).

9 Transfer learning research after 1986

The main focus of this paper is to give a review of the initial work on transfer learning in neural networks, which took place between 1972 (Bozinovski 1972) and 1985 (Bozinovski 1985a, 1985b). To the best of our knowledge, during that time period there was no other work on transfer learning in neural networks. That was the period when neural networks were not the main topic in Artificial Intelligence, due to the book of Minsky and Papert (1969), which pointed out some limitations of perceptron-type neural networks. Although during the 1970s and 1980s there were works on multilayered neural networks (e.g. Fukushima, 1975, 1980), the interest in multilayered neural networks significantly increased after 1986, due to the appearance of the book by the Parallel Distributed Processing (PDP) Group (Rumelhart et al. 1986). That book reignited the interest in neural networks and, after some time, the interest in transfer learning in neural networks. Here we will give a short review of the works on transfer learning after 1986.

Early works after 1986 used other terms to describe transfer learning. One such term was "sequential learning", where negative transfer learning was covered by the term "interference" (McCloskey and Cohen, 1989). Other terms used were "adaptive generalization" (Sharkey and Sharkey, 1992), "learning by learning" (Naik and Mammone, 1993), and "lifelong learning" (Thrun and Mitchell, 1993).

In 1991 the term transfer learning related to neural networks reappeared in the literature. That was the work of Pratt, Mostow, and Kamm (1991). That paper introduced a framework of transfer learning, pointing out various types of transfer learning. The framework was also described in the work of Pratt (1993). The framework is shown in Figure 11.

Figure 11: A general framework for transfer learning (adopted from Pratt et al., 1991).

As can be seen from Fig. 11, the general framework for transfer learning proposed in 1991 includes four types of transfer. One is named literal transfer learning, and it is the transfer learning we used in our work (Bozinovski and Fulgosi, 1976) and reviewed in this paper. The second type is a transfer learning which uses direct intervention in the weights of a neural network. We call this a direct memory access (DMA) type of knowledge transfer. It is an intervention in a neural network's knowledge without a process of incremental learning. The weight change is named weight perturbation. An example of direct weight change described in (Pratt et al. 1991) is w = w + rw, where r is a random number in (-0.6, 0.6). The weight perturbation method was also used in the work (Agarwal et al.
1992). The third type uses problem decomposition into subproblems, represented by subnetworks, training the subnetworks for the subproblems, and then inserting the subproblem knowledge into the target network. The fourth type of transfer is indirect transfer, where the weight-based knowledge is extracted, represented as rule-based knowledge, updated using the rule-based representation, and then inserted into a target neural network as weight-based knowledge.

A review of transfer learning in the early 1990s is given by Pratt and Jennings (1996). A review by Pan and Yang (2010) covers the period after that. Tan et al. (2018) review deep transfer learning.

10 Discussion and conclusion

The contribution of this paper is a review of an early period of transfer learning research, a period which was not known to the current researchers in transfer learning. In the current history of transfer learning, as covered by Wikipedia > Transfer Learning > History (2020), there is information which suggests that the beginning of transfer learning research was in 1993. This paper gives information on the transfer learning research during the 1970s and early 1980s. In this discussion let us mention that the original 1976 paper was published in the Proceedings of the symposium Informatica 1976, which took place in Bled, Slovenia, one year before the appearance of the first issue of the journal Informatica, in 1977. The paper was published in Croatian, not in English, which is the main reason why the paper was not known for a rather long time.

In the review of the period 1990-2000 given in this paper, we can notice that the research during that period was focused on the forms that transfer learning can take and the directions it can go. Fundamental concepts such as a measure of transfer learning were not covered. The interest in fundamental notions was pointed out again in the 2000s (Tan et al., 2018). That relates the research of the 1970s to the contemporary research in transfer learning. Let us mention that the application of transfer learning with real datasets of images described here, such as the Computer Terminals dataset containing 3x26 letters and the IBM29(40) dataset containing 40 characters on a 7x5 matrix, is an early use of datasets of characters in machine learning. An example of a character dataset used in contemporary research (e.g. Wang et al. 2019) contains 10 characters (the digits 0 to 9) on a 28x28 matrix, with a variety of templates.

In conclusion, this paper extends the knowledge on transfer learning with a relation between the pioneering work (in the 1970s and early 1980s) and the current research on transfer learning, giving also a review of the period of the early 1990s. An important part of that relation is the reminder of the theoretical 1976 paper, which presented the first mathematical and geometric modeling, and a measure, of transfer learning. The experimental work during 1976-1981 with datasets representing images of characters also relates to the contemporary research in machine learning.

11 Acknowledgement

The author wishes to thank professor Gjorgi Cupona from the Mathematical Institute with Numeric Center of the University of Skopje, who in 1968 encouraged the author to prepare a matura thesis based on Glushkov's book. He also wishes to thank professor Ante Santic from the Electrical Engineering Department of the University of Zagreb, who in 1972 allowed the author's work on perceptrons.
Professor Ante Fulgosi from Psychology department of University of Zagreb was coauthor of the first published paper on transfer learning in 1976. The author also wishes to thank professors Michael Arbib and Nico Spinelli who enabled him to work on the research related to this paper at the Computer and Information Sciences (COINS) Department of the University of Massachusetts at Amherst for the period 1980-81, and to thank professor Andrew Barto for the period 1995-96. The work at University of Massachusetts in both periods was supported by Fulbright grants. Part of the research reported here was supported by a Sigma Xi grant-in-aid in 1981. The author also wishes to thank the reviewers of this paper for their valuable comments. 12 References [1] A. Agarwal, R. Mammone, and D. Naik (1992) An on-line training algorithm to overcome catastrophic forgetting. In Intelligence Engineering Systems through Artificial Neural Networks. volume 2, pages 239-244. The American Society of Mechanical Engineers, AS~IE Press. [2] J. Baxter, R. Caruana, T. Mitchell, L. Pratt, D. Silver, S. Thrun (organizers) Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems, NIPS*95 Post-conference workshop, Vail, Colorado http://socrates.acadiau.ca/courses/comp/ dsilver/NIPS95ltl.nips95.workshop.pdf [3] S. Bozinovski (1972) Perceptrons: Training in pattern recognition. (original in Croatian: Perceptroni i obucavanje u prepoznavanju oblika) unpublished student scientific competition paper, University of Zagreb [4] S. Bozinovski (1974). Perceptrons and possibility of simulation of a teaching process (original in Croatian: Perceptroni i mogucnost simuliranja procesa obucavanja), unpublished M.Sc. thesis, Electrical Engineering Department, University of Zagreb [5] S. Bozinovski, A. Fulgosi (1976). The influence of pattern similarity and transfer of learning upon training of a base perceptron B2. (original in Croatian: Utjecaj slicnosti likova i transfera ucenja na obucavanje baznog perceptrona B2), Proc. Symp. Informatica 3-121-5, Bled. [6] S. Bozinovski, A. Santic, A. Fulgosi (1977). Normal teaching strategy in pair-association in the case teacher:human-learner:machine. (original in Croatian: Normalna strategija obicavanja u obucanju asocojacije parova u slucaju ucitelj:covjek-ucenik:masina), Proc. Conf. ETAN, 21:IV-341-346, Banja Luka, [available online]. [7] S. Bozinovski (1978). Experiments with non-biological systems teaching. (original in Macedonian: Eksperimenti na obucuvanje na nebioloski sistemi) Proc. Conf ETAN, 22:IV-371-379, Zadar [available online]. [8] S. Bozinovski (1981). Teaching space: A representation concept for adaptive pattern 302 Informatica 44 (2020) 291-302 S. Bozinovski classification. COINS Technical Report, University of Massachusetts at Amherst, No 81-28 [available online: UM-CS-1981-028.pdf] [9] S. Bozinovski (1985a). Adaptation and training: A viewpoint. Automatika 26 (3-4) 137-144 [10] S. Bozinovski (1985b). A representation theorem for linear pattern classifier training. IEEE Transactions on Systems, Man, and Cybernetics 15(1): 159-161 [11] S. Bozinovski (1995). Neuro-genetic agents and a structural theory of self-reinforcement learning systems. CMPSCI Technical Report 95-107, University of Massachusetts at Amherst [available online: UM-CS-1995-107.pdf]. [12] K. Fukushima (1975) Cognitron: A self organizing multilayered neural network. Biological Cybernetics 20: 121-136 https://doi.org/10.1007/BF00342633 [13] K. 
[13] K. Fukushima (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36: 193-202. https://doi.org/10.1007/BF00344251
[14] V. Glushkov (1967) Introduction to Cybernetics (original in Serbian: Uvod u Kibernetiku, translated from Russian, published by Zavod za izdavanje udzbenika Srbije).
[15] I. Goodfellow, Y. Bengio, A. Courville (2016) Deep Learning, MIT Press. DOI: 10.1007/s10710-017-9314-z
[16] M. McCloskey, N. Cohen (1989) Catastrophic interference in connectionist networks: the sequential learning problem. The Psychology of Learning and Motivation, 24. DOI: 10.1016/S0079-7421(08)60536-8
[17] M. Minsky, S. Papert (1969) Perceptrons. The MIT Press.
[18] D. Naik, R. Mammone (1993) Learning by learning in neural networks. In R. Mammone (ed.) Artificial Neural Networks for Speech and Vision, Chapman and Hall, London.
[19] S. Pan, Q. Yang (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359. DOI: 10.1109/TKDE.2009.191
[20] L. Pratt, J. Mostow, C. Kamm (1991). Direct transfer of learned information among neural networks. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pp. 584-589, Anaheim, CA.
[21] L. Pratt (1993). Discriminability-based transfer between neural networks. In NIPS Conference: Advances in Neural Information Processing Systems 5, Morgan Kaufmann Publishers, pp. 204-211.
[22] L. Pratt, B. Jennings (1996) A survey of transfer between connectionist networks, Connection Science 8(2) 163-184. https://doi.org/10.1080/095400996116866
[23] F. Rosenblatt (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65: 386-408. DOI: 10.1037/h0042519
[24] F. Rosenblatt (1962). Principles of Neurodynamics. Spartan Books. DOI: 10.2307/1419730
[25] D. Rumelhart, J. McClelland, and the PDP Group (1986). Parallel Distributed Processing. MIT Press.
[26] N. Sharkey and A. Sharkey (1992) Adaptive generalisation and the transfer of knowledge, Proceedings of the Second Irish Neural Networks Conference, Belfast.
[27] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu (2018). A Survey on Deep Transfer Learning, arXiv:1808.01974v1 [cs.LG], 6 Aug 2018. DOI: 10.1007/978-3-030-01424-7_27
[28] S. Thrun, T. Mitchell (1993) Lifelong robot learning, Technical Report IAI-TR-93-7, Institute for Informatics III, University of Bonn. https://doi.org/10.1016/0921-8890(95)00004-Y
[29] H. Wang, C. Li, X. Zhen, W. Yang, B. Zhang (2019) Gaussian Transfer Convolutional Neural Networks, IEEE Transactions on Emerging Topics in Computational Intelligence 3 (5) 360-368. DOI: 10.1109/TETCI.2018.2881225
[30] K. Weiss, T. Khoshgoftaar, D. Wang (2016) A survey of transfer learning. Journal of Big Data 3:9. DOI: 10.1186/s40537-016-0043-6
[31] Wikipedia > Transfer Learning (October 2020) https://en.wikipedia.org/wiki/Transfer_learning

https://doi.org/10.31449/inf.v44i3.1907 Informatica 44 (2020) 303-310

Minimum Flows in Parametric Dynamic Networks - the Static Approach
Grigoraș (Avesalon) Nicoleta
Transilvania University of Brașov, 50091 Brașov, Iuliu Maniu, 50, Romania
E-mail: nicole.grigoras@gmail.com
Keywords: dynamic network, parametric network, minimum flow
Received: October 15, 2018

The problems of flows in parametric networks extend the classical problems of optimal flow to a special kind of network in which the capacities of certain arcs are not constants but depend on several parameters. Consequently, these problems consist of solving a range of ordinary (nonparametric) optimal flow problems for all the parameter values within certain sub-intervals of the parameter values. Although classical network flow models have been widely used as valuable tools for many applications [1], they fail to capture the essential dynamic aspect of many real-life problems, such as traffic planning, production and distribution systems, communication systems, evacuation planning, etc. In all these cases, time is an essential component, either because the flows take time to pass from one location to another, or because the structure of the network changes over time. Accordingly, dynamic flow models seem suited to capture and describe different real-life dynamic problems such as a network structure changing over time or timely decision-making, but, because of their complexity, these models have not been as thoroughly investigated as those of classical flows. This article presents and solves the problem of minimum flows in a parametric dynamic network. The proposed approach consists in applying a parametric flow algorithm in the reduced expanded network which is obtained by expanding the original dynamic network. A numerical example is also presented for a better understanding of the used approach.

Povzetek: V članku je predstavljena metoda za rešitev problema najmanjšega pretoka v parametričnih dinamičnih mrežah.

1 Introduction
The parametric maximum flow problem, as well as the related minimum flow problem, represents a generalization of the ordinary maximum, respectively minimum, flow problem in which the upper/lower bounds of some arcs depend on a single parameter, being monotonically increasing (or decreasing) functions of the parameter. For the parametric maximum flow problem with zero lower bounds and linear capacity functions of a single parameter, G. Ruhe proposed [16] an original 'piece-by-piece' approach, while papers [11] and [13] solve the problems of parametric minimum, respectively maximum, flows via a partitioning approach. Finally, the same partitioning approach is extended to a discrete-time dynamic network in paper [3]. This class of problems is known as the flow in parametric networks problem.
Beside the applications of the ordinary, nonparametric flow problems, the applications of flows in parametric networks may include: multiprocessor scheduling with release times and deadlines, integer programming problems, computing sub-graph density and network vulnerability, partitioning a database between fast and slow memory, product selection, flow sharing, database record segmentation in large shared databases, optimizing field repair kits, etc. [15]. Besides this, static network flow models arise in a number of combinatorial applications that on the surface might not appear to be maximum flow problems at all. The problem also arises directly in problems as far reaching as machine scheduling, the assignment of computer modules to computer processors, tanker scheduling, etc. [1]. However, in some other applications, time is an essential ingredient, whether it involves problems like maximum or minimum flows in time-varying networks [2], [3], dynamic flows of minimum cost [12], general problems of temporally repeated flows [6] such as the earliest arrival flow [17], or the dynamic resource allocation problem for large-scale transportation network evacuation [7]. In such cases dynamic network flow models need to be used. Due to the powerful versatility of dynamic flow algorithms, it is not surprising that these algorithms are often more difficult to design than their static counterparts [10], but they are also very challenging problems.

Starting from the complexities of the classic algorithms, an important series of other papers concerns the complexities of dynamic network algorithms. Cai et al. [4] proved that the complexity of finding a shortest (quickest) dynamic flow augmenting path, by exploring the forward and reverse arcs successively, is O(nmT^2). For algorithms which explore the two sub-networks (the forward sub-network, consisting of the set of direct arcs, and the reverse sub-network, consisting of the set of reverse arcs) simultaneously, Miller-Hooks and Patterson [8] also reported a complexity of O(n^2 T^2). By using special node addition and selection procedures, Nasrabadi and Hashemi [9] succeeded in significantly reducing the number of node-time pairs that need to be visited; the worst-case complexity of their algorithm is O(nT(n + T)). Finally, Orlin reported [10] an extremely good running time of O(nm) for the problem of maximum flow.

The approaches for solving the minimum parametric flow over time problem via classical algorithms can be grouped in two main categories: i) applying a non-parametric minimum dynamic flow algorithm [12] in dynamic residual networks generated by partitioning the interval of the parameter values [9], [14]; this first-category approach was also used [3] for finding a maximum parametric flow in discrete-time dynamic networks; ii) applying a static (classical sequential or parallel [5]) parametric flow algorithm for the maximum [13], [16] or for the minimum [11] flow in a (static) reduced expanded network, which is obtained by expanding the original dynamic network. In this paper, the case of the minimum flows in parametric dynamic networks is considered. The proposed approach consists in transforming the problem of the minimum flow in a parametric dynamic network into that of the minimum flow in a parametric static network.
This problem generalizes the problems of flow in dynamic networks and of flow in parametric static networks through the following assumptions: (1) the dynamic network and the corresponding expanded static network are networks with lower bounds; (2) we address the minimum flow problem on a dynamic network with time-varying transit times as well as time-varying lower and upper bounds on the arcs, i.e. the dynamic network is not stationary. The remainder of this paper is organized as follows. In Section 2 some dynamic network notations and terminology are presented. Then, in Section 3 we expose the minimum flow in parametric static networks, while in Section 4 the minimum flow in parametric dynamic networks problem is presented. In Section 5 an example is given. In the presentation to follow, some familiarity with flow problems is assumed and many details are omitted.

2 The minimum flows in dynamic networks
Let G = (N, A, l, u) be a static network with the node set N = {1, ..., i, ..., j, ..., n}, the arc set A = {a_1, ..., a_k, ..., a_m}, a_k = (i, j), the lower bound function l : A → R+ and the upper bound (capacity) function u : A → R+, where R is the set of real numbers. To define the minimal static flow problem, we distinguish two special nodes in the static network G = (N, A, l, u): a source node 1 and a sink node n. Let N be the set of natural numbers and let H = {0, 1, ..., T} be the set of periods, where T is a finite time horizon, T ∈ N. Let h : A × H → N be the transit time function, l_h : A × H → R+ the time lower bound function and u_h : A × H → R+ the time upper bound function. For each arc (i, j) ∈ A, h(i, j; t) represents the transit time of arc (i, j) at time t, t ∈ H. A dynamic flow from the source node 1 to the sink node n is any flow from 1 to n in which not less than l_h(i, j; t) and not more than u_h(i, j; t) flow units start from node i at time t and arrive at node j at time θ = t + h(i, j; t), for all arcs (i, j) and all t. The minimal dynamic flow problem for T time periods is to determine a flow function f_h : A × H → N which satisfies the following conditions in the dynamic network G_h = (N, A, h, l_h, u_h):

Σ_{t=0..T} ( Σ_j f_h(i, j; t) − Σ_k Σ_τ f_h(k, i; τ) ) = v_H,  for i = 1,   (2.1.a)
Σ_j f_h(i, j; t) − Σ_k Σ_τ f_h(k, i; τ) = 0,  for i ≠ 1, n and t ∈ H,   (2.1.b)
Σ_{t=0..T} ( Σ_j f_h(i, j; t) − Σ_k Σ_τ f_h(k, i; τ) ) = −v_H,  for i = n,   (2.1.c)
l_h(i, j; t) ≤ f_h(i, j; t) ≤ u_h(i, j; t),  ∀(i, j) ∈ A and ∀t ∈ H,   (2.2)
min v_H,   (2.3)

where τ = t − h(k, i; τ), v_H = Σ_{t=0..T} v(t), v(t) is the flow value at time t, and f_h(i, j; t) = 0 for (i, j) ∈ A and t ∈ {T − h(i, j; t) + 1, ..., T}. Obviously, the problem of finding a minimum dynamic flow is more complex than the problem of finding a minimum static flow. Fortunately, this complication can be handled by rephrasing the dynamic flow problem as a static flow problem in a static network G' = (N', A', l', u') called the reduced expanded network. First, we form the expanded network G_H = (N_H, A_H, l_H, u_H) with N_H = {i_t | i ∈ N, t ∈ H}, A_H = {(i_t, j_θ) | (i, j) ∈ A, t = 0, 1, ..., T − h(i, j; t)}, l_H(i_t, j_θ) = l_h(i, j; t), u_H(i_t, j_θ) = u_h(i, j; t), (i_t, j_θ) ∈ A_H. We have |N_H| = n(T + 1) and |A_H| ≤ m(T + 1) − Σ_A h̄(i, j), where h̄(i, j) = min{h(i, j; 0), ..., h(i, j; T)}. Clearly, any dynamic flow from the source node 1 to the sink node n in the dynamic network G_h is equivalent to a static flow from the source nodes 1_0, 1_1, ..., 1_T to the sink nodes n_0, n_1, ..., n_T in the static network G_H, and vice versa [2].
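To make the expansion above concrete, the following Python sketch (an illustration under the stated assumptions, not code from the paper) builds the node set N_H and arc set A_H from dictionaries h, lh and uh keyed by (i, j, t); all helper names are hypothetical.

def expand_dynamic_network(nodes, arcs, h, lh, uh, T):
    """Build the time-expanded network G_H = (N_H, A_H, l_H, u_H).

    nodes : iterable of node labels (1 is the source, n the sink)
    arcs  : iterable of (i, j) pairs
    h, lh, uh : dicts keyed by (i, j, t) giving transit times and bounds
    T     : finite time horizon, H = {0, 1, ..., T}
    """
    H = range(T + 1)
    # N_H = {i_t | i in N, t in H}; a time-expanded node is the pair (i, t)
    N_H = {(i, t) for i in nodes for t in H}
    A_H, l_H, u_H = [], {}, {}
    for (i, j) in arcs:
        for t in H:
            theta = t + h[(i, j, t)]      # arrival time of flow sent at time t
            if theta <= T:                # keep only arcs that stay inside the horizon
                arc = ((i, t), (j, theta))
                A_H.append(arc)
                l_H[arc] = lh[(i, j, t)]
                u_H[arc] = uh[(i, j, t)]
    return N_H, A_H, l_H, u_H

Any dynamic flow then corresponds to a static flow on (N_H, A_H), as stated above.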
We can further reduce the multiple source, multiple sink problem in the network G_H to a single source, single sink problem by introducing a supersource node 1* and a supersink node n*, building the superexpanded network G*_H = (N*_H, A*_H, l*_H, u*_H), where N*_H = N_H ∪ {1*, n*}, A*_H = A_H ∪ {(1*, 1_t) | t ∈ H} ∪ {(n_t, n*) | t ∈ H}, l*_H(i_t, j_θ) = l_H(i_t, j_θ), u*_H(i_t, j_θ) = u_H(i_t, j_θ) for (i_t, j_θ) ∈ A_H, and l*_H(1*, 1_t) = l*_H(n_t, n*) = 0, u*_H(1*, 1_t) = u*_H(n_t, n*) = ∞, t ∈ H.

Next, we build the reduced expanded network G' = (N', A', l', u') as follows. We define the function h* : A*_H → N, h*(1*, 1_t) = h*(n_t, n*) = 0, t ∈ H, h*(i_t, j_θ) = h(i, j; t), (i_t, j_θ) ∈ A_H. Let d*(1*, i_t) be the length of the shortest path from the source node 1* to the node i_t and d*(i_t, n*) the length of the shortest path from the node i_t to the sink node n*, with respect to h*, in the network G*_H. The computation of d*(1*, i_t) and d*(i_t, n*), i_t ∈ N_H, is performed by means of the usual shortest path algorithms [1]. In the network G' we rewrite the nodes 1*, n* as 1' and n', respectively. We obtain N' = {1', n'} ∪ {i_t | i_t ∈ N_H, d*(1*, i_t) + d*(i_t, n*) ≤ T}, A' = {(1', 1_t) | 1_t ∈ N_H, d*(1_t, n*) ≤ T} ∪ {(n_t, n') | n_t ∈ N_H, d*(1*, n_t) ≤ T} ∪ {(i_t, j_θ) | (i_t, j_θ) ∈ A_H, d*(1*, i_t) + h*(i_t, j_θ) + d*(j_θ, n*) ≤ T}, and l', u' are the restrictions of l*_H, u*_H to A'. It is easy to see that the network G' is always a partial sub-network of G*_H. Since an item released from a node at a specific time does not return to that location at the same or an earlier time, the networks G_H, G*_H, G' cannot contain any circuit and are therefore always acyclic. In the most general dynamic model, the waiting time at node i is h(i) = 1, and the parameters l_h(i, t), u_h(i, t) are defined as the lower and upper bounds which represent the minimum, respectively the maximum, amount of flow that can wait at node i from time t to t + 1. This most general dynamic model is not discussed in this paper.

The minimum dynamic flow problem for T time periods in the dynamic network G_h, formulated in conditions (2.1), (2.2), (2.3), is equivalent to the minimum static flow problem in the static network G', as follows:

Σ_{j_θ} f'(i_t, j_θ) − Σ_{k_τ} f'(k_τ, i_t) = v', if i_t = 1',   (2.4.a)
Σ_{j_θ} f'(i_t, j_θ) − Σ_{k_τ} f'(k_τ, i_t) = 0, if i_t ≠ 1', n',   (2.4.b)
Σ_{j_θ} f'(i_t, j_θ) − Σ_{k_τ} f'(k_τ, i_t) = −v', if i_t = n',   (2.4.c)
l'(i_t, j_θ) ≤ f'(i_t, j_θ) ≤ u'(i_t, j_θ), (i_t, j_θ) ∈ A',   (2.5)
min v',   (2.6)

where by convention i_t = 1' for t = −1 and i_t = n' for t = T + 1. For further details we recommend the works [2], [3], [4], [6], [7], [17].

3 The minimum flow in parametric static networks
A natural generalization of the minimum flow problem in static networks can be obtained by making the lower bounds of some arcs functions of a single parameter. Since the minimum flow value function in a parametric network is a continuous piecewise linear function of the parameter, the parametric minimum flow problem can alternately be defined as the problem of finding all the breakpoints and their corresponding minimum flows and maximum cuts. The approach presented in this section is the one presented in [11], the first approach for the minimum flow in parametric static networks.

A static network G = (N, A, l, u) in which the lower bounds l(i, j) of some arcs (i, j) ∈ A are functions of a real parameter λ is referred to as a parametric static network and is denoted by Ḡ = (N, A, l̄, u). The parametric lower bound function l̄ : A × R+ → R+ is defined by the relation:

l̄(i, j; λ) = l_0(i, j) + λ·L(i, j),  λ ∈ [0, Λ] = I,   (3.1)

where L : A → R is the parametric part of the lower bound function l̄ and l_0 : A → R+ is the nonparametric part of the function l̄, with l̄(i, j; 0) = l_0(i, j), (i, j) ∈ A. The functions L(i, j) and l_0(i, j) must satisfy −l_0(i, j)/Λ ≤ L(i, j) ≤ (u(i, j) − l_0(i, j))/Λ and 0 ≤ l_0(i, j) ≤ u(i, j), (i, j) ∈ A.
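As an illustration of relation (3.1), a parametric lower bound can be stored as the pair (l_0(i, j), L(i, j)) and evaluated for any λ ∈ [0, Λ]; the small sketch below (illustrative only, with hypothetical helper names) also checks the admissibility conditions on L and l_0 quoted above.

def make_parametric_lower_bound(l0, L, u, Lambda):
    """Return a callable lam -> l0 + lam * L after validating relation (3.1).

    l0, L  : non-parametric and parametric parts of the lower bound of one arc
    u      : (constant) upper bound of the arc
    Lambda : right end of the parameter interval I = [0, Lambda]
    """
    # admissibility of the slope: -l0/Lambda <= L <= (u - l0)/Lambda
    if not (-l0 / Lambda <= L <= (u - l0) / Lambda):
        raise ValueError("slope L violates the admissibility condition")
    # 0 <= l0 <= u must hold for the non-parametric part
    if not (0 <= l0 <= u):
        raise ValueError("l0 must lie between 0 and the upper bound u")
    return lambda lam: l0 + lam * L

# example: l0 = 1, L = 3, u = 5, Lambda = 1 gives l(0) = 1 and l(1) = 4 <= u
lower = make_parametric_lower_bound(1.0, 3.0, 5.0, 1.0)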
The minimum flow problem in the parametric static network Ḡ = (N, A, l̄, u) is to compute all minimum flows for every possible value of λ in I:

Σ_j f(i, j; λ) − Σ_k f(k, i; λ) = v(λ), if i = 1,   (3.2.a)
Σ_j f(i, j; λ) − Σ_k f(k, i; λ) = 0, if i ≠ 1, n,   (3.2.b)
Σ_j f(i, j; λ) − Σ_k f(k, i; λ) = −v(λ), if i = n,   (3.2.c)
l̄(i, j; λ) ≤ f(i, j; λ) ≤ u(i, j), (i, j) ∈ A,   (3.3)
min v(λ).   (3.4)

For the minimum flow problem in the parametric static network Ḡ = (N, A, l̄, u), the sub-intervals I_k = [λ_k, λ_{k+1}], k = 0, 1, ..., K, of the parameter values can be determined such that a maximum 1 − n cut in the nonparametric static network G_k = (N, A, l_k, u), l_k(i, j) = l̄(i, j; λ_k), remains a maximum 1 − n cut for λ ∈ I_k. A parametric 1 − n cut in the parametric static network Ḡ = (N, A, l̄, u) can be defined as a finite set of cuts [S_k, T_k], k = 0, 1, ..., K, together with a partitioning of the interval I into disjoint subintervals I_k, k = 0, 1, ..., K, such that I = I_0 ∪ I_1 ∪ ... ∪ I_K. The cut [S_k, T_k] is denoted by [S_k; I_k] for each k, k = 0, 1, ..., K. The capacity of [S_k; I_k] is defined as:

c[S_k; I_k] = Σ_{(S_k, T_k)} l̄(i, j; λ) − Σ_{(T_k, S_k)} u(i, j), k = 0, 1, ..., K.   (3.5)

A parametric maximum 1 − n cut in the network G_k is denoted by [S*_k; I_k], k = 0, 1, ..., K. For a flow f in the parametric network Ḡ = (N, A, l̄, u), the parametric residual capacity r̃(i, j; λ), (i, j) ∈ A, is given by:

r̃(i, j; λ) = u(j, i) − f(j, i; λ) + f(i, j; λ) − l̄(i, j; λ), λ ∈ I_k, k = 0, 1, ..., K.   (3.6)

For a flow f in the parametric static network Ḡ, we define the set F̃(i, j) = {λ | r̃(i, j; λ) > 0}, (i, j) ∈ A. The static network G̃ = (N, Ã), with Ã = {(i, j) | (i, j) ∈ A, F̃(i, j) ≠ ∅}, is named the parametric residual static network. If (i, j) ∈ A and (i, j) ∉ Ã, then F̃(i, j) = ∅. Let P be a directed path from the source node 1 to the sink node n in the parametric residual static network G̃. If P verifies the restriction:

F̃(P) = ∩_{(i, j) ∈ P} F̃(i, j) ≠ ∅,   (3.7)

then P is named a conditional decreasing directed path. The parametric residual capacity of a conditional decreasing directed path P is r̃(P; λ) = min{r̃(i, j; λ) | (i, j) ∈ P}, λ ∈ F̃(P). From paper [11] we have the theorem:

Theorem 1 ([11]). A flow f is a minimum flow in the parametric static network Ḡ if and only if the parametric residual static network G̃ contains no conditional decreasing directed path P.

If the residual static network G̃ contains no conditional decreasing path P, then the minimum flow in the parametric network Ḡ is computed as:

f(i, j; λ) = l̄(i, j; λ) + max{r̃(i, j; λ) − u(j, i) + l̄(j, i; λ), 0}.   (3.8)

The first phase of finding a minimum flow in the network Ḡ consists in establishing a feasible flow, if one exists, in the nonparametric network G = (N, A, l, u) with l(i, j) = l_0(i, j) for L(i, j) ≤ 0 and l(i, j) = l_0(i, j) + Λ·L(i, j) for L(i, j) > 0. After a nonparametric feasible flow f is found (see [1]), we compute the parametric residual network G̃_0 for this flow f. The parametric residual capacities in G̃_0 can be written as r̃_0(i, j; λ) = a_0(i, j) + λ·b_0(i, j), where a_0(i, j) = u(j, i) − f(j, i) + f(i, j) − l_0(i, j) and b_0(i, j) = −L(i, j), λ ∈ I_0 = [0, λ_1]. The second phase of the algorithm starts with the parametric residual network G̃_0, λ_0 = 0 and I_0 = [0, Λ].
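Because the feasible flow found in the first phase does not depend on λ, each residual capacity of G̃_0 is an affine function of λ; the bookkeeping sketch below (illustrative only, arc data assumed to be dicts over ordered pairs, the sign of b_0 following from substituting (3.1) into (3.6)) shows how these coefficient pairs can be stored and evaluated.

def parametric_residual_capacities(arcs, f, l0, L, u):
    """Residual capacities of G~_0 as coefficient pairs (a0, b0).

    arcs  : list of (i, j) pairs present in A
    f     : nonparametric feasible flow, dict keyed by (i, j)
    l0, L : parts of the parametric lower bound, dicts keyed by (i, j)
    u     : upper bounds, dict keyed by (i, j)
    Residual capacity of (i, j): u(j,i) - f(j,i) + f(i,j) - l(i,j; lam).
    Missing reverse arcs are treated as having zero capacity and flow.
    """
    residual = {}
    for (i, j) in arcs:
        a0 = (u.get((j, i), 0.0) - f.get((j, i), 0.0)
              + f[(i, j)] - l0[(i, j)])
        b0 = -L[(i, j)]   # the lower bound grows with lam, so the residual shrinks
        residual[(i, j)] = (a0, b0)
    return residual

def residual_at(residual, arc, lam):
    """Evaluate r~_0(arc; lam) = a0 + lam * b0."""
    a0, b0 = residual[arc]
    return a0 + lam * b0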
The algorithm for the minimum flow in a parametric static network (the algorithm MFPSN) is presented in Figure 3.1.

Algorithm MFPSN;
BEGIN
  compute a feasible flow f_0 in the network G_0;
  compute the parametric residual network G̃_0;
  B := {0}; k := 0; λ_k := 0;
  REPEAT
    SDDP(k, λ_k, B);
    k := k + 1;
  UNTIL (λ_k = Λ);
END.
Figure 3.1.a. The algorithm for the minimum flow in a parametric static network.

PROCEDURE SDDP(k, λ_k, B);
BEGIN
  compute the network G̃_k;
  compute exact distance labels d(i) in G̃_k;
  P := (n+1, ..., n+1); α_k(P) := 0; β_k(P) := 0;
  λ_{k+1} := Λ; j := n;
  WHILE d(n) ...
Figure 3.1.b. The procedure SDDP.

4 The minimum flow in parametric dynamic networks
In this paper, the proposed approach consists in applying the algorithm MFPSN presented in Section 3 in the parametric static reduced expanded network Ḡ' = (N', A', l̄', u'), which is constructed similarly to the construction of the network G' = (N', A', l', u') presented in Section 2. The algorithm for the minimum flow in a parametric dynamic network (the algorithm MFPDN) is presented in Figure 4.1.

ALGORITHM MFPDN;
BEGIN
  construct the network Ḡ';
  apply the algorithm MFPSN in the network Ḡ';
END.
Figure 4.1. The algorithm for the minimum flow in a parametric dynamic network.

Theorem 4 (Theorem of Correctness). The algorithm MFPDN correctly computes a minimum flow in the parametric dynamic network Ḡ_h = (N, A, h, l̄_h, u_h) for every λ ∈ I.
Proof. The theorem results from the fact that the minimum flow in the parametric dynamic network with lower bounds Ḡ_h = (N, A, h, l̄_h, u_h) is equivalent to the minimum flow in the parametric static network Ḡ' = (N', A', l̄', u'), and from Theorem 2. □

Theorem 5 (Theorem of Complexity). The algorithm MFPDN runs in O(KT^3 n^2 m) time, where K + 1 is the number of λ values in the set B at the end of the algorithm.
Proof. From Theorem 3 it results that the algorithm MFPSN runs in O(K (n')^2 m') time. From Section 2 we obtain n' = O(nT) and m' = O(mT). Therefore the algorithm MFPDN runs in O(KT^3 n^2 m) time. □
In accordance with the remark in Theorem 3, we note that the algorithm MFPDN runs in O(KnmT^2) time.

5 Example
The support parametric dynamic network is presented in Figure 5.1.(a) and the time horizon is set to T = 3, therefore H = {0, 1, 2, 3}. The transit times h(i, j; t) and the dynamic upper bounds (capacities) u_h(i, j; t), as well as the parametric dynamic lower bounds l̄_h(i, j; t; λ) = l_0h(i, j; t) + λ·L_h(i, j; t), for all arcs in Ḡ_h are indicated in the two tables in Figure 5.1.b. The interval of the parameter λ values is set to [0, 1], i.e., Λ = 1.

(i, j)   h(i, j; t)                          u_h(i, j; t)
(1, 2)   1 for t = 0; 2 for t = 1, 2, 3      5
(1, 3)   1 for t = 0, 1; 2 for t = 2, 3      5
(2, 3)   1 for t = 0, 1, 2, 3                5
(2, 4)   1 for t = 0, 1; 2 for t = 2, 3      5
(3, 4)   2 for t = 0, 1; 1 for t = 2, 3      5

(i, j)   l_0h(i, j; t)                              L_h(i, j; t)
(1, 2)   3 for t = 0; 0 for t = 1, 2, 3             -2 for t = 0; 0 for t = 1, 2, 3
(1, 3)   1 for t = 0, 1; 0 for t = 2, 3             4 for t = 0; 1 for t = 1; 0 for t = 2, 3
(2, 3)   0 for t = 0, 1, 2, 3                       3 for t = 1; 0 for t = 0, 2, 3
(2, 4)   0 for t = 0, 1, 2, 3                       0 for t = 0, 1, 2, 3
(3, 4)   0 for t = 0; 2 for t = 1, 2; 0 for t = 3   -2 for t = 2; 0 for t = 0, 1, 3
(b)
Figure 5.1. The parametric dynamic network Ḡ_h.

The support graph for the parametric superexpanded network Ḡ*_H is presented in Figure 5.2.
Figure 5.2. The support graph for the parametric superexpanded network Ḡ*_H.
The support graph for the parametric reduced expanded network Ḡ' is shown in Figure 5.3.
Figure 5.3. The support graph for the parametric reduced expanded network Ḡ'.
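For readers who want to reproduce the example, the dynamic data of Figure 5.1.b can be transcribed into the dictionary encoding used in the sketch given after the expansion formulas of Section 2; the snippet below is such a transcription (an illustration only, not code from the paper), with H = {0, 1, 2, 3}.

T = 3
arcs = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]

# transit times h(i, j; t) from Figure 5.1.b
h = {}
for t in range(T + 1):
    h[(1, 2, t)] = 1 if t == 0 else 2
    h[(1, 3, t)] = 1 if t <= 1 else 2
    h[(2, 3, t)] = 1
    h[(2, 4, t)] = 1 if t <= 1 else 2
    h[(3, 4, t)] = 2 if t <= 1 else 1

# upper bounds are 5 on every arc and at every time
uh = {(i, j, t): 5 for (i, j) in arcs for t in range(T + 1)}

# non-parametric and parametric parts of the lower bounds
l0h = {(i, j, t): 0 for (i, j) in arcs for t in range(T + 1)}
l0h.update({(1, 2, 0): 3, (1, 3, 0): 1, (1, 3, 1): 1,
            (3, 4, 1): 2, (3, 4, 2): 2})
Lh = {(i, j, t): 0 for (i, j) in arcs for t in range(T + 1)}
Lh.update({(1, 2, 0): -2, (1, 3, 0): 4, (1, 3, 1): 1,
           (2, 3, 1): 3, (3, 4, 2): -2})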
The lower bounds l̄'(i_t, j_θ; λ) = l'(i_t, j_θ) + λ·L'(i_t, j_θ) and the parametric upper bounds u'(i_t, j_θ) for all arcs in Ḡ' are indicated in the table in Figure 5.4.

(i_t, j_θ)   l'(i_t, j_θ)   L'(i_t, j_θ)   u'(i_t, j_θ)   f'(i_t, j_θ)
(1', 1_0)    0              0              ∞              10
(1', 1_1)    0              0              ∞              2
(1_0, 2_1)   3              -2             5              5
(1_0, 3_1)   1              3              5              5
(1_1, 3_2)   1              1              5              2
(2_1, 3_2)   0              3              5              3
(2_1, 4_2)   0              0              5              2
(3_1, 4_3)   2              0              5              5
(3_2, 4_3)   2              -2             5              5
(4_2, 4')    0              0              ∞              2
(4_3, 4')    0              0              ∞              10
Figure 5.4. The values l̄'(i_t, j_θ; λ) = l'(i_t, j_θ) + λ·L'(i_t, j_θ), u'(i_t, j_θ) and f'(i_t, j_θ) in the network Ḡ'.

In the first phase we determine, in the static residual network of Ḡ', the directed paths P'_1 = (1', 1_0, 2_1, 4_2, 4') with r'(P'_1) = 2; P'_2 = (1', 1_0, 3_1, 4_3, 4') with r'(P'_2) = 5; P'_3 = (1', 1_1, 3_2, 4_3, 4') with r'(P'_3) = 2; and P'_4 = (1', 1_0, 2_1, 3_2, 4_3, 4') with r'(P'_4) = 3. The resulting feasible flow f' in the network Ḡ' is presented in the table in Figure 5.4. In the second phase, by applying the algorithm MFPSN in the network Ḡ', we obtain in the parametric static residual network the following directed paths: P'_1 = (1', 1_1, 3_2, 2_1, 4_2, 4'), P'_2 = (1', 1_0, 2_1, 4_2, 4'), P'_3 = (1', 1_0, 3_1, 4_3, 4'), P'_4 = (1', 1_0, 2_1, 3_2, 4_3, 4') in G̃'_0, G̃'_1, G̃'_2. The results of this example are synthetically indicated in the table in Figure 5.5; for each k it lists the value λ_k, the conditional decreasing paths P' used, their parametric residual capacities, the next breakpoint λ_{k+1} and the minimum flow value v'(λ), with breakpoints at λ = 1/4 and λ = 3/5. The graphic of v'(λ) is presented in Figure 5.6.
Figure 5.5. Results of applying the algorithm in Ḡ'.
Figure 5.6. The graphic of v'(λ).

References
[1] Ahuja R., Magnanti T., Orlin J. (1993) Network Flows. Theory, Algorithms and Applications, Prentice Hall, Inc., Englewood Cliffs, New Jersey.
[2] Aronson J.A. (1989) A survey of dynamic network flows, Annals of Operations Research, 20 (5), pp. 1-66. https://doi.org/10.1007/bf02216922
[3] Avesalon N., Ciurea E., Parpalea M. (2017) Maximum parametric flow in discrete-time dynamic networks, Fundamenta Informaticae, 156 (2), pp. 125-139. https://doi.org/10.1007/bf02216922
[4] Cai X., Sha D., Wong C. (2007) Time-varying Network Optimization, Springer.
[5] Ciurea E., Ciupala L. (2004) Sequential and parallel algorithms for minimum flow, Journal of Applied Mathematics and Computing, 15 (1-2), pp. 53-75. https://doi.org/10.1007/bf02935746
[6] Ciurea E. (2002) Second best temporally repeated flows, The Korean Journal of Computational and Applied Mathematics, 9 (1), pp. 77-86. https://doi.org/10.1007/bf03012341
[7] He X., Zheng H., Peeta S. (2015) Model and solution algorithm for the dynamic resource allocation problem for large-scale transportation network evacuation, Transportation Research Part C: Emerging Technologies, 59, pp. 233-247. https://doi.org/10.1016/j.trc.2015.05.005
[8] Miller-Hooks E., Patterson S.S. (2004) On solving quickest time problems in time-dependent dynamic networks, Journal of Mathematical Modelling and Algorithms, 3, pp. 39-71. https://doi.org/10.1023/b:jmma.0000026708.57419.6d
[9] Nasrabadi E., Hashemi S.M. (2010) Minimum cost time-varying network flow problems, Optimization Methods and Software, 25 (3), pp. 429-447. https://doi.org/10.1080/10556780903239121
[10] Orlin J. (2013) Max Flows in O(nm) time, or better, Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, ACM Press, New York, pp. 765-774. https://doi.org/10.1145/2488608.2488705
[11] Parpalea M., Ciurea E. (2016) Minimum parametric flow. A partitioning approach, British Journal of Applied Science and Technology, 13 (6), pp. 1-8. https://doi.org/10.9734/bjast/2016/22636
[12] Parpalea M., Ciurea E. (2011) The quickest maximum dynamic flow of minimum cost, International Journal of Applied Mathematics and Informatics, 3 (5), pp. 266-274.
[13] Parpalea M., Ciurea E. (2013) Partitioning algorithm for the parametric maximum flow, Applied Mathematics, 4 (10A), pp. 3-10. https://doi.org/10.4236/am.2013.410a1002
[14] Parpalea M., Avesalon N., Ciurea E. (2018) Minimum parametric flow in time dependent dynamic networks, Revue d'Automatique, d'Informatique et de Recherche Opérationnelle - Theoretical Informatics and Applications (RAIRO: ITA), 52 (1), pp. 43-53. https://doi.org/10.1051/ita/2018002
[15] Rashidi H., Tsang E. (2015) Vehicle Scheduling in Port Automation: Advanced Algorithms for Minimum Cost Flow Problems, CRC Press, 2nd edition, Boca Raton, London, New York. https://doi.org/10.1201/b18984
[16] Ruhe G. (1991) Algorithmic Aspects of Flows in Networks, Kluwer Academic Publishers, Dordrecht, The Netherlands. https://doi.org/10.1007/978-94-011-3444-6
[17] Zheng H., Chiu Y.C., Mirchandani P.B. (2015) On the System Optimum Dynamic Traffic Assignment and Earliest Arrival Flow Problems, Transportation Science, 49, pp. 13-27. https://doi.org/10.1287/trsc.2013.0485

https://doi.org/10.31449/inf.v44i3.1907 Informatica 44 (2020) 311-325

Investigating Algorithmic Stock Market Trading Using Ensemble Machine Learning Methods
Ramzi Saifan, Khaled Sharif, Mohammad Abu-Ghazaleh, and Mohammad Abdel-Majeed
Computer Engineering Department, School of Engineering, University of Jordan, Queen Rania Street, Amman, Jordan
E-mail: r.saifan@ju.edu.jo, kldsrf@gmail.com, mohd.ag@live.com, m.abdel-majeed@ju.edu.jo
Keywords: machine learning, stock price prediction, ensemble methods, gradient boosting, extremely randomized trees, random forest, stock market simulation, algorithmic trading, financial forecast, forecasting returns, risk analysis, volatility forecasting
Received: July 21, 2019

Recent advances in the machine learning field have given rise to efficient ensemble methods that accurately forecast time series. In this paper, we use the Quantopian algorithmic stock market trading simulator to assess the performance of ensemble methods in daily prediction and trading. The ensemble methods used are Extremely Randomized Trees, Random Forest, and Gradient Boosting. All methods are trained using multiple technical indicators and automatic stock selection is used. Simulation results show significant returns relative to the benchmark, and large values of alpha are produced by all methods. These results strengthen the role of ensemble-method-based machine learning in automated stock market trading.

Povzetek: Razvit je nov algoritem za napovedovanje delnic s pomocjo ansambla programov za strojno ucenje.

1 Introduction
Predicting the stock market has been the ultimate goal of stock investors since its existence. Every day billions of dollars are traded in stock markets around the world, and behind each dollar is an investor hoping to profit by correctly forecasting the rise or fall of the associated stock price. If an investor somehow predicts that a stock price will rise, he will buy a certain amount of that stock, wait for a specified period of time, and then sell those stocks at their increased price; this method of trading is referred to as longing.
It is also possible for the investor to profit from the decrease of a stock through a different process called shorting; this is when the investor predicts that a stock will fall, borrows a certain amount of that stock and sells it, buys the same amount of stock after the price has decreased, and then returns the stocks he has borrowed to the lender. Longing and shorting stocks, combined with an accurate way of forecasting stock market prices, make it possible for an investor to profit from any change in the stock market. This creates a dire need for strong prediction methods. There are various ways of predicting stock prices; they basically fall into two categories, either Fundamental Analysis (FA) or Technical Analysis (TA). Many experts use a combination of the two for finer predictions. For decades, investors have been using a human-based prediction method called fundamental analysis (FA); this technique involves acquiring all the relevant information that a person can collect about a certain stock in order to determine its "true value". It goes into the economics of the company itself, such as sales and profit data. External factors are also taken into consideration, such as politics, regulations, and industry trends [1]. Methods that aid an investor in FA include financial statements, asset ratios, liquidity ratios, debt ratios, market value ratios, and portfolio management [2]. Based on the determined true value, the investor will decide what sort of position to take with the stock; if it is overpriced the investor will short the stock, or long the stock if it is underpriced, under the belief that the price will eventually fall or rise, respectively, to meet its true value. One of the limitations of FA is that it has been practiced for decades without any unifying theoretical framework [3]. Since it lacks a solid mathematical foundation, there is an emotional factor that may cause the investor to make the wrong decisions. The second method used is called technical analysis (TA). It is a method that does not take into account anything about the company, because the investor is interested only in short-term movements in the stock price. It concentrates on the movement of stock prices; by examining past stock price movements, future stock prices can be accurately predicted. Investors that use TA believe that all the information you need to know about a stock, and the stock price's future movement, is embedded in its historical data. Based on visual examination of the historical data, such as price changes and the volume of transactions, usually in graphical form and charts, trading advice can be provided [4]. The volume of a stock is the total number of shares that are traded in a security during a certain period of time, and a security with higher volume is more active. TA can use the fluctuations in a stock's volume and price over a certain period of time to try to determine the future movement of the price. With new advances in technology, and the emergence of high-speed computing, computer programs can
Investors wanted to make sure to use all the tools that can be offered from the increasing technological advancements, which will place them in a better position to address the changing market environment [6]. Figure 1 shows the trend of using AT through years 2003 to 2012. These estimates include even investors that do not directly deal with the AT program, but deal with a stock broker who eventually will use an AT program to place the order on the stocks required. The concept of automated prediction is known in the world of computer science as machine learning (ML), and is a term that relates to the construction of algorithms that can learn from and make predictions on data. The emergence of strong machine learning methods that can accurately identify stock market patterns and predict the future movement of a stock has led to a surge in research in AT based on ML methods. The increasing usage of AT makes perfect sense, and this can be accredited to multiple reasons: firstly, the use of AT completely eliminates any emotional and psychological factors that might affect any trade undertaken by an investor. Secondly, placing orders through an AT system occurs instantly with precision and accuracy. Thirdly, AT allows the investor to monitor huge amounts of stock market and financial data in real-time, without the risk of manual human errors. Lastly, because of the programmatic nature of AT systems, simulating an algorithm on large amounts of historical data provides 1 When compared to manual investing approaches, algorithmic trading is more likely to produce a similarly performing result given the same data, and this is because relatively accurate 1 indication of portfolio performance. A paper published by Hendershott et. al. (2007) studied the effect of AT on the New York Stock Exchange (NYSE). In it they concluded that AT most likely causes an improvement in market liquidity [7]. Another study on the foreign exchange market concluded through evidence that due to AT programs being highly correlated to each other, the use of AT had reduced volatility in the stock market [8]. The use of ML in AT has been met with resistance by economists due to three main reasons: firstly, the complexity of ML methods from the perspective of fields other than computer science; secondly, the random nature of a machine learning method and the inconsistency in its prediction results; thirdly, the insufficient amount of published academic work (in the area of stock market prediction) that include AT simulations showing the predictions being undertaken in live trading. However, the ML methods currently being investigated rarely perform well enough (i.e.: make enough accurate predictions to be considered profitable) for them to be used in real trading situations. Existing methods also suffer from low returns over long trading periods, making them less attractive to traders when compared to existing algorithms reliant on human predictions. The problem with currently published research that attempts to investigate AT that uses ML for prediction is either the results are undesirable, or that no simulation is included in the results, or both. This lack of research is, in the opinion of the authors, the main reason stopping the widespread use of machine learning prediction in stock market trading today. This paper will thoroughly investigate using efficient ML techniques to accurately it depends on a series of steps rather than an investor's intuition. Investigating Algorithmic Stock Market Trading. 
predict the future movement of a stock, taking into account the three reasons for resistance mentioned above. It is therefore the chief goal of this paper to encourage the economic world to undertake AT using new ML methods, by providing them with solid, consistent, and repeatable simulations. In this paper, we will focus on three new ML methods, namely Gradient Boosting, Random Forest, and Extremely Randomized Trees; we have chosen these methods because they have all been published recently in the ML world. Moreover, these methods have been tested before on time series prediction and have shown accurate prediction results even with noisy data (i.e., data that fluctuates randomly) and very large datasets (i.e., datasets that are too large for weaker ML methods to work on in sufficient time). To simulate these ML methods in AT, we will use Quantopian, a browser-based AT platform that can be used to write trading strategies in Python [9] and back-test them against 13 years of minute-level US stock price and fundamental data. In each simulation, the returns of the algorithm are compared with a suitable benchmark, and performance is evaluated according to eight evaluation methods. Our simulation results will prove to the readers that using our suggested ML methods in AT will consistently provide better revenue than the benchmark. In the Literature Review section, we will review the state-of-the-art literature and academic research that revolves around AT and the application of ML methods to AT. In the Trading Strategy section, we will discuss how the ML model is created and trained, how stocks are automatically selected during the AT process, and briefly go over some simulator settings. In the Methodology section, we will go over the performance indicators that will be used to judge how well the ML methods perform relative to the performance of known financial benchmarks. Finally, in the Results section we will compare and comment on the simulation results of the ML methods when using Quantopian.

2 Literature review
While academic journals are filled with projects discussing stock trading techniques [10] [11] [12], the world of algorithmic trading is relatively new, and therefore the application of machine learning to algorithmic trading is the new trend of academic research [13] [5]. The three machine learning techniques, interchangeably referred to as classifiers, that we will be using are the Gradient Boosting [14], Random Forests [15], and Extremely Randomized Trees [16] algorithms. The Gradient Boosting algorithm produces a prediction model that is in the form of an ensemble of weak decision tree prediction models, also known as estimators [17]. The Random Forest and Gradient Boosting algorithms are closely related, because both algorithms address regression and classification problems by constructing a multitude of decision trees [18]. The Random Forest algorithm is easier to tune than the Gradient Boosting algorithm, although the Gradient Boosting algorithm will, in general, outperform Random Forests with proper tuning. This is because the Gradient Boosting algorithm attempts to add new trees that complement the already built trees, and usually this produces better accuracy with fewer trees.
The Extremely Randomized Trees algorithm goes one step further than the Random Forests algorithm in the way it chooses to split each node during the construction of the decision tree and in how the parameters for the node are computed [19]. Algorithmic trading (AT) is the use of the computational power at our disposal in the stock market: computers programmed with a specific set of instructions, large amounts of data, and mathematical models decide how to trade at a speed and frequency that humans are not capable of achieving, in order to generate more profit while ruling out human errors and emotions. Multiple studies show the effect of algorithmic trading on the stock market. A study of the stock market from 2001 to 2011 examined how AT affects it and showed that AT improved liquidity and efficiency, but also increased volatility [20]. However, a paper showed that results were not uniform across different stocks and there were different outcomes under different conditions [21]. Machine learning (ML) has been a hot topic among researchers for its use in many fields. We are concerned with its use in the stock market to assist investors in trading, by trying to predict the behavior of the stock market through computations over large amounts of historical stock market data. A considerable amount of effort has also been put into using Neural Networks as a prediction technique. One of the first papers that attempted to apply them to the stock market used them to predict the index of the Tokyo Stock Market [22]. A more recent paper about Neural Networks used two kinds of neural networks, namely a feed-forward Multilayer Perceptron (MLP) and an Elman recurrent network [23]. The paper concluded that the MLP has more potential in predicting stock value changes than the Elman recurrent network and linear regression, although a simple linear regression model was better than the other two when it comes to predicting the direction of stock price changes one day ahead [24]. The authors of [25] proposed a trading agent based on deep reinforcement learning, to autonomously make trading decisions and gain profits in dynamic financial markets. The paper [26] developed a machine learning framework for algorithmic trading with virtual bids in electricity markets; a budget and risk constrained portfolio optimization problem was also solved. Another paper proposed a model that combined the Support Vector Machine algorithm with other classification methods, in such a way that the weakness of one method is balanced out by the strength of another (i.e., an early attempt at ensemble methods in stock market prediction) [27]. Papers that used technical indicators for their machine learning methods typically computed the Exponential Moving Average (EMA) and compared it to the stock markets, specifically using the Google and Yahoo stocks (NYSE: GOOG and NASDAQ: YHOO); in one particular paper, the authors suggested using other indicators as they believe these might provide more accurate results instead of just using the EMA [28]. Results from papers in the field have been both positive and negative towards the idea of using ML in AT.
An example of a paper that was negative towards the idea used ML to facilitate automated stock portfolio optimization; the authors used the Dow Jones Industrial Average Index as a benchmark and concluded that none of the techniques they used outperform the index, mainly because the index resulted in more returns at a lower risk than their proposed method [29]. An example of another paper that was positive towards the idea used a method consisting of linear regression and a generalized linear model, with the aid of the Support Vector Machine algorithm, to predict future stock market prices; results were desirable and generated a higher profit than the selected benchmark [30]. Another positive paper proposed a stock price prediction system also based on the Support Vector Machine algorithm, tested on the Taiwan stock market; the method performed better than conventional stock market prediction systems (in terms of accuracy) [31]. More advanced papers have used hybrid combinatorial methods of clustering and classification. One of these papers first applies a clustering algorithm such as K-Nearest Neighbors and partitions the clustered values into a number of parties, and then applies a horizontal-partition-based decision tree algorithm; the paper used the algorithm on data from the Shanghai Stock Exchange and the predicted results were very close to the actual values [32]. In this project, we will compare our efficient ensemble methods with the K-Nearest Neighbors and the Support Vector Machine algorithms as a way of comparing our methods to those used in previous literature. Our simulation results will show that our efficient ensemble methods outperform those used in previous literature in predictive accuracy.

The use of Quantopian in academic research is rare; one of the few papers to use it begins with an explanation of the Efficient Market Hypothesis (in financial economics, the efficient-market hypothesis states that current stock prices fully reflect all available information; it is therefore, according to the hypothesis, impossible to find a pattern in stock price movement) and Self-Defeating Strategies (a self-defeating strategy is a term used for a strategy that will eventually stop working, or reduce in effectiveness, after it is applied to the stock market), and uses these two ideas to reason why there are not enough academic papers showing positive results predicting the market using machine learning; in the author's opinion, if a model succeeds and is distributed to the public, it will not be successful for too long. The author also used different methods of trading using Quantopian and machine learning, but showed that results were undesirable [33]. As we can see from the aforementioned literature, there have been many different techniques tried and tested in an attempt to predict the stock market and automate stock market trading. All methods used different algorithms, factors, and parameters that could be tuned to deliver better results. In this paper, we will use what we consider to be the latest machine learning methods to try and produce positive results in prediction and simulation.

3 Trading strategy
3.1 Model creation
In this section, we will explain the trading strategy that we will simulate. It is coded entirely in the Python language and it runs on the Quantopian simulator. As mentioned earlier, we will use three machine learning methods for our daily predictions.
The classifiers used are the Gradient Boosting, Extremely Randomized Trees, and Random Forest classifiers, and they are all part of the open source scikit-learn library [34]. Creating the model is the first step in the algorithm, and model creation is scheduled to happen at the beginning of every month throughout the simulation period. The model is created by training the classifier on data from the previous 1000 days (which we define as the history range) relative to the model creation date (in machine learning, creating a model by training a classifier means that we feed the classifier with historical data to 'train' on; the created model then decides which class to allocate newly observed data to, based on the previous data). Based on this data we generate features, namely the Average True Range (ATR) and the Bollinger Bands (BB).

The ATR is a measure of the volatility (volatility is defined later in Section 4.2) of the stocks: it is calculated over a 14-day period by finding the moving average of the "true range". Simply put, if stocks are experiencing high volatility they will have a higher ATR, and they will have a lower ATR at lower volatility; the difference between the maximum and minimum moving average is deemed the true range. The BB is another popular method to measure volatility: the prices of the stock, along with a ten-day moving average, are banded by an upper band and a lower band, and the bands keep changing according to market conditions. A wider band around the moving average means that the stock price is becoming more volatile, whereas tighter bands mean that the volatility is decreasing. If the stock price moves closer to the upper band, the stock is being overbought, and the stock is being oversold if the price moves closer to the lower band.

The following list outlines the organization of the features and the predicted target, before being used to train the classifiers. There is a total of 89 features (the selection of 89 features is arbitrary; the number comes from six technical indicators, each of which has a 14-day period, and the use of a 14-day period is also arbitrary), and the 90th column contains the target to be predicted by the classifier. The value of the prediction target is a function that is detailed in Equation 1. The feature organization in the dataset used to train the classifiers is as follows:
• Price Changes
• ATR Upside Signal
• ATR Downside Signal
• Upper Bollinger Band
• Middle Bollinger Band
• Lower Bollinger Band

The following equation is used to determine the target for prediction:
PCT(p) = +1, if p is greater than a certain percentage of the price change the day before and the change is positive;
PCT(p) = 0, if p is within a certain percentage of the price change the day before;   (Eq. 1)
PCT(p) = -1, if p is greater than a certain percentage of the price change the day before and the change is negative;
where p is the price change for tomorrow.

Table 1: This table outlines the workflow for each of the two trading strategies (output of one classifier / outputs of two classifiers, followed by the action taken by the AT program).
- One classifier predicts an increasing price with strong probability / both classifiers agree on an increasing price prediction: begin longing the stock; if we are already shorting the stock (betting that it will decrease), stop trading the stock.
- One classifier predicts a decreasing price with strong probability / both classifiers agree on a decreasing price prediction: begin shorting the stock; if we are already longing the stock (betting that it will increase), stop trading the stock.
- One classifier either predicts no change with strong probability or predicts any outcome with weak probability / the classifiers either agree on no change in stock price or disagree on a prediction: make no changes to our ongoing action with the stock.
3.2 Automatic stock selection
The algorithm is also able to choose certain stocks automatically every month, and therefore fully automates the trading process and keeps our simulations free of survivorship bias. The selection is based on fundamental data (the fundamental data of a stock is, in the broadest terms, any data besides the trading patterns of the stock itself which can be expected to impact the price or perceived value of the stock), and it does so by filtering according to a stock's Price-to-Earnings Ratio (PER) and Market Capitalization (MC). The PER of a stock is measured by dividing the current share price by its earnings per share, and this is used as an indication of the value of the company. MC is calculated by multiplying the current market price of one share by the company's total number of shares, and this shows the total market value of the shares in a company.

3.3 Longing and shorting stocks
The final stage of the trading strategy of our AT program is the longing and shorting of the selected stocks, and the program is scheduled to long and short stocks daily (i.e., every trading day in the NYSE during the selected time period). Our AT programs will use two different techniques to base our trading on: the first technique uses one classifier and the other uses two classifiers working simultaneously. If one classifier is used, the algorithm will long or short based on how sure the predictor is of its prediction, and we specify that it should be more than a certain value (defined as the minimum probability) for the AT program to take the appropriate action of longing or shorting. When two classifiers are used, the AT program takes action when both predictions are the same. The actions taken by either of the classification methods are outlined in detail in Table 1.

3.4 Slippage and commission
For all simulations in this project, we are using the default slippage and commission models of the Quantopian simulator. Slippage calculates and simulates the impact of our order on the market, and it is measured by assessing how large our order is in comparison with the current trading volume; this is used to check whether an order is too big (given that a trader cannot trade more than the market's volume at any given time). Therefore, our algorithm will be limited to ordering up to 2.5% of the total available stocks, a percentage defined by the simulator to make the simulation results more realistic. The commission is set to $0.03 per share, as is the default on the simulator.
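To summarise Sections 3.1-3.3, a minimal sketch of the per-stock daily decision rule is given below. The helper names, the simplified threshold handling in the target function, and the default minimum probability are assumptions for illustration; the predict-probability call stands in for the scikit-learn classifiers described above and is not the authors' exact code.

def prediction_target(p, pct_threshold):
    """Eq. 1 (simplified): +1, 0 or -1 depending on tomorrow's price change p."""
    if abs(p) <= pct_threshold:
        return 0
    return 1 if p > 0 else -1

def daily_action(classifier, features, current_position, min_probability=0.6):
    """One-classifier strategy from Table 1 (min_probability is an assumed value)."""
    probabilities = classifier.predict_proba([features])[0]   # aligned with classifier.classes_
    best_class = classifier.classes_[probabilities.argmax()]
    if probabilities.max() < min_probability or best_class == 0:
        return current_position            # weak or neutral prediction: no change
    if best_class == 1:                    # predicted increase
        return "long" if current_position != "short" else "flat"
    return "short" if current_position != "long" else "flat"  # predicted decrease

The two-classifier variant of Table 1 would call this decision only when both classifiers return the same best class.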
4 Methodology
4.1 Testing the chosen machine learning methods in predictive accuracy
Before beginning to trade with the model predictions, it is better to first test the accuracy of the algorithms in predicting future stock price movement. This gives us a better understanding of each algorithm's performance in prediction only, and lets us tune the algorithm's parameters to get better accuracy. Quantopian provides a research environment to experiment and try out trading strategies without running them through a simulator. We will assess the accuracy of each algorithm by using a confusion matrix (a table that counts the predictions that were classified and misclassified) and repeat each simulation multiple times to gain confidence in the results. Each algorithm has its own set of adjustable parameters, which we will try to fine-tune to attain the best accuracy from each algorithm.

4.2 Performance indicators
After finding the best parameters for an accurate prediction of stock price movement, we can move our algorithms from the research environment to the simulation. The algorithms will be part of the larger workflow, which was discussed in detail in the previous section. We define a certain time period (greater than two years) for the simulation to run through day by day, and a fixed starting capital of 1 million US dollars. At the end of each simulation, the algorithm's performance is assessed automatically through eight performance indicators that are usually used to assess and compare different trading strategies; they are outlined in Table 2. We will provide the mathematical equations that were used in determining six of the eight performance indicators below. The remaining two indicators (i.e., cumulative returns and maximum draw-down) are considered to be straightforward and will not be explained due to lack of space. We have chosen to consider the market to be reasonably approximated by the Standard and Poor 500 index (NYSE: SPY), and the risk-free rate to be reasonably approximated by the US Treasury Index (NYSE: BIL).

Table 2: Description of the eight performance indicators the simulator produces.
Algorithm Returns: Cumulative returns (as a percentage) of the algorithm relative to the starting capital at the beginning of the simulation.
Alpha: The return on an investment that is not a result of general movement in the greater market.
Beta: The tendency of the algorithm's price movement to respond to swings in the market. A beta value of 0 means the algorithm is uncorrelated to the market, and in some sense is risk-free.
Sharpe Ratio: A measure for calculating risk-adjusted return; it is defined as the average return earned in excess of the risk-free rate per unit of volatility or total risk.
Sortino Ratio: A modification of the Sharpe ratio that differentiates harmful volatility from general volatility by taking into account the standard deviation of negative asset returns (downside deviation). A large Sortino ratio indicates that there is a low probability of a large loss.
Information Ratio: A ratio of portfolio returns above the returns of a benchmark (usually an index) to the volatility of those returns. The information ratio (IR) measures a portfolio manager's ability to generate excess returns relative to a benchmark, but also attempts to identify the consistency of the investor.
Volatility: An identification of price ranges and breakouts; the ratio uses a true price range to determine an algorithm's true trading range and is able to identify situations where the price has moved out of this true range.
Maximum Drawdown: The maximum draw-down experienced by the cumulative returns of the algorithm during a certain period of time defined by the simulator.

4.2.1 Alpha and Beta
The values for alpha and beta are found from an equation that is part of the capital asset pricing model (CAPM), shown below:
α = Rp − [Rf + (Rm − Rf)·β]   (Eq. 2)
Rp is the realized return of the portfolio (this is the portfolio that is being simulated). Rm is the market return (this can be approximated by a portfolio with only the SPY Standard & Poor 500 stock longed with the initial capital). Rf is the risk-free rate (this can be approximated by a portfolio with only the BIL US Treasury Bill Index stock longed with the initial capital). β is calculated as in Eq. 3. We find beta first using Eq. 3, then we substitute it into Eq. 2 to get alpha:
β = Cov(Rp, Rm) / Var(Rp)   (Eq. 3)
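Taking Eq. 2 and Eq. 3 as written above, alpha and beta can be computed from three aligned daily return series (portfolio, market proxy SPY, risk-free proxy BIL). The NumPy sketch below is only an illustration of the two formulas, not the simulator's internal computation; treating each R in Eq. 2 as the mean of the corresponding return series is an assumption.

import numpy as np

def beta(Rp, Rm):
    """Eq. 3 as given in the text: beta = Cov(Rp, Rm) / Var(Rp)."""
    Rp, Rm = np.asarray(Rp), np.asarray(Rm)
    return np.cov(Rp, Rm, ddof=0)[0, 1] / np.var(Rp)

def alpha(Rp, Rm, Rf):
    """Eq. 2: alpha = mean(Rp) - [mean(Rf) + (mean(Rm) - mean(Rf)) * beta]."""
    b = beta(Rp, Rm)
    return np.mean(Rp) - (np.mean(Rf) + (np.mean(Rm) - np.mean(Rf)) * b)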
4.3 Other considerations

It is a difficult task to compare all trading strategies with only one performance indicator. During our experimentation we will find strategies that perform well on some indicators but poorly on others, so we will compare algorithms using all indicators and leave it to the investor to decide which algorithms are the most favorable. We will first show how changing the prediction algorithm affects the performance indicators, and then we will show how fine-tuning the different algorithm parameters affects the indicators too. In the conclusion of our simulations and analysis of all the different methods and trading strategies, we will try to find the strengths of each prediction algorithm when applied to a trading strategy and show their defining characteristics when that trading strategy is used in AT.

5 Results

5.1 Results from initial testing of predictive accuracy

We precede the simulations with initial testing in the research environment provided by Quantopian. Using a gradient boosting classifier, we trained the classifier on a normalized SPY index during a certain period of time (between 2006 and 2010), and then used that trained classifier to predict three randomly selected stocks, as shown in Table 3.

  | Stock actually decreased in price | Stock price actually stayed almost the same | Stock actually increased in price
Stock predicted to decrease in price | 36% | 7% | 6%
Stock price predicted to stay almost the same | 12% | 17% | 15%
Stock predicted to increase in price | 1% | 2% | 4%
Table 3: The confusion matrix from the best classifier (Gradient Boosting) with fine-tuned parameters (obtained through a grid search) after predicting 970 stock price movements.
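To illustrate how such a test could be set up outside the Quantopian research environment, the sketch below trains a gradient boosting classifier and tabulates a three-class confusion matrix like the one in Table 3. The synthetic feature matrices, the feature dimensionality and the scikit-learn calls are our assumptions; they merely stand in for the paper's indicator features and Eq. 1 targets.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import confusion_matrix, accuracy_score

    rng = np.random.default_rng(0)
    # Stand-ins for indicator features and Eq. 1 targets computed from SPY (train)
    # and from another stock (test).
    X_train, y_train = rng.normal(size=(1000, 8)), rng.integers(-1, 2, size=1000)
    X_test, y_test = rng.normal(size=(250, 8)), rng.integers(-1, 2, size=250)

    clf = GradientBoostingClassifier(n_estimators=300).fit(X_train, y_train)
    pred = clf.predict(X_test)

    print(confusion_matrix(y_test, pred, labels=[-1, 0, 1]))
    # Accuracy is the normalized sum of the diagonal; compare it against the 1/3 random baseline.
    print(accuracy_score(y_test, pred))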
The classifier outputs one of three classes of prediction: the stock price one day from today will either increase by a certain percentage, decrease by that percentage, or stay within that percentage (which we consider to be negligible movement). Table 3 contains a confusion matrix (7), and it counts the number of predictions and their outcomes, as a way of assessing the performance of the classifier. The result is the average of ten simulations. The matrix in Table 3 yields an accuracy of 57%, and the accuracy is calculated by summing the diagonal of the matrix.

(7) A confusion matrix is a commonly used tool in classification tasks to assess the accuracy of a classifier.

Following that experiment, we can compare the accuracy of each classifier when used to predict each of the stocks separately. Because there are three classes to predict, we can consider a random guess to be a uniform distribution between the three classes (i.e., 33%). This is a good reference to use when comparing the classifiers, because any classifier with a predictive accuracy below random guessing is not considered useful.

  | Apple Inc. (NYSE: AAPL) | JPMorgan Chase (NYSE: JPM) | Microsoft Corp. (NYSE: MSFT)
K-Nearest Neighbors Classification | 41.9% | 42.6% | 48.0%
Support Vector Machine | 39.4% | 41.2% | 42.4%
Random Forest Classification | 50.5% | 56.6% | 51.1%
Extremely Randomized Trees Classification | 50.1% | 56.0% | 50.5%
Gradient Boosting Classification | 52.2% | 57.0% | 53.1%
Table 4: A side-by-side comparison of the predictive performance of each of the three classifiers with fine-tuned parameters, along with two classifiers from previous literature.

Observing the values in Table 4, all classifiers achieve significantly higher accuracy than both the reference and the classifiers from previous work (refer to Section 2); this leads us to believe the market is not random (8) and we can try to use these predictions in trading. In Table 4, all classifiers are trained on the Standard and Poor 500 index (NYSE: SPY) and then used to predict the stock indicated in each column. In most cases, the Gradient Boosting classifier reaches the highest accuracy amongst the three classifiers, reaching almost two times the accuracy of the reference.

5.2 Simulations using pre-selected stocks

Following the results presented in Table 4, we have confidence in the prediction ability of our ensemble methods, and we can move these methods on to the trading simulation. The details of the inner workings of the trading algorithm are in the previous section (i.e., Trading Strategy). As described before, there are two versions of the algorithm, each differing by either using one or two classifiers, and by either using preselected stocks or automatic stock selection. We will present the cumulative returns of each version of the algorithm and discuss them briefly. All methods are compared to the Standard and Poor 500 index (NYSE: SPY) as a benchmark for assessing performance. The time period throughout which the classifiers are simulated is selected based on simulation complexity; we kept all periods to a minimum of two years, usually starting no earlier than 2010, and all algorithms traded daily.
In Figure 2, we show the cumulative returns of each of the three classifiers when using preselected stocks and the one classifier method (9). The preselected stocks are a random selection of 36 stocks that were constituents of the Standard and Poor 500 index (NYSE: SPY) during the year 2010, which is the starting year for all of the simulations in this project. It can be deduced from Figure 2 that the Extremely Randomized Trees classifier and the Random Forest classifier strongly outperform the benchmark, with the Random Forest classifier having higher cumulative returns towards the end of the period. The Gradient Boosting classifier under-performs compared to the other classifiers if we compare them using cumulative returns, but it outperforms the other two when it comes to the stability and volatility of the simulation.

We will now discuss a problem that may arise when using only one classifier in a trading strategy for stock price prediction. The uncertainty a classifier has in its own prediction can, under the discussed trading strategy, sometimes lead the AT program not to act upon the prediction. A more complex approach that solves this problem is to use two similar classifiers and only act upon their agreement; we will call this method hereafter the Two Classifier method, and the method that uses only one classifier and a probability threshold will be called hereafter the One Classifier method. The result of this alternative approach, namely the Two Classifier method, is shown in Figure 3 below with preselected stocks only. The reader can see from Figure 3 that all classifiers easily outperform the benchmark, with the ETC and the RFC classifiers having the best overall cumulative returns. It is worth noting that, compared to the One Classifier method, the Two Classifier method usually has lower volatility but also lower cumulative returns.

(8) We refer to this because there is a popular hypothesis in financial literature named the "Efficient Market Hypothesis"; it is an investment theory stating that it is impossible to find a pattern in the stock market, because stock market efficiency causes existing stock prices to always incorporate and reflect all relevant information.

(9) The Gradient Boosting Classifier uses fewer estimators than its counterparts, and this is due to the greater complexity of the algorithm compared to the other algorithms. In general, it takes significantly less processing time to train a Random Forest Classifier (RFC) or Extremely Randomized Trees Classifier (ETC) than a Gradient Boosting Classifier (GBC) with an equal number of estimators for all three classifiers.

Figure 3: The graph shows the cumulative returns of each of the three algorithms (ETC with 1200 estimators each, GBC with 300 estimators each, RFC with 1200 estimators each, and the benchmark) when working with 36 preselected stocks and using the agreement of two classifiers to predict the trading stocks.
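The two decision rules just described can be summarized in a few lines of code. This is only a sketch of the idea: the function names, the 0.6 minimum-probability value and the class encoding (+1 long, −1 short, 0 hold) are illustrative assumptions, and Table 1 of the paper defines the actual actions taken.

    def one_classifier_action(class_probabilities, minimum_probability=0.6):
        # One Classifier method: act only when the classifier is sufficiently sure.
        predicted = max(class_probabilities, key=class_probabilities.get)
        if predicted == 0 or class_probabilities[predicted] < minimum_probability:
            return "hold"
        return "long" if predicted == 1 else "short"

    def two_classifier_action(prediction_a, prediction_b):
        # Two Classifier method: act only when both classifiers agree on a non-neutral move.
        if prediction_a != prediction_b or prediction_a == 0:
            return "hold"
        return "long" if prediction_a == 1 else "short"

    print(one_classifier_action({-1: 0.1, 0: 0.2, 1: 0.7}))  # long
    print(two_classifier_action(1, -1))                      # hold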
Figure 2: The graph shows the cumulative returns of each of the three algorithms (ETC with 1000 estimators, RFC with 1000 estimators, GBC with 300 estimators, and the benchmark) when working with 36 preselected stocks and using one classifier to predict the trading stocks.

5.3 Simulations using automatically selected stocks

The main problem with using preselected stocks is that we are prone to survivorship bias; this means that our manual selection of stocks uses information about the future, relative to the time of the simulation. Our proposed solution to survivorship bias is to let our AT program automatically select stocks every month using basic fundamental analysis. The selection scheme for the 100 stocks we used was a simple one based on the stock's Price-to-Earnings Ratio (PER) and Market Capitalization (MC), and all stocks are selected from the NYSE and NASDAQ exchanges. Our selection used the first 100 stocks that were above $100 million in MC, had a PER less than 10, and were sorted by their MC value in descending order. Our choice of filter values for the automatic stock selection was based on trial and error through multiple simulations. At the beginning of each month, the available stocks are filtered and reselected, ensuring that the algorithm has the best selection of stocks to analyze and trade.

The two graphs that follow, Figures 4 and 5, show the cumulative returns over time when using the one classifier and two classifier methods, respectively, with automatic selection of trading stocks.

Figure 4: The graph shows the cumulative returns of each of the three algorithms (GBC, RFC and ERT, each with 100 estimators) when working with 100 automatically selected stocks (selected at the start of each month) and using one classifier to predict the trading stock.

Figure 5: Cumulative returns of each of the three algorithms when working with 100 automatically selected stocks (selected at the start of each month) and using the agreement of two classifiers to predict the trading stock.

In Figure 4, we can see that all three algorithms seem to be prone to a heavy decline between 2011 and 2012, and we will see that this decline is less prominent in the two classifier method.

Figure 1: VizAlgo platform
Figure 2: Algomaster platform
Figure 3: An environment of AlgoCreator application

According to experiences gained with the VizAlgo, the second of the platforms, Algomaster [18], also has a plugin-based architecture, but it was intended to provide some more advanced features.
The features include functionality for algorithm stepping in both directions [13], call stack visualization for recursive algorithms, and a special mode for practical student testing in a visual way. In contrast to VizAlgo, Algomaster is based on the .NET framework development and execution platform [9]. Later on, the platform was extended significantly [2] in order to provide support for the visualization of complex algorithms with the ability to change input data during the visualization. Examples of visualizations using the new features are operations on a B-tree, 2-3 tree or AVL tree (Figure 2). In addition to the ability to define input data dynamically, the extensions were also oriented towards real-time student testing and support for simplified development of plugins for the platform.

In order to simplify the creation of Algomaster plugins, a separate application named AlgoCreator (Figure 3) was developed. The application uses a pattern for generating plugins of a particular algorithm class, e.g. a pattern for comparison-based sorting algorithms. A pattern consists of text templates for source code generation and an interpreter for interpreting a user-defined model. In short, the process of creating a plugin module can be described as follows: a user selects one of the available patterns, provides basic algorithm-related information and the algorithm pseudocode, defines the behavior of the algorithm and initiates library generation. The process is described in deeper detail in [2].

4 Experiments

As already mentioned in the introductory part of the paper, the main motivation behind the accomplished experiments was the comparison of the ability of students to solve algorithmic problems in two distinct ways. One of them was based on visual "simulation" of a given algorithm's operation, using one of our visualization tools described in section 3. The other one consisted of programming the particular algorithm in a given programming language. The experiments considered in this work were conducted with students of four study groups (G1-G4) and they were focused on two basic areas. The first area was oriented on traversing trees using different strategies (T) and the second one on simple comparison-based sorting algorithms (S). Thus the assignments of a particular area consisted of two parts: solving the problem in a purely visual way using the Algomaster platform (V) in one case (Figure 4), and programming the particular algorithm in the C programming language (P) in the other one. This way we got four combinations (two areas and two ways of solving a problem from the given area - TP, TV, SP and SV) for each of the four study groups (G1-G4). In the area of tree traversing, three basic traversing strategies were used (in-order, pre-order and post-order). In the area of sorting, simple sorting algorithms (like Insert sort and Bubble sort) were used.

Figure 4: Algomaster in check mode - traversing a binary tree (task: traverse the tree using the Postorder algorithm)
Within the following four tables (Table 1, Table 2, Table 3 and Table 4), the individual scores achieved by students of the particular study groups (G1-G4) in all experiments (TP, TV, SP, SV) are presented.

Experiment | Results
TP-G1 | 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0
TV-G1 | 0.06 1 1 1 1 1 1 0.9 0 0 1 1 1 0.2 0.07 1 1 1 0 1 0
SP-G1 | 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 - 0 0 1 0
SV-G1 | 1 1 1 1 1 0.06 0.15 1 1 0.77 1 1 1 0.57 0.05 1 - 1 0 1 0.18
Table 1: Results achieved by the students of the study group G1

Experiment | Results
TP-G2 | 0 0 0 0 1 0 1 - 0 1 0 0 0 0 0 1 1 1 0 1 1
TV-G2 | 1 0.67 0.09 1 1 0.21 0 1 0.38 0.4 1 0.14 1 1 1 0.6 1 1 1 0.79 0.67
SP-G2 | 1 0 1 1 1 0 1 1 0 1 1 0 0 1 0 1 1 1 0 1 1
SV-G2 | 1 1 0.33 1 0 1 1 1 1 1 0.81 0 0.05 0.62 1 1 1 1 1 1 1
Table 2: Results achieved by the students of the study group G2

The experiments described within this section were conducted in Fall 2017. 82 students took part in the experiments in total, of which 72 were males and 10 were females. Since all the activities were not necessarily conducted in a single class, not all students were necessarily present at all activities. Such a situation can be distinguished in a particular table by the presence of the "-" character within the Results column. This fact can be perceived as a slight disadvantage, but it is generally hard to influence the presence of students in classes. And since it was registered only in a few individual cases out of all considered students, we believe it did not affect the results significantly.

Experiment | Results
TP-G3 | 0 0 0 0 0 1 1 1 1 1 - 1 1 1 1 1 - 0 1 0
TV-G3 | 0.46 0 1 0.75 1 0.14 1 1 0.13 1 0.2 1 0.1 1 1 1 1 1 1 1
SP-G3 | 0 1 1 0 0 1 0 1 1 0 0 0 1 1 1 1 1 0 1 0
SV-G3 | 1 1 1 1 1 1 1 0.93 1 1 1 1 1 1 1 1 0.04 0.29 1 1
Table 3: Results achieved by the students of the study group G3

Experiment | Results
TP-G4 | 1 0 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 1 0 0
TV-G4 | 1 0.63 1 0 0.4 1 0 1 0.33 1 1 0 1 1 1 0.86 1 1 1 1
SP-G4 | 1 0 0 0 0 0 1 1 - 1 0 1 1 1 0 1 1 0 1 0
SV-G4 | 1 0.7 0.16 1 1 1 1 1 - 1 1 1 1 1 1 1 1 1 1 1
Table 4: Results achieved by the students of the study group G4

5 Analysis of the results of experiments

Within this section we provide a sketch of the approach used for calculating some of the resulting values, summarize the obtained results and formulate some comments on them. For calculating the average scores (mean) of the first group (G1) of students in the particular experiments (TP, TV, SP, SV), the following formulas (1-4) were used. The average score (G1AvTP) achieved by the study group G1 in the experiment TP is given by formula (1). Within the formula, G1TSTP represents the total score achieved by the group G1 in the TP experiment and G1NSTP the number of students participating in the experiment. The mean values (G1AvTV, G1AvSP, G1AvSV) for the remaining experiments (TV, SP, SV) of the study group G1 were calculated analogically.

G1AvTP = G1TSTP / G1NSTP = 6 / 21 = 0.286,   (1)
G1AvTV = G1TSTV / G1NSTV = 14.23 / 21 = 0.678,   (2)
G1AvSP = G1TSSP / G1NSSP = 7 / 20 = 0.35,   (3)
G1AvSV = G1TSSV / G1NSSV = 14.78 / 20 = 0.739.   (4)

Similarly, the mean values were calculated for the remaining groups (G2-G4), based on the data presented in Table 2, Table 3 and Table 4. Variance and standard deviation values for all experiments were calculated as well, and the overall results are available in Table 5.

Group | Exper. | Mean | Variance | Std. deviat.
G1 | TP | 0.286 | 0.204 | 0.452
G1 | TV | 0.678 | 0.201 | 0.448
G1 | SP | 0.35 | 0.228 | 0.477
G1 | SV | 0.739 | 0.153 | 0.391
G2 | TP | 0.4 | 0.24 | 0.490
G2 | TV | 0.712 | 0.125 | 0.353
G2 | SP | 0.667 | 0.222 | 0.471
G2 | SV | 0.8 | 0.128 | 0.358
G3 | TP | 0.611 | 0.238 | 0.487
G3 | TV | 0.739 | 0.147 | 0.383
G3 | SP | 0.55 | 0.248 | 0.497
G3 | SV | 0.913 | 0.064 | 0.253
G4 | TP | 0.35 | 0.228 | 0.477
G4 | TV | 0.761 | 0.141 | 0.376
G4 | SP | 0.526 | 0.249 | 0.499
G4 | SV | 0.94 | 0.038 | 0.196
Table 5: Statistical results of experiments

When we further average the results obtained in the particular experiments, the better results in visual tasks become clearly visible (Figure 6).
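The calculation behind formulas (1)-(4) and Table 5 can be sketched in a few lines of Python, under the assumption that a population variance was used (this reproduces the G1 values above); the function name and the string-based input format are our own choices.

    import statistics

    def summarize(results):
        # Mean, variance and standard deviation of one experiment row,
        # skipping "-" entries for students absent from that activity.
        values = [float(v) for v in results.split() if v != "-"]
        mean = sum(values) / len(values)
        variance = statistics.pvariance(values)
        return round(mean, 3), round(variance, 3), round(variance ** 0.5, 3)

    print(summarize("0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0"))  # TP-G1 -> (0.286, 0.204, 0.452)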
These results practically support our informal experiences and the hypothesis expressed within the introductory part of the paper. As we can observe from the graph of average scores (Figure 5) achieved by students in the particular activities, the scores achieved in visual tasks are usually significantly higher than the scores achieved in the corresponding programming tasks. The only exceptions are the TP/TV relation for the group G3 and the SP/SV relation for the group G2; also in these cases the scores achieved by solving problems in the visual way are higher, but not as significantly.

Figure 5: Graph of average scores (study groups)
Figure 6: Graph of average scores (experiments)

The results can also be interpreted in such a way that algorithm visualizations provide the solid potential we would like to build upon when examining new ways of utilizing them in the field of algorithms and data structures education.

6 Proposal of the new approach and study supporting system

In order to cope with the situation and to further stimulate students' algorithmic and programming skills, while taking advantage of algorithm visualizations, we propose a new teaching approach supported by a prototype of a new study supporting system. As was mentioned before, the teaching approach is based on the idea that students would participate in creating simple visualizations, and in this way interact with the algorithm visualization at a higher level of the engagement taxonomy. The role of the proposed system is to provide an environment that allows students to control pre-arranged visualizations from their code by using simple programming constructs. The approach, together with the system, is intended to be used in conjunction with other teaching methods, not to replace them.

The prototype of the system, with the working name DSAV (Figure 7), combines algorithm visualizations with programming tasks and so increases the engagement level and supports active learning. As a proof of concept, we implemented the support for several sorting algorithms (Bubble sort, Selection sort, Insertion sort, Quick sort, Heap sort, Merge sort) [15] and algorithms for traversing binary trees (Figure 8) using various strategies (Inorder, Preorder, Postorder and Levelorder). We would like to enhance the system in the future, and perspective areas for such enhancements would be visualizations of operations on lists, trees, or graphs.

Figure 7: The working prototype of DSAV system

Technically, the system essentially consists of two parts: the main part, managing the user interface and visualization, and a separate thread implementing the algorithms to be visualized. There is a simple API consisting of several supporting operations which can be used appropriately by a programmer implementing a particular algorithm. The basic operation available is RedrawAndWait(int millis), which tells the system to update the visualization according to the current values in a data structure shared by both parts and to wait for a specified amount of time.

In the case of sorting algorithms we provide several simple API calls for rendering some elements of the sorted array in a different color. They can be useful in cases where we want to put special emphasis on a particular element (or elements) of the sorted sequence (Figure 7). Some of them are given in the following list:

- CSortClearColorArr()
- CSortSetColor(int index)
- CSortClrSetColor(int index)
- CSortSetColorInt(int begidx, int endidx)
- CSortClrSetColorRW(int index, int millis)

As some of the operations tend to be used together often, we also provide special calls for performing combined operations (e.g. CSortClrSetColorRW(index, millis) combines CSortClrSetColor(index), for rendering the specified element in a different color, with RedrawAndWait(millis)). The reason for introducing such combined calls is to keep the code of a particular algorithm closer to its original form.
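To illustrate the pattern of embedding such calls in student code, here is a rough Python analogue of the idea. The DSAV API itself is C/C++ and Win32-based; the function names and the bubble sort below are ours and serve only to show how algorithm steps are interleaved with redraw calls.

    import time

    def redraw_and_wait(data, millis, highlight=None):
        # Stand-in for RedrawAndWait / CSortClrSetColorRW: print the current array
        # state (the DSAV UI thread would repaint instead) and pause briefly.
        marked = [f"[{v}]" if i == highlight else str(v) for i, v in enumerate(data)]
        print(" ".join(marked))
        time.sleep(millis / 1000.0)

    def bubble_sort_visualized(data):
        # The algorithm keeps its original shape; visualization hooks are added
        # at the points where the state changes.
        for end in range(len(data) - 1, 0, -1):
            for i in range(end):
                if data[i] > data[i + 1]:
                    data[i], data[i + 1] = data[i + 1], data[i]
                redraw_and_wait(data, 50, highlight=i + 1)
        return data

    bubble_sort_visualized([5, 2, 4, 1, 3])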
Figure 9 provides an example of using one of the operations within a simple sorting algorithm.

Figure 8: Visualization of traversing binary tree
Figure 9: A simple sorting algorithm implementation within the DSAV system

Analogically, there is a set of simple supporting API calls for the visualization of tree traversing algorithms. Some of them are provided in the following list:

- Btree3AGetDepth(int root)
- Btree3AGetLevel(int root, int d)
- BT3GetListVisited()
- Btree3ASetVisited(int root)

An example of the implementation of a simple traversing algorithm is given in Figure 10. Debugging outputs for a particular algorithm can be printed using console output, if needed. The DSAV is a Win32 application, written in the C/C++ programming language, since students mainly use this language in exercises within our subject at present.

    void inorder(int root) {
        if (left[root] != 0) inorder(left[root]);
        Btree3ASetVisited(root);
        RedrawAndWait(300);
        if (right[root] != 0) inorder(right[root]);
    }

Figure 10: A simple traversing algorithm implementation within the DSAV system

7 Conclusion

Within the paper we described our experiments based on students solving problems from the given areas in two different ways. The first way was purely visual, accomplished by using the Algomaster platform, and the second way was based on programming a particular algorithm using the C programming language. The results acquired are presented and analyzed. We found that the scores achieved in visual tasks are usually significantly higher than the scores achieved in the corresponding programming tasks. This correlates with our previous informal experiences and supports the validity of the hypothesis expressed within the introductory part of the paper.

A solution is proposed based on the idea of involving students in the process of creating algorithm visualizations. By the proposed solution we would like to help students not only to understand the basic principles of a particular algorithm in a convenient visual way, but also to stimulate their ability to implement it in a particular programming language. Based on our experiences, confirmed by the results of the accomplished experiments, we believe we should develop both of these skills in order to better prepare our students for their future professional careers.

It would be interesting to further develop the proposed approach and the supporting system and to study the contributions of the approach. Besides additional sorting algorithms, perspective areas for further extension would include visualizations of lists, trees, graphs or hash tables. We believe that if the system is enhanced properly and utilized in the right way, it will contribute to the quality of education in the subject. However, further research is required in order to evaluate the benefits and efficiency of the proposed solution.

References

[1] Bačíková M., Porubän J.: Ergonomic vs.
Domain Usability of User Interfaces, HSI 2013: 6th International Conference on Human System Interaction, June 6. - 8. 2013, Sopot, Poland, Piscataway, IEEE, 2013, pp. 1-8. https://doi.org/10.1109/ hsi.2013.6577817 [2] Benej M., SimoMk S.: Algomaster platform extension for improved usability, Journal of Electrical and Electronics Engineering, vol. 10, no. 1, 2017, pp. 2730. [3] Boyle E.A., Connolly T.M., Hainey T.: The role of psychology in understanding the impact of computer games, Entertainment Computing, vol. 2, no. 2, 2011, pp. 69-74. https://doi.org/10.1016/ j.entcom.2010.12.002 [4] Boyle E.A., Hainey T., Connolly T.M., Gray G., Earp J., Ott M., et al. An update to the systematic literature review of empirical evidence of the impacts and outcomes of computer games and serious games, Computers & Education, 94, 2016, pp. 178-192. https://doi.org/10.1016/j. compedu.2 015.11.0 03 [5] Dicheva D., Hodge A.: Active Learning through Game Play in a Data Structures Course, Proceedings of the 49th ACM Technical Symposium on Computer Science Education (SIGCSE '18), ACM, New York, NY, USA, 2018, pp. 834-839. https: //doi.org/10.1145/31594 50.3159605 [6] Grissom S., McNally M.F., Naps T.: Algorithm visualization in CS education: comparing levels of student engagement, Proceedings of the 2003 ACM symposium on Software visualization (SoftVis '03), ACM, New York, USA, 87-94. https://doi. org/10.1145/774833.774846 [7] Hundhausen C. D., Douglas S. A. and Stasko J. T.: A meta-study of algorithm visualization effectiveness, Journal of Visual Languages and Computing, 13, 2002, pp. 259-290. https://doi.org/10. 1006/jvlc.2002.0237 [8] Karavirta V., Shaffer C. A.: Creating Engaging Online Learning Material with the JSAV JavaScript Algorithm Visualization Library, IEEE Transactions on Learning Technologies, vol. 9, no. 2, pp. 171-183, April-June2016. https://doi.org/10.110 9/ tlt.2015.2490673 [9] Microsoft .NET, https://dotnet.microsoft.com/ [10] Naps T. L., RoBling G, et al.: Exploring the role of visualization and engagement in computer science education, Working group reports from ITiCSE on Innovation and technology in computer science education (ITiCSE-WGR '02), ACM, New York, NY, USA, 131-152. https://doi.org/10.1145/ 960568.782998 [11] Petri G., von Wangenheim C. G.: How games for computing education are evaluated? A systematic literature review, Computers & Education, vol. 107, April 2017, pp. 68-90. https://doi.org/10. 1016/j.compedu.2017.01.004 [12] Pietrikovi E., Chodarev S.: Towards Programmer Knowledge Profile Generation, Acta Elec-trotechnica et Informatica, vol. 16, no. 1, 2016, 334 Informatica 44 (2020) 327-334 S. Simonâk pp. 15-19. https://doi.org/10.15546/ aeei-2016-0003 [13] RôBling G.: A First Set of Design Patterns for Algorithm Animation, Electronic Notes in Theoretical Computer Science, Volume 224, 2009, pp. 6776. https://doi.org/10.1016/j.entcs. 2008.12.050 [14] RôBling G., Mihaylov M., Saltmarsh J.: Ani-malSense: Combining Automated Exercise Evaluations with Algorithm Animations, Proceedings of the 16th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, ITiCSE 2011, Darmstadt, Germany, June 2729, 2011, pp. 298-302. https://doi.org/10. 1145/1999747.1999831 [15] Silvási F., Tomásek M.: Lean Formaliza-tion of Insertion Sort Stability and Correctness, Acta Electrotechnica et Informatica, vol. 18, no. 2, 2018, pp. 42-49. 
https: //doi.org/10.15546/aeei-2018-0015 [16] Simonák S.: Algorithm Visualization Using the VizAlgo Platform, Acta Electrotechnica et Informatica, vol. 13, no. 2, 2013, pp. 54-64. http://aei.tuke.sk/papers/2 013Z2/ 0 8_%C5%A0imo%C5%8 8%C3%A1k.pdf [17] Simonák S.: Using algorithm visualizations in computer science education, Central European Journal of Computer Science, vol. 4, no. 3, 2014, pp. 183-190. https://doi.org/10. 2478/s13537-014-0215-4 [18] Simonák S., Benej M.: Visualizing Algorithms and Data Structures Using the Algomaster Platform, Journal of Information, Control and Management Systems, vol. 12, no. 2, 2014, pp. 189-201. [19] Simonák S.: Algorithm visualizations as a way of increasing the quality in computer science education, SAMI 2016, Danvers, IEEE, 2016, pp. 153-157. https://doi.org/10.110 9/sami. 2016.7422999 [20] Urquiza-Fuentes J., Velázquez-Iturbide J. Á.: Pedagogical Effectiveness of Engagement Levels - A Survey of Successful Experiences, Electronic Notes in Theoretical Computer Science, Volume 224, 2009, pp. 169-178. https://doi.org/10.1016/j. entcs.2008.12.061 https://doi.org/10.31449/inf.v44i3.1907 Informatica 44 (2020) 335-310 303 Similarity Measure of Multiple Sets and its Application to Pattern Recognition Shijina V, Adithya Unni and Sunil Jacob John National Institute of Technology Calicut, Kerala, India- 673601 E-mail: shiji.chan@gmail.com, adithyaunni1998@gmail.com and sunil@nitc.ac.in http://nitc.ac.in/index.php/?url=users/view/178/11/3 Keywords: fuzzy sets, similarity measure, multiple sets, pattern recognition Received: July 17, 2019 Multiple set is a newborn member of the family of generalized sets, which can model uncertainty together with multiplicity. It has the power to handle numerous uncertain features of objects in a multiple way. Multiple set theory has the edge over the well established fuzzy set theory by its capability to handle uncertainty and multiplicity simultaneously. Similarity measure of fuzzy sets is well addressed in literature and has found prominent applications in various domains. As multiple set is an efficient generalization of fuzzy set, the concept and theory of similarity measure can be extended to multiple set theory and can be developed probable applications in various real-life problems. This paper introduces the concept of similarity measure of multiple sets and proposes two different similarity measures of multiple sets and investigates their properties. Finally, this work substantiates application of the concept of similarity measure of multiple sets to pattern recognition. A numerical illustration demonstrates the effectiveness of the proposed technique to this application. Povzetek: V clanku je predstavljena teorija podobnosti multipnih množic z namenom uporabe prepoznavanja vzorcev. 1 Introduction Various mathematical models are available in the literature to represent the concepts like uncertainty, vagueness and inexactness. Such models includes fuzzy sets, L-fuzzy sets [1], multisets [2], rough sets[3], intuitionistic fuzzy sets[4], fuzzy multisets [5], vague sets [6], multi fuzzy sets [7], etc. Each of these models has advanced into an elaborated theory and has numerous practical applications. [3] A fuzzy set is characterized by a membership function which assigns a grade of membership to each object in the universal set. Even though, the concept of fuzzy set is strong enough to handle uncertain data successfully, it can manage only one uncertain feature of the object at a time. 
Also, fuzzy set fails to handle the multiplicity of objects. Later, The notion of fuzzy multiset was defined as an extension of a fuzzy set. Fuzzy multiset gives fuzzy membership values for identical copies of each object. The main advantage of fuzzy multiset over fuzzy set is that it can handle the multiplicity of objects. However, it can handle only one feature of the object at a time. On the other hand, multi fuzzy set is also an extension of fuzzy set, and gives fuzzy membership values for different features of objects. The main advantage of multi fuzzy set over fuzzy set is that it can simultaneously manage numerous uncertain characteristics of objects, but fails to handle the multiplicity of objects. Recently, multiple set is introduced to model uncertainty together with multiplicity. The advantage of multiple set lies in the fact that it simultaneously accumulates numerous uncertain features of objects together with its multiplicity, in a better way. It was put forward by Shijina et al.[8, 9] as a generalization of fuzzy set, multiset, fuzzy multiset and multi fuzzy set. Later, Shijina et al.[10,11] defined more operations, viz. aggregation operators and matrix norms on multiple sets. Then, the concept of relation on multiple sets is introduced and applied this concept in medical diagnosis problem [12]. As a continuation, this work is aspired as an attempt to extend the concept of similarity measure to multiple sets. Measuring the similarity between objects plays a crucial role in many real life problems involving image processing, image retrieval, image compression, pattern recognition, clustering, information retrieval problems, etc. Many measures of similarity have been proposed and researched in literature and it has been shown that similarity measure is proficient in coping with uncertain information. For example, the theory of fuzzy sets, introduced by Zadeh[13], is a successful approach in confronting uncertainty. Fuzzy set has enormous power to describe the objective world that we live in and the strength of fuzzy set has transpired in several real life applications. Zadeh himself initiated the idea of similarity measure of fuzzy sets [14]. Later, similarity measure of fuzzy sets has been explored widely by many researchers[15,16,17,18,19, 20, 21, 22, 23] and have applied them to real life problems involving pattern recognition[24], image processing[25,26,27,28,29,30], etc. As an extension of fuzzy set theory, intuitionis-tic fuzzy set theory has found to be highly useful in dealing with imprecision and uncertainty. Many different similarity measures between intuitionistic fuzzy 336 Informatica 44 (2020) 335-347 V. Shijina et al. sets have been proposed and are extensively applied to many areas such as decision making [31,32], pattern recognition[33,34,35,36,37,38, 39], etc. As a combined concept of intuitionistic fuzzy set and interval valued fuzzy set, Atnassov[40] introduced interval valued intuitionistic fuzzy sets. It greatly furnishes the additional capability to deal with vague information and model non-statistical uncertainty by providing both membership interval and non-membership intervals. Similarity measure of interval valued intuitionistic fuzzy sets was also proposed and it has found applications in pattern recognition and multi-criteria decision making[41]. Type-2 fuzzy sets, which is an extension of fuzzy sets was also proposed by Zadeh[42]. Their membership values are fuzzy sets on the interval [0,1]. 
Type-2 fuzzy sets can improve certain kinds of inference better than fuzzy sets with increasing imprecision, uncertainty and fuzziness in information. Hung and Yang [43] presented a similarity measure of type-2 fuzzy sets based on the fuzzy Hausdor distance. There were further studies of similarity measures on Type-2 fuzzy sets[44, 45, 46] and have found applications in clustering[47,48,49], pattern recognition [50], students' evaluation[51], etc. Hesitant fuzzy set was first introduced by Torra[52] and Torra and Narukawa[53]. It permits the membership degree of an element to a set comprising of several possible values between 0 and 1. Hesitant fuzzy sets are very useful in dealing with situations where people are hesitant in providing their preference over objects in a decision making process. Therefore hesitant fuzzy set has played a significant role in the uncertain system and received much attention from researchers. Similarity measures of hesitant fuzzy sets[54] have been proposed, but it has not yet gained wide acceptance. Analogously, several similarity measures between sets have been proposed and have found many real life applications. But, here we will restrict our attention to the theory of similarity measures of fuzzy sets and its various applications, so that it can be explored to define the similarity measure of multiple sets. Before presenting the theory of similarity measure of fuzzy sets, it is desirable to have a short discussion on its application in day-to-day life. So, in the following, the potential of similarity measure of fuzzy sets in real life applications is reviewed. Weken et al.[25] gave an overview of similarity measures of fuzzy sets which can be applied to images. These similarity measures are all pixel-based and fail to produce satisfactory results consistently. To overcome this drawback,Weken et al.[26] extended their work to propose similarity measures based on neighbourhoods so that the relevant structures of the images are observed better. In his survey paper on similarity measures of fuzzy sets, Weken et al.[27] established measures for image comparison. The same authors presented an overview of the possible application of similarity measures of fuzzy sets to colour images in[28]. Nachtegael et al. [30] presented a color image retrieval system using a specific similarity measure of fuzzy sets. Li et al.[55] presented a faster algorithm on similarity measure using cen- ter of gravity of fuzzy sets in content-based image retrieval. The discussion in [55] nearly covers all the similarity measures of fuzzy sets, which may be greatly helpful to both the development and application of fuzzy set theory for content based image retrieval. Chen et al.[29] proposed a novel algorithm viz., normalized fuzzy similarity measure to deal with the nonlinear distortion in finger print images. Chaira and Ray [24] presented a region extraction algorithm to identify a color region similar to the query image from an image database containing images with different types of colors. Here, the matching process is based on similarity measure of fuzzy sets between the query image and the images in the database. Capitaine[56] proposed a general framework of designing similarity measures based on residual implication functions. They presented some new families of parametric similarity measures using parametric residual implications and modeled an algorithm to learn the parameter of each similarity measure based on relevance degrees. 
El-Sayed and Aboelwafa [57] introduced a new approach for face recognition based on the similarity measure of fuzzy sets. Xu et al. [58] proposed a new similarity measure of fuzzy sets based on an extension of the Dice and cosine similarity measures, and then applied the variation coefficient similarity to emergency group decision-making problems. They also gave a practical example evaluating the emergency management capability for a major snow disaster in Hunan province of China. Baccour [59] applied similarity measures of fuzzy sets reported in the existing literature to the classification of shapes, mosaic recognition and Arabic sentence recognition.

As discussed above, similarity measures of fuzzy sets have found widespread application in various fields such as image processing, pattern recognition, decision making, etc. Multiple set, which is an extension of fuzzy set, is capable of handling uncertainty and multiplicity simultaneously. Motivated by the benefits of the similarity measure of fuzzy sets, this work intends to extend the similarity measure to multiple sets. This paper proposes two different types of similarity measures: one is based on the similarity measure of fuzzy sets; the other is based on the similarity measure of fuzzy sets and fuzzy aggregation operators. We strongly believe that the similarity measure of multiple sets can handle uncertain information in a better way. It must, therefore, have a better scope of real-life applications. To substantiate our claim, we have applied the concept of the similarity measure of multiple sets to pattern recognition, which is the first of its kind.

The rest of the paper is organized as follows. In section 2, we briefly review some standard facts on multiple sets and the similarity measures of fuzzy sets. In section 3, we derive two interesting formulas for similarity measures on multiple sets and establish some of their properties. In section 4, we indicate how these techniques may be used in pattern recognition problems. In section 5, we end the paper by encapsulating the main conclusions.

2 Preliminaries

In this section, we first give some basic concepts related to multiple sets. Then, we proceed with a brief exposition of similarity measures of fuzzy sets. Throughout this paper, the following notations are used. R+ = [0, ∞); X is the universe of discourse; |X| is the cardinality of X; capital letters A, B, C, etc. are fuzzy sets on X and also represent the corresponding membership functions; A(x) is the fuzzy membership value of the element x in X; ∅ is the fuzzy set with all membership values equal to 0; I is the fuzzy set with all membership values equal to 1; M is the fuzzy set with all membership values equal to 0.5; A̅ is the complement of the fuzzy set A; FS(X) is the class of all fuzzy sets of X; P(X) is the class of all crisp subsets of X. Let M = M_{n×k}([0,1]) denote the set of all matrices of order n × k with entries from [0,1], and for e ∈ [0,1] let [e]_{n×k} denote the matrix in M with all its entries equal to e.

Definition 2.1. Let M = [Mij], N = [Nij] ∈ M. Then,
1. M ≤ N ⟺ Mij ≤ Nij for every i = 1, 2, ..., n and j = 1, 2, ..., k.
2. M ≥ N ⟺ Mij ≥ Nij for every i = 1, 2, ..., n and j = 1, 2, ..., k.
3. M = N ⟺ Mij = Nij for every i = 1, 2, ..., n and j = 1, 2, ..., k.
4. Join of M and N, denoted by M ∨ N, is a matrix in M defined by (M ∨ N)ij = Mij ∨ Nij for every i = 1, 2, ..., n and j = 1, 2, ..., k.
5.
Meet of M and N, denoted by M ∧ N, is a matrix in M defined by (M ∧ N)ij = Mij ∧ Nij for every i = 1, 2, ..., n and j = 1, 2, ..., k.

From this definition it can be noted that (M, ≤, [0]_{n×k}, [1]_{n×k}) is a bounded lattice.

2.1 Multiple sets

Multiple set is a unified structure to represent numerous uncertain features of objects simultaneously, in a multiple way. A multiple set utilizes distinct fuzzy membership functions to delineate each uncertain feature of the object and assigns various values to each membership function according to the multiplicity. This is symbolized by assigning a matrix to each object, where each row in the matrix indicates a distinct fuzzy membership function corresponding to each feature of the object. Further, the entries in a row point out different values of the corresponding membership function according to its multiplicity. A multiple set can be defined as follows:

Definition 2.2. Let X be a non-empty crisp set called the universal set and A1, A2, ..., An be n distinct fuzzy sets of X. For each i = 1, 2, ..., n, A_i^1(x), A_i^2(x), ..., A_i^k(x) are membership values of the fuzzy set Ai for k identical copies of the element x ∈ X, in descending order. Then, a multiple set A of order (n, k) over X is an object of the form A = {(x, A(x)) : x ∈ X}, where for each x ∈ X its membership value is an n × k matrix in M given by

A(x) =
A_1^1(x) A_1^2(x) ... A_1^k(x)
A_2^1(x) A_2^2(x) ... A_2^k(x)
...
A_n^1(x) A_n^2(x) ... A_n^k(x)

The matrix A(x) is called the membership matrix of the element x. Note that the fuzzy sets A1, A2, ..., An evaluate n distinct properties of objects and are called underlying fuzzy sets of the multiple set A. Further, each underlying fuzzy set Ai corresponds to k fuzzy sets A_i^j = {(x, A_i^j(x)) : x ∈ X}, for j = 1, 2, ..., k. Clearly, for every i = 1, 2, ..., n, A_i^1 ⊇ A_i^2 ⊇ ... ⊇ A_i^k. The universal multiple set X is a multiple set of order (n, k) over X for which the membership matrix of each x ∈ X is [1]_{n×k}. The empty multiple set Φ is a multiple set of order (n, k) over X for which the membership matrix of each x ∈ X is [0]_{n×k}. The set of all multiple sets of order (n, k) over X is denoted by MS(n,k)(X). It is perceived that a multiple set A of order (n, k) over X can be viewed as a function A : X → M, which maps each x ∈ X to its n × k membership matrix A(x) in M.

As an example, a multiple set can be used to represent the evaluation of a set of students under the characteristics of intelligence, extra curricular activities, communication skill and personality by three experts.

Example 2.3. Suppose X = {x1, x2, x3} is the universal set of students under consideration and there is a panel consisting of three experts evaluating the students under the criteria of intelligence, extra curricular activities, communication skill and personality. Then the performance of the students can be represented by a multiple set of order (4, 3) as follows: A = {(x1, A(x1)), (x2, A(x2)), (x3, A(x3))}, where A(xi) for i = 1, 2, 3 are 4 × 3 matrices given as follows:

A(x1) =
0.7 0.6 0.5
0.6 0.5 0.4
0.7 0.5 0.3
0.9 0.9 0.8

A(x2) =
0.8 0.6 0.6
0.6 0.5 0.4
0.7 0.5 0.4
0.9 0.8 0.7

A(x3) =
0.8 0.7 0.5
0.7 0.6 0.4
0.7 0.4 0.4
0.8 0.8 0.7

Here, the first, second, third and fourth rows of the membership matrix indicate the fuzzy membership functions corresponding to the features intelligence, extra curricular activities, communication skill and personality, respectively.
Corresponding to each feature, the three entries in the row are the values given by the three experts, written in descending order. For example, for the student x1 the membership values corresponding to intelligence are 0.7, 0.6 and 0.5, those corresponding to extra curricular activities are 0.6, 0.5 and 0.4, and so on.

Next, we discuss the standard operations on multiple sets. Let A and B be two multiple sets in MS(n,k)(X).

Definition 2.4. A is a subset of B, denoted as A ⊆ B, if and only if A(x) ≤ B(x) for every x ∈ X.

Definition 2.5. A is equal to B, denoted as A = B, if and only if A ⊆ B and B ⊆ A, that is, if and only if A(x) = B(x) for every x ∈ X.

Definition 2.6. The union of A and B is a multiple set in MS(n,k)(X), denoted as A ∪ B, whose membership matrix is (A ∪ B)(x) = A(x) ∨ B(x) for every x ∈ X.

Definition 2.7. The intersection of A and B is a multiple set in MS(n,k)(X), denoted as A ∩ B, whose membership matrix is (A ∩ B)(x) = A(x) ∧ B(x) for every x ∈ X.

Definition 2.8. The complement of A is a multiple set in MS(n,k)(X), denoted as A̅, whose membership matrix for each x ∈ X is the n × k matrix A̅(x) = [A̅_i^j(x)], where A̅_i^j(x) = 1 − A_i^(k−j+1)(x) for every i = 1, 2, ..., n and j = 1, 2, ..., k.

2.2 Similarity measure of fuzzy sets

Being an important topic in the theory of fuzzy sets, the similarity measure of fuzzy sets has been investigated extensively by many researchers from different points of view. But there does not exist a unique definition of the similarity measure of fuzzy sets. There do exist many special-purpose definitions which have been employed with success in cluster analysis, pattern recognition, image processing, classification, diagnostics and many other fields. Recently, several similarity measures have been proposed and used for various purposes. For example, Zwick et al. [15] reviewed 19 measures of similarity and compared their performance in a behavioral experiment. Xuecheng [16] systematically gave an axiomatic definition of the similarity measure of fuzzy sets as:

Definition 2.9. A real function S : FS(X) × FS(X) → R+ is called a similarity measure, if S has the following properties:
1. S(A, B) = S(B, A) for all A, B ∈ FS(X).
2. S(D, D̅) = 0 for all D ∈ P(X).
3. S(C, C) = max_{A,B∈FS(X)} S(A, B) for all C ∈ FS(X).
4. For all A, B, C ∈ FS(X), if A ⊆ B ⊆ C, then S(A, B) ≥ S(A, C) and S(B, C) ≥ S(A, C).

On account of this definition, Xuecheng proposed a similarity measure on the basis of a measurable function with respect to the Borel field B1. Let X = [0,1] and F = {A ∈ FS(X) : A(x) is a measurable function with respect to the Borel field B1}. Then, for p ≥ 1,

S_p(A, B) = 1 − ( ∫_X |A(x) − B(x)|^p dx )^(1/p)   (2.1)

for all A, B ∈ F, is a similarity measure on F.

Pappis and Karacapilidis [17] presented three similarity measures as follows:

(1) Measure based on the operations of union and intersection:

S(A, B) = ( Σ_{x∈X} min{A(x), B(x)} ) / ( Σ_{x∈X} max{A(x), B(x)} )   (2.2)

(2) Measure based on the maximum difference:

S(A, B) = 1 − max_{x∈X} |A(x) − B(x)|   (2.3)

(3) Measure based on the difference and the sum of grades of membership:

S(A, B) = 1 − ( Σ_{x∈X} |A(x) − B(x)| ) / ( Σ_{x∈X} (A(x) + B(x)) )   (2.4)

The authors summarized that similarity measures (2.2) and (2.4) satisfy the following properties:

(p1) S(A, B) = S(B, A).
(p2) A = B ⟺ S(A, B) = 1.
(p3) A ∩ B = ∅ ⟹ S(A, B) = 0.
(p4) S(A, A̅) = 1 ⟺ A = M.
(p5) S(A, A̅) = 0 ⟺ A = I or A = ∅.

The similarity measure (2.3) satisfies properties (p1), (p2) and (p4), together with
(p3') A ∩ B = ∅ ⟹ S(A, B) = 1 − max_{x∈X} {A(x), B(x)}.
(p5') S(A, A̅) = 0 ⟺ A and A̅ are normal fuzzy sets.
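For concreteness, the following is a small sketch of measures (2.2)-(2.4) on a finite universe, with fuzzy sets represented as equal-length lists of membership grades. The function names and the list representation are our own choices, and the sketch assumes A and B are not both empty (so the denominators are non-zero).

    def sim_union_intersection(a, b):
        # Eq. (2.2): ratio of the sigma-counts of the intersection and the union.
        return sum(min(x, y) for x, y in zip(a, b)) / sum(max(x, y) for x, y in zip(a, b))

    def sim_max_difference(a, b):
        # Eq. (2.3): one minus the largest pointwise difference.
        return 1 - max(abs(x - y) for x, y in zip(a, b))

    def sim_difference_sum(a, b):
        # Eq. (2.4): one minus the total difference relative to the total membership.
        return 1 - sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

    A = [0.7, 0.5, 0.3]
    B = [0.6, 0.5, 0.4]
    print(sim_union_intersection(A, B), sim_max_difference(A, B), sim_difference_sum(A, B))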
Hyung et al. [18] proposed a similarity measure of fuzzy sets using maximum and minimum operators:

S(A, B) = max_{x∈X} min{A(x), B(x)}   (2.5)

and showed that it satisfies property (p1) together with the following:

(p6) The similarity degree is bounded: 0 ≤ S(A, B) ≤ 1.
(p7) If A and B are normalized and A = B, then S(A, B) = 1.
(p8) A ∩ B = ∅ ⟹ S(A, B) = 0.
(p9) If A and B are crisp sets, then S(A, B) = 0 if A ∩ B = ∅ and S(A, B) = 1 if A ∩ B ≠ ∅.

Chen et al. [20] extended the work of Pappis to further investigate measures of similarity of fuzzy values. They proposed three similarity measures:

(1) Measure based on the geometric distance model:

S(A, B) = 1 − ( Σ_{x∈X} |A(x) − B(x)| ) / |X|   (2.6)

(2) Measure based on the set-theoretic approach:

S(A, B) = sup_{x∈X} (A ∩ B)(x)   (2.7)

(3) Measure based on the matching function [60]:

S(A, B) = ( Σ_{x∈X} A(x)B(x) ) / max{ Σ_{x∈X} A(x)^2, Σ_{x∈X} B(x)^2 }   (2.8)

They summarized that similarity measure (2.6) satisfies properties (p1), (p2), (p4) and (p5) and fails to satisfy (p3); similarity measure (2.7) satisfies properties (p1) and (p3) and fails to satisfy (p2), (p4) and (p5); and similarity measure (2.8) satisfies properties (p1) to (p5).

Later, Wang et al. [19] made a comparative study of similarity measures. They commented on the study of similarity measures introduced by Pappis [17]. Also, they introduced a new class of similarity measures extracted from the work of Bandler and Kohout on fuzzy power sets [61], as:

S(A, B) = min{ inf_{x∈X} I(A(x), B(x)), inf_{x∈X} I(B(x), A(x)) }   (2.9)

where I is any fuzzy implication operator.

Wang [21] proposed two new similarity measures of fuzzy sets:

S(A, B) = (1 / |X|) Σ_{x∈X} min{A(x), B(x)} / max{A(x), B(x)}   (2.10)

S(A, B) = (1 / |X|) Σ_{x∈X} (1 − |A(x) − B(x)|)   (2.11)

They examined that similarity measures (2.10) and (2.11) satisfy Definition 2.9. They also made a comparison between the similarity measures put forward by them and those of [17] and [18].

Razaei et al. [22] developed a new similarity measure of fuzzy sets based on their relative sigma count:

S(A, B) = ( Σ_{x∈X} min{A(x), B(x)} ) / max{ Σ_{x∈X} A(x), Σ_{x∈X} B(x) }   (2.12)

where A ≠ ∅ or B ≠ ∅, and they also define S(∅, ∅) = 1. They probed that this similarity measure satisfies Definition 2.9 and also satisfies properties (p1) to (p5).

3 Similarity measure of multiple sets

In this section, we first introduce the axiomatic definition of a similarity measure of multiple sets. Let E(n,k)(X) be the subset of MS(n,k)(X) which is the collection of all multiple sets over X whose membership matrices are either [0]_{n×k} or [1]_{n×k}.

Definition 3.1. A real function S : MS(n,k)(X) × MS(n,k)(X) → R+ is called a similarity measure of multiple sets, if S satisfies the following axioms:
1. S(A, B) = S(B, A) for all A, B ∈ MS(n,k)(X).
2. S(D, D̅) = 0 for all D ∈ E(n,k)(X).
3. S(C, C) = max_{A,B∈MS(n,k)(X)} S(A, B) for all C ∈ MS(n,k)(X).
4. For all A, B, C ∈ MS(n,k)(X), if A ⊆ B ⊆ C, then S(A, B) ≥ S(A, C) and S(B, C) ≥ S(A, C).

In the following, we propose two similarity measures between multiple sets: one is based on the similarity measure of fuzzy sets; the other is based on a similarity measure of fuzzy sets and a fuzzy aggregation operator. Let S be any similarity measure of fuzzy sets satisfying Definition 2.9. For multiple sets A and B in MS(n,k)(X), denote

S(A, B) = Σ_{i=1}^{n} max_{j=1,2,...,k} S(A_i^j, B_i^j)   (3.1)

Theorem 3.2. S(A, B) is a similarity measure between the multiple sets A and B in X.

Proof. Axioms (1) and (2) are obvious, respectively, from axioms (1) and (2) of Definition 2.9 for the fuzzy similarity measure S.

Axiom (3): Let C be any multiple set in MS(n,k)(X). Clearly, we have

S(C, C) ≤ max_{A,B∈MS(n,k)(X)} S(A, B)   (3.2)

Now, for any multiple sets A, B ∈ MS(n,k)(X), from axiom (3) of Definition 2.9 for the fuzzy similarity measure S,
Axioms (1) and (2) are obvious, respectively, from axioms (1) and (2) of Definition 2.9 for fuzzy similarity measure S. Axiom(3): Let C be any multiple set in MS(n k)(X). Clearly, we have S(C, C) < max S(A, B) (3.2) A,BeMS(„,fc)(X) Now, for any multiple sets A, B g MS(n k)(X), from axiom (3) of Definition 2.9 for fuzzy similarity measure S, 340 Informatica 44 (2020) 335-347 V. Shijina et al. we have S(Cj, Cj) > S(Aj, Bj) for every j = 1, 2,..., k and i = 1, 2,..., n. Therefore, max S(Cj ,Cj) > max S(Aj, Bj) j = 1,2,...,k ® ® j=1,2,...,k ® ® for every i = 1,2,..., n, which implies V max S(Cj ,Cj) >V max S(Aj, Bj) i=1 i=1 j=1 , 2 ,..., k So, we have S(C, C) > S(A, B) for all A, B g MS(n,k)(X). Therefore, S(C, C) > max S(A, B) (3.3) A,B£MS(n,fc)(X) Combining inequalities (3.2) and (3.3), it follows that S (C, C )= max S (A, B) A,BeMS(„,fc)(X) Axiom(4): Suppose A, B and C are multiple sets in MS(n,k) (X) such that ACBCC. Then Aj C Bj C Cj for every j = 1,2, ...,k and i = 1,2, ...,n. Then, from axiom (4) of Definition 2.9 for fuzzy similarity measure S, we have S(Aj, Bj) > S(Aj, Cj) for every j = 1, 2,..., k and i = 1, 2,..., n. Therefore, V max S(Aj, Bj) >V max S(Aj, Cj) j=1,2 k ^ ® ^ i=i 2 k ^ ® lJ j=1,2, 0.9 0.9 0.8 0.9 0.5 0.5 A(x1) = 0.5 0.5 0.5 A(x2) = 0.6 0.5 0.3 0.3 0.2 0.1 0.5 0.4 0.3 0.8 0.7 0.7" 0.9 0.5 0.5" A(x3) = 0.6 0.5 0.3 B(x1) = 0.8 0.5 0.5 0.8 0.7 0.6 0.5 0.5 0.2 0.8 0.5 0.5" 0.9 0.7 0.5" B(x2) = 0.6 0.6 0.4 B(x3)= 0.8 0.6 0.5 0.6 0.5 0.2 0.7 0.5 0.3 Theorem 3.4. Let A and B be multiple sets in MS(n k) (X) and M be the multiple set in MS(n,k)(X) for which membership matrices for each x g X is [0.5]nxk. Then, similarity measure S(A, B) defined in equation (3.1) satisfies the following properties: 1. Suppose fuzzy similarity measure S satisfies the property A = B ^ S(A, B) = 1. Then A = B ^ S (A, B) = n. 2. Suppose fuzzy similarity measure S satisfies the property A n B = ^ ^ S(A, B) = 0. Then An B = $ ^ S(A, B) = 0. 3. Suppose fuzzy similarity measure S satisfies the properties A = M ^ S(A, A) = 1 and 0 < S(A, B) < 1. Then A = M ^ S (A, Al) = n. 4. Suppose fuzzy similarity measure S satisfies the property S(A, A) = 0 ^ A = I or A = 0. Then S (A, A) = 0 ^Ag £(„,k)(X). Remark 3.5. 1. Converse of (1) in Theorem 3.4 need not be true. For example, let X = {x1,x2,x3} be the universal set and A and B be multiple sets in MS(3,3)(X) given by the following membership matrices; 0.8 0.5 0.5" and hence S(A, B) > S(A, C). In a similar way, we can prove that S(B, C) > S(A, C). That is, S(A, B) satisfies all the axioms of Definition 3.1. Thus S (A, B) is a similarity measure between the multiple sets A and B in X. □ Example 3.3. Let X = {x1, x2, x3} be the universal set and A and B be multiple sets in MS(3 3) (X) given by the following membership matrices; A(x1) = A(x2 ) = A(x3) = B(x1) = B(x2) = B(x3) = 0.5 0.5 0.9 0.6 0.5 0.8 0.5 0.6 0.9 0.5 0.5 0.8 0.6 0.6 0.8 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.1 0.5 0.5 0.3 0.4' 0.2 0.2 0.5 0.4 0.2 0.5 0.5 0.2 0.5 0.5 0.1 Consider 3 similarity measures S1, S2 and S3 of fuzzy sets, given by the equations 2.10, 2.11 and 2.12, respectively. Then, from simple mathematical calculations, we have the similarity measures S(A, B) between multiple sets A and B based on similarity measure S1 is 2.584, based on S2 is 2.733 and based on S3 is 2.677. 
Using the properties of fuzzy similarity measure and definition of similarity measure of multiple set the following properties can be proved easily: Choose the fuzzy similarity measure S1 given by the equation (2.10). From simple calculations, we obtain S(A, B) = 3. But, here A = B. 2. Converse of (3) in Theorem 3.4 need not be true. Consider the multiple set A given in above example. Then complement of A is given by the following membership matrices; A(x1) 0.5 0.5 0.9 0.5 0.5 0.5 0.2 0.5 0.5 Similarity Measure of Multiple Sets. Informatica 44 (2020) 335-347 341 0.5 0.5 0.1 A(x2) = 0.6 0.5 0.5 0.7 0.5 0.5 0.6 0.5 0.2 A(x3)= 0.8 0.5 0.5 0.8 0.5 0.4 Choose the fuzzy similarity measure Si given by the equation (2.10). From simple calculations, we obtain S(A, A) = 3. But, here A = M. Based on the similarity measure of fuzzy sets and fuzzy aggregation operator, we give a similarity measure formula for multiple sets as follows: Let S be any similarity measure of fuzzy sets satisfying Definition 2.9 and H be any fuzzy aggregation operator[62]. For multiple sets A and B in MS(n,fc)(X), denote Sh(A,B)= £ S(H(Ai,42,...,Ak), i=i H (Bi,B2,...,Bk)) (3.4) for every i = 1, 2,..., n. Therefore, n £ S(H(A/, A2,..., Af), H(B/, B2,..., Bf)) > i=1 n £ S(H(A/, A2,..., Af),H(C/, C2,..., Cf)) i=i and hence SH (A, B) > SH(A, C). In a similar way, we can prove that Sh(B,C) > Sh(A,C). That is, S(A, B) satisfies all the axioms of Definition 3.1. Thus SH(A, B) is a similarity measure between the multiple sets A and B in X. □ Example 3.7. Let A and B be multiple sets given in example (3.3). Consider 3 similarity measures S1, S2 and S3 of fuzzy sets, given by the equations 2.10, 2.11 and 2.12, respectively. Here we consider three fuzzy aggregation operators H = avg, max or min. Then, the similarity measures S(A, B) between multiple sets A and B based on similarity measures S1, S2 or S3 of fuzzy sets and fuzzy aggregation operators H = avg, max or min are given in Table 1. Theorem 3.6. SH(A, B) is a similarity measure between the multiple sets A and B in X. Proof. Axioms (1) and (2) are obvious, respectively, from axioms (1) and (2) of Definition 2.9 for fuzzy similarity measure S. Axiom(3): Clearly, we have SH(C, C) < max SH(A, B) (3.5) A,BeMS(n,fc)(X) Now, for any A, Be MS(n,fc) (X), we have S(H(Ci,C?,...,Ck),H(Ci,C?,...,Ck)) > S(H (A/, A2,..., Af ),H (Bi,Bi2,...,Bk)) for every i = 1, 2, ... , n. Therefore £n=i S(H(Ci, C2,..., Cf), H(Ci, C2,..., Cf)) > £=i S(H(A/, A2,..., Af ),H(B/, B2,..., Bf)) So we have Sh(C, C) > Sh (A, B) for all A, B e MS(n,fc)(X). Therefore, Sh(C, C) > max Sh(A, B) (3.6) A,B€MS(n,k)(X) Combining equations (3.5) and (3.6), it follows that SH (C, C )= max SH (A, B) A,BeMS(n,fc)(X) Axiom(4): Suppose A, B, C e MS(n,fc)(X), such that ACBCC. Then Aj C Bj C Cj for every j = 1, 2,..., k and i = 1, 2,..., n. Then, from axiom (4) of Definition 2.9 for fuzzy similarity measure S, we have S(H(A/, A2,..., Af ),H(B/, B2,..., Bf)) > S(H(A/, A2,..., Af), H(C/, C2,..., Cf)) - avg max min Si 2.405 2.487 2.117 S2 2.645 2.633 2.566 S3 2.503 2.568 2.136 Table 1: similarity measures S(A, B) between multiple sets A and B given by the Definition (3.4). Using the properties of fuzzy similarity measure and definition of similarity measure of multiple set the following properties can be proved easily: Theorem 3.8. Let A and B be multiple sets in MS(n,fc) (X) and M be the multiple set in MS(n,j)(X) for which the membership matrix for each x e X is [0.5]nXfc . 
Let H denotes the fuzzy aggregation operators average, maximum or minimum. The similarity measure SH(A, B) defined in equation (3.4) satisfies the following properties: 1. Suppose fuzzy similarity measure S satisfies the property A = B ^ S(A, B) = 1. Then A = B ^ Sh (A, B) = n. 2. Suppose fuzzy similarity measure S satisfies the property A n B = ^ ^ S(A, B) = 0 and H = max or avg. Then AnB = $ ^ SH(A,B) = 0. Moreover, if H = min, then AnB = $ ^ Sh (A, B) =0. 3. Suppose fuzzy similarity measure S satisfies the properties A = M ^ S(A, A) = 1, 0 < S(A, B) < 1 and H = max or min. Then A = M ^ SH (A, A) = n. Moreover, if H = avg, then A = M ^ Sh (A, A) = n. 4. Suppose fuzzy similarity measure S satisfies the property S(A, A) = 0 ^ A = I or A = 0 and H = 342 Informatica 44 (2020) 335-347 V. Shijina et al. max or avg. Then SH (A, A) = 0 ^Ae £(n,k)(X). Moreover, if H = min, then A e £(n,k)(X) ^ S (A, A) = 0. Remark 3.9. 1. Converse of (1) in Theorem 3.8 need not be true. For example, let X = {x^x2,x3} be the universal set and A and B be multiple sets in MS(3,3)(X) given by the following membership matrices; 0.8 0.5 0.5 A(x1) = 0.5 0.5 0.5 0.6 0.5 0.1 0.9 0.5 0.4 A(x2) = 0.6 0.5 0.5 0.5 0.5 0.3 0.8 0.5 0.4 A(x3)= 0.8 0.5 0.2 0.6 0.5 0.2 0.8 0.6 0.4 B(x1 ) = 0.5 0.5 0.5 0.6 0.4 0.2 0.9 0.5 0.4 B(x2)= 0.6 0.5 0.5 0.5 0.5 0.3 0.8 0.5 0.4 B(x3)= 0.8 0.5 0.2 0.6 0.4 0.3 Choose the fuzzy similarity measure S1 given by the equation (2.10). From simple calculations we obtain; Smax (A, B) = 3 and SaVg (A, B) = 3. But, here A = B. Now, let C be a multiple set in MS(3,3)(X) given by the following membership matrices; C(xi) = C(X2) = C(X3) = 0.8 0.5 0.6 0.9 0.6 0.5 0.8 0.8 0.6 0.6 0.5 0.4 0.5 0.5 0.5 0.5 0.5 0.4 0.5 0.5 0.1 0.4 0.5 0.3 0.4 0.2 0.2 Choose the fuzzy similarity measure S1 given by the equation (2.10). From simple calculations we obtain, Smin (A, C) = 3. But, here A = C. Smin(A,B) = 0 need not imply AnB = For example, let X = {x1,x2,x3} be the universal set and A and B be multiple sets in MS(3 3) (X) given by the following membership matrices; A(xi) 0.8 0.5 0.6 0.5 0.3 0.5 0.5 0.0 0.0 0.2 0.1 0.5 0.8 0.8 0.3 0.2 0.1 0.5 0.5 0.5 0.2 0.0 0.0 0.3 0.5 0.4 0.0 A(x2) = A(x3) = B(xi) = B(x2) = B(x3) = Choose the fuzzy similarity measure S1 given by the equation (2.10). From simple calculations we obtain, Smin (A, B) = 0. But, here AnB = 3. Savg (A, A) = n need not imply A = M . For example, let X = {x1, x2, x3} be the universal set and A be multiple set in MS(3 3)(X) given by the following membership matrices; 0.4 0.2 0.0 0.5 0.5 0.5 0.6 0.4 0.2 0.9 0.5 0.5 0.6 0.5 0.5 0.5 0.2 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.6 0.4 0.3 A(x1) = A(x2) = A(x3) = 0.8 0.5 0.5 0.9 0.6 0.5 0.8 0.5 0.6 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.1 0.5 0.5 0.3 0.4 0.2 0.2 Then, the complement of A is obtained as follows; A(x1) = a(x2) = a(x3) = 0.5 0.5 0.9 0.5 0.6 0.7 0.6 0.8 0.8 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.2 0.5 0.5 0.1 0.5 0.5 0.2 0.5 0.4 Choose the fuzzy similarity measure S1 given by the equation (2.10). From simple calculations we obtain, Savg (A, A) = 3. But, here A = M. 4. Smin(A, A) = 0 need not imply A e £(n,k)(X). For example, let X = {x1,x2,x3} be the universal set and A be multiple set in MS(3,3) (X) given by the following membership matrices; Similarity Measure of Multiple Sets. 
Informatica 44 (2020) 335-347 343 0.3 0.2 0.0 A(xi) = 0.1 0.1 0.0 0.0 0.0 0.0 0.2 0.2 0.0 A(x2) = 0.1 0.1 0.0 0.5 0.3 0.0 0.0 0.0 0.0 A(x3) = 0.4 0.3 0.0 0.3 0.2 0.0 Then, the complement of A is obtained as follows; 1.0 0.8 0.7 A(xi) = 1.0 0.9 0.9 1.0 1.0 1.0 1.0 0.8 0.8 A(x2) = 1.0 0.9 0.9 1.0 0.7 0.5 1.0 1.0 1.0 A(x3) = 1.0 0.7 0.6 1.0 0.8 0.7 4 Choose the fuzzy similarity measure Si given by the equation (2.10). From simple calculations we obtain, Smin (A, A) = 0. But, here A £ £(3,3) (X). Applications of similarity measures to pattern recognition The capability of recognizing and classifying patterns is one of the most fundamental characteristics of human intelligence. Pattern recognition may be defined as a process by which we search for structures in data and classify these structures into categories such that the degree of association is high among structures of the same category and low between structures of different categories. There are three fundamental problems in pattern recognition. The first one is sensing problem which is concerned with the representation of input data obtained by measurements on objects that are to be recognized. In general, each object is represented by a vector, known as pattern vector, in which each component represents a particular characteristic of the object. The second problem is feature extraction problem, which concerns the extraction of characteristic features from the input data in terms of which the dimensionality of pattern vectors can be reduced. The features should be characterizing attributes by which the given pattern classes are well discriminated. The third problem is classification of given patterns. This is usually done by defining an appropriate discrimination function for each class, which assigns a real number to each pattern vector. Individual pattern vectors are evaluated by these discrimination functions, and their classification is decided by the resulting values. Each pattern vector is classified to that class whose discrimination function yields the largest value. Pattern recognition systems have found vast applications in many areas such as handwritten character and word recognition; automatic screening and classification of X-ray images; electrocardiograms, electroencephalograms, and other medical diagnostic tools; speech recognition and speaker identification; fingerprint recognition; classification of remotely sensed data; analysis and classification of chromosomes; image understanding; classification of seismic waves; target identification and human face recognition. The utility of fuzzy set theory in pattern recognition was already recognized and the literature dealing with fuzzy pattern recognition is now quite extensive. In their position paper[63], Mitra et al. gave an outline to the contribution of fuzzy sets to pattern recognition. They mentioned that the concept of fuzzy sets can be used at the feature level in representing input data as an array of membership values denoting the degree of possession of certain properties; in representing linguistically phrased input features for their processing; in weakening the strong commitments for extracting ill-defined image regions, properties, primitives, and relations among them. Also, fuzzy sets can be used at the classification level, for representing class membership of objects, and for providing an estimate (or representation) of missing information in terms of membership values. 
As mentioned above, fuzzy sets are very effective in representing different patterns in pattern recognition. Since multiple set is a generalization of fuzzy sets and it has the capability to represent numerous features simultaneously, they are well suited to model patterns. In this section, we establish a new procedure for pattern recognition with the aid of similarity measure on multiple sets. Assume that there exist m patterns which are represented by multiple sets A for r = 1, 2, ...m. Suppose that there be a sample to be recognized which is represented by a multiple set B. According to the principle of the maximum degree of similarity between multiple sets, we can decide that the sample belongs to the pattern A with maximum S(At, B). In the following, a fictitious numerical example is given to show application of the similarity measures to pattern recognition problems. Let three patterns be represented by multiple sets Ai, A2 and A3 on X = {x i, x2, x3}, given by the following membership matrices; "0.9 0.8 0.8 0.4 0.4 0.4 0.2 0.2 0.1 0.8 0.8 0.7 Ai(xi) Ai (x2) Ai(x3) = 0.8 0.5 0.1 0.7 0.7 0.4 0.2 0.7 0.7 0.4 0.1 0.6 0.6 0.3 0.1 0.7 0.6 0.2 0.0 0.6 0.5 0.3 0.0 0.7 344 Informatica 44 (2020) 335-347 V. Shijina et al. 0.7 0.7 0.6 0.5 0.5 0.3 0.2 0.2 0.2 0.9 0.9 0.8 0.6 0.5 0.4 0.7 0.6 0.5 0.3 0.3 0.1 0.9 0.8 0.8 0.8 0.8 0.7 0.8 0.7 0.6 0.3 0.0 0.0 0.9 0.7 0.7 0.2 0.2 0.0 0.5 0.5 0.4 0.8 0.7 0.5 0.5 0.4 0.2 0.6 0.2 0.1 0.5 0.5 0.3 0.9 0.8 0.8 0.4 0.4 0.4 0.5 0.3 0.2 0.7 0.6 0.6 0.9 0.9 0.9 0.4 0.2 0.2 A-2 (xi) A2(X2) A2 (X3) A3(xi) = A3(X2) = A3 (X3) Consider a sample B in MS(4,3)(X) which will be recognized, where B is given by the following membership matrices; B(xi) = B(x2) = B(x3) Consider 3 similarity measures Si, S2 and S3 of fuzzy sets, given by the equations 2.10, 2.11 and 2.12, respectively. Then the similarity measures S(A, B) for r = 1, 2, 3, given by the Definition (3.1) based on similarity measures Si, S2 and S3 of fuzzy sets are obtained in Table 2; Now, the similarity measures S(A, B) for r = 1, 2, 3, given by the Definition (3.4), based on similarity measures S1, S2 and S3 of fuzzy sets and fuzzy aggregation operators H = min, max or avg are given in tables 3,4 and 5. From the tables 2, 3,4 and 5, we can see that S(A, B) has the maximum value. The important point to note here is - Si S2 S3 S(Ai, B) 3.522 3.8 3.55 S(A2, B) 3.23 3.5 3.257 S (A3, B) 2.098 2.533 2.068 1.0 0.9 0.9 0.3 0.3 0.3 0.2 0.2 0.1 0.7 0.7 0.6 0.7 0.6 0.5 0.5 0.4 0.2 0.2 0.2 0.1 0.7 0.7 0.6 0.7 0.6 0.5 0.5 0.3 0.3 0.3 0.2 0.1 0.7 0.7 0.6 Table 2: similarity measures S(A, B) for r = 1, 2,3, given by the Definition (3.1). - Si S2 S3 Smi„(Ai, B) 3.062 3.766 3.069 Smi„(A2, B) 2.464 3.366 2.81 Smi„(A3, B) 1.428 2.334 1.353 Table 3: similarity measures S(A, B) for r = 1, 2,3, given by the Definition (3.4) based on H = min. - Si S2 S3 Smax (Ai , B) 3.455 3.766 3.55 Smaœ(A2, B) 3.124 3.367 3.136 Smax (A3 , B) 2.093 2.5 2.068 Table 4: similarity measures S(A, B) for r the Definition (3.4) based on H = max. 1, 2, 3, given by - Si S2 S3 S avg (Ai, B) 3.361 3.765 3.424 S avg (A2, B) 2.884 3.368 3.049 S avg (A3, B) 1.772 2.4 1.728 Table 5: similarity measures S(A, B) for r the Definition (3.4) based on H = avg. 1, 2, 3, given by that all formulae of multiple similarity measure mentioned here, results the same conclusion. Obviously, the sample B belongs to the pattern represented by the multiple set A1. 
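The recognition procedure above reduces to two steps that are easy to sketch in code: score the sample against every pattern with a multiple-set similarity measure, and assign it to the pattern with the largest score. The sketch below also adds the aggregation-based measure (3.4); as in the earlier sketch, the (|X|, n, k) array layout and the helper names are our own choices, and numpy's mean, max and min along the row axis stand in for H = avg, max and min.

```python
import numpy as np

def aggregated_similarity(A, B, s, h=np.mean):
    """Equation (3.4): sum over i of s(H(A_i^1,...,A_i^k), H(B_i^1,...,B_i^k)).

    h aggregates the k component fuzzy sets of row i into a single fuzzy set
    over X; h = np.mean, np.max or np.min plays the role of H = avg, max or min.
    """
    _, n, _ = A.shape
    return sum(s(h(A[:, i, :], axis=1), h(B[:, i, :], axis=1)) for i in range(n))

def classify(sample, patterns, similarity):
    """Assign the sample to the pattern with the maximum degree of similarity."""
    scores = [similarity(P, sample) for P in patterns]
    return int(np.argmax(scores)), scores
```

Reusing multiple_set_similarity and s1 from the earlier sketch, classify(B, [A1, A2, A3], lambda P, Q: multiple_set_similarity(P, Q, s1)) implements the decision behind Table 2, while passing aggregated_similarity in the same way corresponds to Tables 3, 4 and 5.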
5 Conclusion Similarity measure of fuzzy sets is a mature research field and has found applications in diverse areas such as pattern recognition, image processing, decision making, etc. Comparatively, similarity measure of multiple sets is a new topic. This paper deals with the similarity measure of multiple sets. Two formulas for similarity measure of multiple sets are proposed and their properties are investigated. This new concept is applied to pattern recognition problem and the suitability of proposed method is demonstrated using a numerical example. We believe that the concept can be extended to other applications such as image processing, decision making, etc. Investigation along these lines will be considered as a part of future work. Acknowledgement The second author acknowledges the financial assistance given by Kerala State Council for Science Technology and Environment (KSCSTE), INDIA through Student Project Scheme (541/SPS63/2018/KSCSTE) for a part of the work carried out and included in this this paper. Similarity Measure of Multiple Sets. Informatica 44 (2020) 335-347 345 References [1] Goguen, Joseph A, L-fuzzy sets, Journal of mathematical analysis and applications 18(1) (1967): 145-174. https://doi.org/10.1016/ 0022-247X(67)90189-8 [2] Cerf V , Fernandez E , Gostelow K, Volansky S, Formal Control-Flow Pproperties of a Model of Compu-tation,Tech. rep. California Univ., Los Angeles. Dept. of Computer Science(1971) [3] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11(5) (1982): 341-356. https://doi.org/10 .1007/ BF01001956 [4] Atanassov KT, Intuitionistic fuzzy sets, VII ITKRs Session, Sofia deposed in Central Sci, Technical Library of Bulg. Acad. of Sci 1697(84) (1983): 84. [5] Yager, Ronald R, On the theory of bags, International Journal of General System 13(1) (1986): 23-37. https://doi.org/10 .1080/ 03081078608934952 [6] Goguen, Joseph A, Vague sets, IEEE transactions on systems, man, and cybernetics 23(2) (1993): 610-614. https://doi.org/10.1109/21. 229476 [7] Sebastian, Sabu and Ramakrishnan, TV, Multi-fuzzy sets, International Mathematical Forum 5(50) (2010): 2471-2476. [8] Shijina, V., Sunil, J.J & Anitha, S. Multiple sets. Journal Of New Results In Science (9)(2015) 18-27. [9] Shijina, V., Sunil, J.J & Anitha, S. Multiple sets: A unified approach towards modelling vagueness and multiplicity. Jounal Of New Theory(11)(2016) 29-53. [10] Shijina, V. & Sunil, J.J Aggregation operations on multiple sets. International Journal Of Scientific & Engineering Research. 5(9),(2014) 39-42 [11] Shijina, V. & Sunil, J.J Matrix-norm aggregation operators. Iosr Journal Of Mathematics. 1(2),(2017) 511 [12] Shijina, V. & Sunil, J.J Multiple Relations and its Application in Medical Diagnosis. International Journal Of Fuzzy System Applications, 6(4) (2017) 16 Pages. https://doi.org/10.4 018/IJFSA. 2017100104 [13] Zadeh, LA. Fuzzy sets. Information And Control. 8(3),(1965) 338-353. https://doi.org/ 10.1016/S0019-9958(65)90241-X [14] Zadeh, LA. Similarity relations and fuzzy or-derings. Information Sciences. 3(2) (1971) 177-200. https://doi.org/10.1016/ S0020-0255(71)80005-1 [15] Zwick, R., Carlstein, E. & Budescu, D. Measures of similarity among fuzzy concepts: A comparative analysis. International Journal Of Approximate Reasoning. 1(2) (1987) 221-242. https://doi. org/10.1016/0888-613x(87)90015-6 [16] L.Xuecheng, Entropy, distance measure and similarity measure of fuzzy sets and their relations. Fuzzy Sets And Systems. 52(3),(1992) 305-318. 
https://doi.org/10.1016/ 0165-0114(92)90239-z [17] C P. Pappis, & N.I. Karacapilidis, , A comparative assessment of measures of similarity of fuzzy values. Fuzzy Sets And Systems. 56(2),(1993) 171-174. https://doi.org/10. 1016/0165-0114(93)90141-4 [18] Lee-kwang, H.,Y.S Song, & K.M Lee, Similarity measure between fuzzy sets and between elements. Fuzzy Sets And Systems. 62 (1994) 291-293. https://doi.org/10.1016/ 0165-0114(94)90113-9 [19] X.Wang, ,B. De Baets, & E. Kerre, A comparative study of similarity measures. Fuzzy Sets And Systems. 73(2), 259-268 (1995). https://doi.org/ 10.1016/0165-0114(94)00308-t [20] S.M Chen„M.S Yeh, & P.Y Hsiao, A comparison of similarity measures of fuzzy values. Fuzzy Sets And Systems 72(1) (1995) 79-89. https://doi.org/ 10.1016/0165-0114(94)00284-e [21] W.J Wang, New similarity measures on fuzzy sets and on elements, Fuzzy Sets And Systems. 85(3)(1997) 305-309. https://doi.org/ 10.1016/0165-0114(95)00365-7 [22] H.Rezaei, , M. Emoto, & M.Mukaidono, New Similarity Measure Between Two Fuzzy Sets. JACIII. 10(6)(2006), 946-953. https://doi.org/10. 20965/jaciii.2006.p0946 [23] S.Omran, & M. Hassaballah, A new class of similarity measures for fuzzy sets. International Journal Of Fuzzy Logic And Intelligent Systems. 6(2),(2006) 100-104 . https://doi.org/10. 5391/ijfis.2006.6.2.100 [24] T.Chaira, & A.Ray, Fuzzy approach for color region extraction. Pattern Recognition Letters. 24(12)(2003) 1943-1950. https://doi.org/ 10.1016/s0167-8655(03)00033-3 346 Informatica 44 (2020) 335-347 V. Shijina et al. [25] D. Van der Weken, M. Nachtegael,& E. E. Kerre, An overview of similarity measures for images, in: Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, Vol. 4, IEEE, 2002, pp. IV-3317. https://doi.org/ 10.1109/icassp.2002.1004621 [26] D.Van der weken, , M. Nachtegael, &E.E. Kerre, Using similarity measures and homogeneity for the comparison of images. Image And Vision Computing. 22(9)(2004) 695-702. https://doi.org/ 10.1016/j.imavis.2004.03.002 [27] D. Van der Weken, M. Nachtegael, V. De Witte, S. Schulte,& E. Kerre, A survey on the use and the construction of fuzzy similarity measures in image pro-cessing.in: Computational Intelligence for Measurement Systems and Applications, 2005. CIMSA. 2005 IEEE International Conference on, IEEE, 2005, pp. 187-192. https://doi.org/10.110 9/icip. 2007.4379511 [28] D. Van Der Weken, V. De Witte, M. Nachtegael, S. Schulte,& E. Kerre, Fuzzy similarity measures for colour images, in: Cybernetics and Intelligent Systems, 2006 IEEE Conference on, IEEE, 2006, pp. 1-6. https://doi.org/10.110 9/iccis. 2006.252363 [29] X. Chen, J. Tian& X. Yang, A new algorithm for distorted fingerprints matching based on normalized fuzzy similarity measure, IEEE Transactions on Image Processing 15 (3) (2006) 767-776. https:// doi.org/10.110 9/tip.2 005.8 60597 [30] M. Nachtegael, D. Van der Weken, V. De Witte, S. Schulte, T. Mélange, E. E. Kerre, Color image retrieval using fuzzy similarity measures and fuzzy partitions, in: Image Processing, 2007. ICIP 2007. IEEE International Conference on, Vol. 6, IEEE, 2007, pp. VI-13.https://doi.org/10.110 9/ icip.2007.4379511 [31] E. Szmidt & J. Kacprzyk, A concept of similarity for intuitionistic fuzzy sets and its use in group decision making, in: Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on,Vol. 2, IEEE, 2004, pp. 1129-1134.https://doi.org/ 10.1109/fuzzy.2004.1375570 [32] E. Szmidt & J. 
Kacprzyk, A new concept of a similarity measure for intuitionistic fuzzy sets and its use in group decision making, in: International Conference on Modeling Decisions for Articial Intelligence, Springer, 2005, pp. 272-282.https://doi.org/ 10.1007/11526018_27 [33] L. Dengfeng, C. Chuntian, New similarity measures of intuitionistic fuzzy sets and application to pattern recognitions, Pattern recognition letters 23 (1) (2002) 221-225. https://doi.org/10.1016/ s0167-8655(01)00110-6 [34] Z. Liang, P. Shi, Similarity measures on intuitionistic fuzzy sets, Pattern Recognition Letters 24 (15) (2003) 2687-2693. https://doi.org/10. 1016/s0167-8655(03)00111-9 [35] W.L. Hung, & M.S.Yang, Similarity measures of intuitionistic fuzzy sets based on Hausdorff distance. Pattern Recognition Letters. 25(14)(2004) 1603-1611. https://doi.org/10.1016/j. patrec.2004.06.006 [36] H. B. Mitchell, On the dengfeng-chuntian similarity measure and its application to pattern recognition, Pattern Recognition Letters 24 (16) (2003) 3101-3104. https://doi.org/10.1016/ s0167-8655(03)00169-7 [37] W.L. Hung, M.S. Yang, Similarity measures of in-tuitionistic fuzzy sets based on Lp metric, International Journal of Approximate Reasoning, 46 (14) (2007) 120-136. https://doi.org/10.1016/ j.ijar.2006.10.002 [38] P. Julian, K.-C. Hung, S.-J. Lin, On the Mitchell similarity measure and its application to pattern recognition, Pattern Recognition Letters 33 (9) (2012) 1219-1223. https://doi.org/10. 1016/j.patrec.2012.01.008 [39] G. A. Papakostas, A. G. Hatzimichailidis & V. G. Kaburlasos, Distance and similarity measures between intuitionistic fuzzy sets: A comparative analysis from a pattern recognition point of view, Pattern Recognition Letters 34 (14) (2013) 1609-1622.https://doi.org/10.1016/j. patrec.2013.05.015 [40] K. Atanassov & G. Gargov, Interval valued in-tuitionistic fuzzy sets, Fuzzy sets and systems 31 (3)(1989) 343-349. https://doi.org/10. 1016/0165-0114(89)90205-4 [41] C.P. Wei, P. Wang & Y.Z. Zhang, Entropy, similarity measure of interval-valued intuitionistic fuzzy sets and their applications, Information Sciences 181 (19) (2011) 4273-4286. https://doi.org/10. 1016/j.ins.2011.06.001 [42] L. A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning-I,Information sciences 8 (3) (1975) 199-249. https://doi.org/10.1016/ 0020-0255(75)90036-5 [43] W.-L. Hung & M.-S. Yang, Similarity measures between type-2 fuzzy sets, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12 (06) (2004) 827-841. https://doi.org/10. 1142/s0218488504003235 Similarity Measure of Multiple Sets. Informatica 44 (2020) 335-347 347 [44] G. Zheng, J. Wang, W. Zhou & Y. Zhang, A similarity measure between interval type-2 fuzzy sets,in: Mechatronics and Automation (ICMA), 2010 International Conference on, IEEE, 2010, pp. 191-195. https://doi.org/10.110 9/icma. 2010.5589072 [45] C.M. Hwang, M.S. Yang, W.L. Hung& E. S. Lee, Similarity, inclusion and entropy measures between type-2 fuzzy sets based on the sugeno integral, Mathematical and Computer Modelling 53 (9) (2011) 1788-1797. https://doi.org/10. 1016/j.mcm.2010.12.057 [46] C.M. Hwang, M.S. Yang & W.L. Hung, New similarity and inclusion measures between type-2 fuzzy sets, in: Advances in Type-2 Fuzzy Logic Systems (T2FUZZ), 2011 IEEE Symposium on,IEEE, 2011, pp. 82-87. https://doi.org/10.110 9/ t2fuzz.2011.5949547 [47] M.-S. Yang & D.C. 
Lin, On similarity and inclusion measures between type-2 fuzzy sets with an application to clustering, Computers & Mathematics with Applications 57 (6) (2009) 896-907. https:// doi.org/10.1016/j.camwa.2008.10.028 [48] S. S. Mohamed& A. S. AbdAla, Applying a new similarity measure between general type-2 fuzzy sets to clustering, in: Computational Intelligence and Informatics (CINTI), 2011 IEEE 12th International Symposium on, IEEE, 2011, pp. 283-286. https:// doi.org/10.110 9/cinti.2 011.610 8514 [49] D.-C. Lin& M.S. Yang, A similarity measure between type-2 fuzzy sets with its application to clustering, in: Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on, Vol. 1, IEEE, 2007, pp. 726-731. https: //doi.org/10.110 9/fskd.2 0 07.123 [50] H. B. Mitchell, Pattern recognition using type-II fuzzy sets, Information Sciences 170 (2) (2005) 409-418. https://doi.org/10.1016/j. ins.2004.02.027 [51] P. Singh, Similarity measure for type-2 fuzzy sets with an application to students' evaluation, Computer Applications in Engineering Education 23 (5) (2015) 694-702. https://doi.org/10 .1002/ cae.21642 [52] V. Torra, Hesitant fuzzy sets, International Journal of Intelligent Systems 25 (6) (2010) 529-539. https: //doi.org/10.10 02/int.2 0418 [53] V. Torra, Y. Narukawa, On hesitant fuzzy sets and decision, in: Fuzzy Systems, 2009. FUZZ-IEEE 2009. IEEE International Conference on, IEEE, 2009, pp. 1378-1382. https://doi.org/10.1109/ fuzzy.2009.5276884 [54] Z. Xu, M. Xia, Distance and similarity measures for hesitant fuzzy sets, Information Sciences 181 (11) (2011) 2128-2138. https://doi.org/10. 1016/j.ins.2011.01.028 [55] Y. Li, J.M. Liu, J. Li, W. D. C.X. Ye & Z.F. Wu, The fuzzy similarity measures for content-based image retrieval, in: Machine Learning and Cybernetics, 2003 International Conference on, Vol. 5, IEEE, 2003, pp. 3224-3228. https://doi.org/10.110 9/ icmlc.2003.1260136 [56] H. Le Capitaine, A relevance-based learning model of fuzzy similarity measures, IEEE Transactions on Fuzzy Systems 20 (1) (2012) 57-68. https:// doi.org/10.1109/tfuzz.2011.2166079 [57] M. A. El-Sayed & N. Aboelwafa, Study of face recognition approach based on similarity mea-sures,International Journal of Computer Science Issues (IJCSI) 9 (2) (2012) 133-139. [58] X. Xu, L. Zhang& Q. Wan, A variation coefficient similarity measure and its application in emergency group decision-making, Systems Engineering Procedia 5 (2012) 119-124. https://doi.org/10. 1016/j.sepro.2 012.04.019 [59] L. Baccour, A. M. Alimi& R. I. John, Some notes on fuzzy similarity measures and application to clas-siffication of shapes recognition of arabic sentences and mosaic, IAENG International Journal of Computer Science41 (2) (2014) 81-90. https://doi. org/10.1109/fuzzy.2009.5276877 [60] S.M. Chen, A new approach to handling fuzzy decision-making problems, IEEE Transactions on Systems, Man, and Cybernetics 18 (6) (1988) 1012-1016. https://doi.org/10.110 9/21. 23100 [61] W. Bandler, L. Kohout, Fuzzy power sets and fuzzy implication operators, Fuzzy Sets and Systems 4 (1) (1980) 13-30. https://doi.org/10. 1016/0165-0114(80)90060-3 [62] K. George J & Y. Bo, Fuzzy sets and fuzzy logic, theory and applications, Prentice Hall PTR, (1995). [63] S.Mitra & S. K. Pal, Fuzzy sets in pattern recognition and machine intelligence, Fuzzy Sets and systems 156 (3) (2005) 381-386. https://doi.org/10. 1016/j.fss.2005.05.035 348 Informática 44 (2020) 335-347 V. Shijina et al. 
https://doi.org/10.31449/inf.v44i3.1907 Informatica 44 (2020) 349-310 303 Performance Assessment of a Set of Multi-Objective Optimization Algorithms for Solution of Economic Emission Dispatch Problem Sarat Kumar Mishra Department of Electrical and Electronics Engineering, Padmanava College of Engineering, Rourkela, India E-mail: mishra.sarat@gmail.com Sudhansu Kumar Mishra Department of Electrical and Electronics Engineering, Birla Institute of Technology, Mesra, Ranchi, India E-mail: sudhansu.nit@gmail.com Keywords: differential evolution, economic emission dispatch, multi-objective optimization, non-dominated sorting, particle swarm optimization Received: October 30, 2018 This paper addresses the realistic economic emission dispatch (EED) problem of power system by considering the operating fuel cost and environmental emission as two conflicting objectives, and power balance and generator limits as two constraints. A novel dynamic multi-objective optimization algorithm, namely the multi-objective differential evolution with recursive distributed constraint handling (MODE-RDC) has been proposed and successfully employed to address this challenging EED problem. It has been thoroughly investigated in two different test cases at three different load demands. The efficiency of the MODE-RDC is also compared with two other multi-objective evolutionary algorithms (MOEAs), namely, the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO). Performance evaluation is carried out by comparing the Pareto fronts, computational time and three non-parametric performance metrics. The statistical analysis is also performed, to demonstrate the ascendancy of the proposed MODE-RDC algorithm. Investigation of the performance metrics revealed that the proposed MODE-RDC approach was capable of providing good Pareto solutions while retaining sufficient diversity. It renders a wide opportunity to make a trade-off between operating cost and emission under different challenging constraints. Povzetek: Opisan je izvirni multi-kriterijski optimirni algoritem za energetske sisteme, ki kombinira kriterij onesnaževanja in kriterij energetske potrošnje. 1 Introduction The Economic Load Dispatch (ELD) problem deals with the estimation of the scheduled real power generation from the committed units for best economic operation. Over the years the problem has become more complex due to the increasing effects of emissions from fossil fuel based power plants on the environment. The emission and fuel cost of each unit depend on the quantity of power to be generated. Both of them are nonlinear functions of power output. Minimum operating cost does not ensure minimum emission. Each operating condition must satisfy the power balance criterion and should obey the generating limits of the committed units. These can be considered as constraints. Generally, better quality fuel ensures less emission but it can be further reduced by proper scheduling of generation from different units. The cost coefficients and emission coefficients of these generating units do not match. Hence, achieving these two objectives, i.e. less cost and less emission is contradictory in nature. Thus, the EED problem has evolved as a modification of the ELD problem. Therefore, the EED problem is a multi-objective optimization problem with nonlinear constraints. 
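Before turning to the solution methods, a small sketch may help fix the problem setting. It assumes the quadratic fuel-cost and emission characteristics that are commonly used for EED test systems; the three-unit coefficients and limits below are purely illustrative and are not taken from this paper. The dominance test at the end is the pairwise comparison on which the multi-objective algorithms discussed later are built.

```python
import numpy as np

# Illustrative three-unit data (not from the paper): rows give the quadratic
# coefficients (a, b, c) of fuel cost F_i(P) = a + b*P + c*P^2, and likewise
# the coefficients of the emission curve E_i(P).
COST = np.array([[100.0, 2.00, 0.010], [120.0, 1.80, 0.012], [90.0, 2.20, 0.008]])
EMIS = np.array([[ 10.0, 0.30, 0.002], [ 12.0, 0.25, 0.003], [ 9.0, 0.35, 0.001]])
PMIN = np.array([50.0, 50.0, 40.0])
PMAX = np.array([200.0, 180.0, 150.0])

def objectives(p):
    """Return (total fuel cost, total emission) for a dispatch vector p in MW."""
    cost = float(np.sum(COST[:, 0] + COST[:, 1] * p + COST[:, 2] * p**2))
    emis = float(np.sum(EMIS[:, 0] + EMIS[:, 1] * p + EMIS[:, 2] * p**2))
    return cost, emis

def feasible(p, demand, tol=1e-3):
    """Power-balance (losses neglected here) and generator-limit constraints."""
    return abs(p.sum() - demand) <= tol and np.all(p >= PMIN) and np.all(p <= PMAX)

def dominates(f, g):
    """True if objective vector f Pareto-dominates g (both objectives minimized)."""
    f, g = np.asarray(f), np.asarray(g)
    return bool(np.all(f <= g) and np.any(f < g))
```

A dispatch p is then a candidate Pareto-optimal solution when it is feasible for the given demand and no other feasible dispatch dominates its (cost, emission) pair.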
In [1-2], the power engineers solved the ELD problem by scheduling of the generation of multi-unit systems using the derivative based Gauss-Siedel and Newton-Raphson algorithms along with the Lagrangian multiplier. These conventional methods suffer from the problem of getting trapped in local minima and also fail for system discontinuities due to prohibited zones. These techniques are inadequate to solve multi-objective problems with nonlinear constraints. Chang et al. [3] rehabilitated the inherently multi-objective EED problem to a single objective one by assigning weights to the operating cost and emission. This weighted sum approach requires many runs of the same algorithm to find the Pareto optimal front. The solutions arrived at by this method do not ensure a uniform Pareto front. The trade-off information is lost when the function is concave. To avoid this bottom-hole different evolutionary based heuristic approaches have been introduced by many researchers [4-5]. These evolutionary algorithms have considered the two objectives simultaneously and are shown to perform better as compared to the conventional ones. Chiang et al. [6] made a further refinement and proposed an improved genetic algorithm to speed up the search process. He used the e-constraint technique for efficient 350 Informatica 44 (2020) 349-360 S.K. Mishra et al. constraint handling and proposed a multiplier updating mechanism for better exploration of the search space. Deb et al. [7] proposed the non-dominated sorting genetic algorithm which utilized rank and crowding distance as parameters to arrive at a compromise between the two conflicting objectives. This was applied to the multi-objective environmental economic load dispatch problem in [8]. The Pareto optimal front could be obtained by a single run of the algorithm. But, this population based genetic algorithm depends upon biologically inspired factors like mutation and crossover parameters. It needs further improvement in terms of exploring a wider area in the search space. Brar et al. [9] made improvements in the search space by adding the fuzzy inference system. Muthuswamy et al. [10] modified the non-dominated sorting technique by incorporating a dynamic crowding distance to improve the diversity of solutions in the search space. These algorithms fail when there are discontinuities in the cost function. Nayak et al. [11] implemented another evolutionary algorithm, the artificial bee colony (ABC) optimization, and improved the convergence rate and reliability under the presence of the prohibited zones and ramp rate limits. Liang et al. [12] modified the ABC algorithm to form an improved artificial bee colony (IABC) by addition of a new skill called chaos ques in the search process. Mori et al. [13] made an excellent improvement in the exploration of search space through the implementation of the particle swarm optimization (PSO) for this multimodal problem. They also used adaptive parameter adjustment to improve the results. A significant improvement in search space exploration was made by Hadji et al. [14]. They incorporated a time varying acceleration of the particles to improve the robustness of the algorithm. Recently, a differential evolution (DE) algorithm came up which generates the next set of population of new particles by the addition of a differential vector obtained from the difference of the position vectors of two different particles other than the particle undergoing evolution [15]. 
This algorithm is still dependent on the bio-inspired parameters but is able to avoid premature convergence. Meza et al. [16] improved the algorithm by incorporating spherical pruning for better exploitation of the search space. Di et al. [17] introduced a marginal analysis correction operator to improve the constraint handling. In [18], the particle swarm optimization algorithm has been developed which is based on the intelligence of flock of birds. The same has been improved and tested for multi-objective problems in [19-21]. The EED problem has been solved to decide the unit commitment of the power system by considering operational power flow and environmental constraints in [22]. But, it again utilized the method of conversion of the multi-objective problem to a single objective one. A new approach to optimization is proposed in [23] which hybridized adaptive PSO and DE for improvement of the search space. An improvement over ABC called as multi-objective global best artificial bee colony (MOGABC) optimization is suggested in [24] for better constraint handling in EED problem. The EED problem has been further modified and applied to the micro-grid containing renewable sources along with the conventional thermal power stations in [25]. It also converts the problem to a single objective one by incorporating a h-index. In this paper, a new constraint handling mechanism has been implemented, and a new multi-objective optimization (MOP) algorithm, namely the multi-objective differential evolution with recursive distributed constraint handling (MODE-RDC) has been proposed. The constraint handling mechanism is suitably incorporated in three multi-objective optimization (MOP) algorithms, and the effectiveness of the algorithms has been tested under various load conditions. 2 Multi-objective optimization: a review The main aim of the multi-objective optimization technique is to optimize two or more conflicting objectives simultaneously. The MOP is denoted by a decision variable vector, each element of which represents the objective functions [21]. The solution to the MOP is the optimum value of the vector function by considering all the constraints. A multi-objective minimization problem can be generalized as follows: Minimize f(x) = (^(x), f2(x).....fM(S)) (1) Subject to constraints: gj(X) < 0; j=1, 2, •••, J (2) hk(X) = 0; k =1, 2, ..., K (3) where, X is a vector with N decision variables x=[x1, x2, "■, XN]T The search space may be limited by lower and upper bounds lbj < xj < ubj; i =1, 2, ..., N (4) A solution vector u=[u1, u2, ..., uN]T dominates over another solution v= [v1, v2, ..., vN]T if and only if fI(u) m°) (1) p{f (o, m)= k} = cm n,k)> m0}= 1 - Z c° cmn k=max( m-o,m0 ) min {k,(m-k)m0} Z cW-k where Pl and P2 are the connectivity probability and disconnectivity probability respectively, p{f (o, m) = k} stands for the probability that there are no vehicles in k road grids, p{g(m'k)>m°} stands for the probability that the number of continuous grids between k road grids is larger than m0 , and symbols in the form Cb of a stand for the most probable number of selecting b objectives from a optional objectives for combination. 3 Data transmission optimization algorithm VANET in the intelligent traffic system takes the vehicle as a node and transmits the information within the communication radius of the vehicle. The information interaction between vehicles constitutes the network. 
Naturally, for better relay transmission of information, the corresponding routing algorithm, or in other words, the data transmission path selection algorithm, is needed [10].

3.1 Ant colony algorithm

As an intelligent algorithm, the ant colony algorithm [11] is an imitation of the foraging phenomenon of an ant colony in nature, and it is mostly used for problems such as the TSP, scheduling optimization, and path selection. In this study, the data transmission optimization problem in VANET is a path selection problem; moreover, the self-organization and dynamic variability of ant colony foraging is quite consistent with the characteristics of VANET. The ant colony algorithm is applied to VANET as follows. Firstly, ants start from the source node and then select the next-hop node from the neighbouring candidate nodes according to the candidate probability. The calculation formula of the candidate probability [12] is:

p_{ij}^{k}(t) = \begin{cases} \dfrac{[\psi_{ij}(t)]^{\alpha} \cdot [f_{ij}(t)]^{\beta} \cdot [g_{ij}(t)]^{\gamma} \cdot [h_{ij}(t)]^{\delta}}{\sum_{s \in allowed_k} [\psi_{is}(t)]^{\alpha} \cdot [f_{is}(t)]^{\beta} \cdot [g_{is}(t)]^{\gamma} \cdot [h_{is}(t)]^{\delta}} & j \in allowed_k \\ 0 & \text{otherwise} \end{cases}   (2)

with

f_{ij}(t) = \frac{e_j(t)}{\sum_{s \in C(i)} e_s(t)}, \qquad g_{ij}(t) = \frac{(E_{max} - e_j(t))^{-1}}{\sum_{s \in C(i)} (E_{max} - e_s(t))^{-1}}

and h_{ij}(t) given in the same normalized form from the relay hop counts h_{sd} of the candidate nodes s \in C(i), where p_{ij}^{k}(t) represents the probability of ant k jumping from node i to node j at the t-th traversal search, \psi_{ij}(t) represents the pheromone concentration of the path between nodes i and j at the t-th traversal search, f_{ij}(t) and g_{ij}(t) are the energy metric parameters of nodes i and j at the t-th traversal search, h_{ij}(t) represents the space parameter of nodes i and j at the t-th traversal search, j \in allowed_k means that node j belongs to the set of nodes within the wireless communication range of node i that ant k has not yet passed, s \in C(i) means that node s belongs to the node set of node i within the wireless communication scope, e_j(t) represents the energy that node j has at the t-th traversal search, E_{max} represents the maximum energy that a node can provide, h_{jd} represents the relay hop count between node j and the goal node d, and \alpha, \beta, \gamma and \delta are the importance factors of the corresponding parameters.

When all ants have arrived at the goal node according to the selection probability, the pheromones on all paths are updated. The update formula [13] is as follows:

\psi_{ij}(t+1) = (1 - \rho)\psi_{ij}(t) + \Delta\psi_{ij}(t), \qquad \Delta\psi_{ij}(t) = \sum_{k} \frac{c \cdot E_{avg}^{k} \cdot (h_{max} - h_k)}{w}   (3)

where \rho is the pheromone volatilization coefficient, \Delta\psi_{ij}(t) is the increment of pheromone between nodes i and j obtained by summing the contributions of the individual ants k, c is a constant, w is a positive constant, E_{avg}^{k} is the node average energy of ant k in the current path, h_{max} is the maximum node hop count that an ant in the network can realize, and h_k is the node hop count of ant k in the current path.

After the pheromone is updated, the ant colony is asked to search the path again. The above steps are repeated until the termination condition is reached, and the path with the most pheromones is regarded as the optimal path.
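A compact Python sketch of the two mechanisms just described, probabilistic next-hop selection and pheromone update, is given below. It follows equation (2) in spirit, but the per-node metrics are passed in as pre-computed arrays and all parameter and function names are our own choices; it is an illustration of the selection and update rules under these assumptions, not a reimplementation of the full protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_hop(pheromone, energy, residual, space,
             alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Roulette-wheel choice of the next hop among the candidate neighbours.

    Each argument is a 1-D array over the candidate set allowed_k:
    pheromone ~ psi_ij, energy ~ f_ij, residual ~ g_ij, space ~ h_ij of eq. (2).
    Returns the index of the chosen candidate.
    """
    weights = pheromone**alpha * energy**beta * residual**gamma * space**delta
    return rng.choice(len(weights), p=weights / weights.sum())

def update_pheromone(pheromone, ant_paths, deposits, rho=0.7):
    """Evaporate pheromone on every known link, then add each ant's deposit
    to the links of the path it travelled (deposit ~ c*E_avg*(h_max - h_k)/w).

    pheromone: dict mapping a link (i, j) to its pheromone level.
    ant_paths: list of node sequences, one per ant.
    deposits:  per-ant pheromone increment.
    """
    for link in pheromone:
        pheromone[link] *= (1.0 - rho)
    for path, dep in zip(ant_paths, deposits):
        for i, j in zip(path[:-1], path[1:]):
            pheromone[(i, j)] = pheromone.get((i, j), 0.0) + dep
    return pheromone
```

For example, next_hop(np.array([0.4, 0.2, 0.9]), np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.3, 0.3]), np.array([0.6, 0.2, 0.2])) picks one of three candidate neighbours with probability proportional to the combined weight.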
3.2 Improved ant colony algorithm In the process of calculating the selection probability of the next-hop node of an ant by the ant colony algorithm described above, although it takes into account the sharing of node energy efficiency so that the path can achieve the optimal energy efficiency, in the actual VANET, the premise of successful communication between nodes is that the nodes are connected, and the links between nodes may be broken as the vehicles acting as nodes are constantly changing in VANET. Only considering the energy efficiency, the nodes in the optimal path are likely to be disconnected. Therefore, this study improved the ant colony algorithm, predicted the connectivity by connectivity probability, put it into the calculation of node selection probability, and optimized the node searching of the ant colony. The process of the improved ant colony algorithm is shown in Figure 2. ® Parameters are initialized. @ Starting from the initial node, each ant chooses the next node according to the improved probability formula. The formula of the improved selection probability is: Ek Figure 2: The process of the improved ant colony algorithm. I [^j(i)]"-L«j(i)]^-[Vj(t)]'-[£„(t)]ä-[c„(i)]A . „ , I —---^---^-r j e allowedk pj(t)=| Z^s«]"-[»„(t)]"-[v„«r -[s„(i)]ä ■[c„(t)]i I seallowed I 0 otherwise U=-j- Z pf) Pj(t) = Pco (4) Ca (t) where j represents for the metric parameter of connectivity between node i and j and A is an importance factor of the corresponding parameter. ® Whether all ants in the ant colony have traversed from the initial node to the target node once is determined. If not, step @ repeats; if they do, then the pheromone in the path that the ant passes through is updated, and the update formula is equation (3). © Whether the algorithm has reached the termination condition is determined. If it does, it outputs the optimal path; if not, it will return to step The termination conditions include reaching the maximum number of iterations (taking all ants in the ant colony traversing the path once and updating the pheromone as one time of iteration) and the ant selection path converging to stability. 4 Simulation experiment 4.1 Experimental environment In this study, the running track and running state of vehicles were simulated by VANET mobile simulator, and the network which was composed of vehicles and the routing algorithm proposed were simulated by MATLAB software [14]. VANET mobile simulator was a free software developed based on the Java platform, which can realize the macro and micro-movement simulation of vehicles, and it can make the movement of simulated vehicles closer to the actual situation. MATLAB software is a network simulation software, which builds the network based on the vehicle trajectory simulated by the VANET simulator and simulates the routing algorithm. The experiment was carried out in a server in a laboratory. The server configuration was Windows 7 system, i7 processor, and 16 G memory. 364 Informatica 44 (2020) 361-366 H. Wang 4.2 Experimental parameters The relevant parameters of the simulation experiment are shown in Table 1, including vehicle simulation parameters and network simulation parameters. Vehicle simulation parameters included road topology size, number of vehicles, speed range, vehicle acceleration, and lane number. 
Network simulation parameters included simulation time, number of ant colonies, node communication radius, pheromone volatilization coefficient, minimum pheromone, E_max, node unit information transmitting and receiving energy, and node communication protocol.

Vehicle simulation parameters
    Road topology size: 1000 m × 1000 m
    Number of vehicles: 10~100
    Speed range: 10~30 m/s
    Vehicle acceleration: 2 m/s²
    Number of lanes: 3
Network simulation parameters
    Simulation time: 900 s
    Number of ant colonies: 30
    Node communication radius: 150 m
    Pheromone volatilization coefficient: 0.7
    Minimum pheromone: 0.01
    E_max: 1 J
    Node unit information transmitting energy: 4.3 µJ/bit
    Node unit information receiving energy: 2.4 µJ/bit
    Node communication protocol: IEEE 802.11

Table 1: Related parameters of the simulation experiment.

Also, the Nakagami channel transmission model was adopted for the channel attenuation in the process of node communication [15]. To verify the performance of the improved algorithm, it was compared with the ant colony algorithm and the geographic location-based routing algorithm.

4.3 Performance indicators

The indicators that reflect the reliability of network data transmission include the packet loss rate and the time delay. Packet loss refers to the phenomenon that part of the data packets are lost due to the instability or disconnection of the node link when information is transmitted to other nodes in the form of data packets; the packet loss rate measures the degree of packet loss. Time delay is the time taken for data to travel from one node to the goal node. For a node network, the lower the packet loss rate and the time delay are, the faster and more stable the data transmission is, and the more reliable the whole network is. In this study, the network delay and packet loss rate were estimated by the time stamp method.

In addition to the above two indicators, the reliability of network transmission can also be measured statistically. In this study, the VANETs under the three routing algorithms were simulated and analyzed many times by the Monte Carlo method, and the number of simulations that did not meet the requirements was counted to estimate the reliability of the network. The estimation formula is:

R_i = 1 - \frac{N_i}{i}   (5)

where R_i stands for the estimated value of reliability after the i-th simulation and N_i stands for the number of simulations whose network indicators did not meet the requirements after the i-th simulation. When the network transmitted data, if the average packet loss rate was greater than 4% or the average delay was greater than 100 ms, it was considered that the network indicators of that simulation did not meet the requirements. When the Monte Carlo method was used, the precision interval was ±3%, and 97% of the simulations met the precision interval.

4.4 Experimental results

As shown in Figure 3, under the same number of vehicle nodes, the packet loss rate of the algorithm proposed in this study was the lowest, and the routing algorithm based on geographical location had the highest packet loss rate.
It was seen from the curve variation that the network packet loss rate under the three routing algorithms showed a trend of decreasing first and then increasing with the increase of vehicle nodes in the network, and the number of vehicle nodes of the improved algorithm was the largest and that of the routing algorithm based on geographical location was the smallest when the packet loss rate was the lowest.

Figure 3: Relationship between network packet loss rate and nodes under three routing algorithms.

As shown in Figure 4, under the same number of vehicle nodes, the network average delay of the algorithm proposed in this study was the smallest, and that of the routing algorithm based on geographical location was the largest. The average delay curves of the three routing algorithms clearly showed that the average delay of the network first decreased and then increased with the increase of the number of vehicle nodes, and the number of vehicle nodes of the ant colony algorithm and the improved ant colony algorithm was the largest when the average time delay was the smallest and increased gently afterward.

The simulation model was run many times with the Monte Carlo method, and the simulations that did not meet the set indicators were counted to evaluate the reliability of the network. The results are shown in Figure 5. It was seen from Figure 5 that the network reliability of the improved ant colony algorithm was the highest and the network reliability of the routing algorithm based on geographical location was the lowest under the same number of vehicle nodes. It was seen from the curve changes that the network reliability of the routing algorithm based on geographical location and of the ant colony algorithm increased first and then decreased, while the network reliability of the improved ant colony algorithm increased first and then basically stabilized at 0.99.

Figure 5: Relationship between network reliability and nodes under three routing algorithms.

The above three results showed that the improved ant colony algorithm could effectively reduce the packet loss rate and average delay in the data transmission between nodes in VANET and improve the reliability of network data transmission. Although the packet loss rate and average delay of the network increased when the number of vehicles in VANET increased to a certain extent, under the effect of the improved ant colony algorithm the increase was smaller than with the other two algorithms, and the reliability of network transmission was maintained at a high level. The reason why the packet loss rate and time delay decreased and the reliability increased was that the number of nodes that could transmit smoothly around a node increased when the number of vehicle nodes in the network increased, which led to more choices of the data transmission path.
The reason why the packet loss rate and time delay increased and the reliability decreased was that when the vehicle nodes increased to a certain extent, the interference of the surrounding node signals increased although the optional excellent path increased. However, the improved ant colony algorithm not only considered the energy efficiency but also considered the connectivity between vehicles, which made the path selection more inclined to the path with stronger connectivity to slow down the increase of packet loss rate and delay and maintain the stability of reliability. 5 Conclusion This paper briefly introduced the vehicle communication network, VANET, and the ant colony algorithm for path search in the network, improved the ant colony algorithm, carried out a simulation experiment on the improved ant colony algorithm on the simulation platform, and compared it with the traditional ant colony algorithm and the geographical location-based routing algorithm. The results showed that: (1) with the increase of vehicle nodes in VANET, the packet loss rate presented a tendency of increasing first and then decreasing, the packet loss rate of the improved ant colony algorithm was always the lowest, and the packet loss rate of the geographical location-based routing algorithm was always the highest; (2) with the increase of vehicle nodes, the average delay in the network firstly decreased and then increased, the improved ant colony algorithm was always the smallest, and the geographical location based routing algorithm was always the largest; (3) the reliability of network data transmission increased and then decreased with the increase of nodes, but the reliability of the network under the improved ant colony algorithm rose first and then tended to be stable, the network reliability under the improved ant colony algorithm was always the highest and that of the routing o ■_i_■ ■ ■_■ ■ ■ ■_ 0 10 20 30 40 50 60 70 80 90 100 Niunber of vehicle nodes • The geographical location based routing algorithm m Ant colony algorithm —Improved ant colony algorithm Figure 4: Relationship between average network delay and nodes under three routing algorithms. 366 Informatica 44 (2020) 361-366 H. Wang algorithm based on geographical location was always the lowest. References [1] Bitam S, Mellouk A, Zeadally S (2015). VANET-cloud: a generic cloud computing model for vehicular Ad Hoc networks. IEEE Wireless Communications, 22(1), pp. 96-102. [2] Amoozadeh M, Deng H, Chuah CN, Zhang HM, Ghosal D (2015). Platoon management with cooperative adaptive cruise control enabled by VANET. Vehicular Communications, 2(2), pp. 110123. https://doi.org/10.1016/j.vehcom.2015.03.004 [3] Melaouene N, Romadi R (2019). An enhanced routing algorithm using ant colony optimization and VANET infrastructure. MATEC Web of Conferences, 259(6), pp. 02009. https://doi.org/ [4] Kumar N, Dave M (2016). BIIR: A Beacon Information Independent VANET Routing Algorithm with Low Broadcast Overhead. Wireless Personal Communications, 87(3), pp. 869-895. https://doi.org/10.1007/s11277-015-2620-y [5] Saxena R, Jain M, Sharma DP, Jaidka S, Thampi SM, El-Alfy EM (2019). A review on VANET routing protocols and proposing a parallelized genetic algorithm based heuristic modification to mobicast routing for real time message passing. Journal of Intelligent & Fuzzy Systems, 36(3), pp. 2387-2398. https://doi.org/10.3233/jifs-169950 [6] Li N, Martinez-Ortega JF, Diaz VH, Fernandez JAS (2017). 
Probability Prediction based Reliable Opportunistic (PRO) Routing Algorithm for VANETs. IEEE/ACM Transactions on Networking, PP(99). https://doi.org/10.1109/TNET.2018.2852220 [7] Li AN, Zhao YM, Xie P, Wang Q (2016). Research on routing algorithm based on the VANET. MATEC Web of Conferences, 44, pp. 01085-. https://doi.org/10.1051/matecconf/20164401085 [8] Kumuthini C, Krishnakumari P (2015). An access point based routing algorithm (APBR) using greedy method towards improving quality of service for VANET. International Journal of Applied Engineering Research, 10(3), pp. 5489-5502. [9] Saravanan M, Ganeshkumar P, Kumar SM (2018). Survey on opportunistic routing algorithm for Vehicular Adhoc Network (VANET). International Journal of Pure and Applied Mathematics, 118(20), pp. 1735-1740. [10] Aadil F, Bajwa KB, Khan S, Chaudary NM, Akram A (2016). CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET. Plos One, 11(5), pp. e0154080. https://doi.org/10.1371/journal.pone.0154080 [11] Goudarzi F, Asgari H, Al-Raweshidy HS (2018). Traffic-aware VANET routing for city environments-A protocol based on ant colony optimization. IEEE Systems Journal, pp. 1-11. [12] Deepa Thilak K, Amuthan A (2017). Cellular Automata-based Improved Ant Colony-based Optimization Algorithm for mitigating DDoS attacks in VANETs. Future Generation Computer Systems, 82(MAY), pp. 304-314. https://doi.org/10.1016/jiuture.2017.1L043 [13] Goudarzi F, Asgari H, Al-Raweshidy HS (2019). Traffic-Aware VANET Routing for City Environments—A Protocol Based on Ant Colony Optimization. IEEE Systems Journal, 2019, 13(1), pp. 571-581. [14] Jindal V, Bedi P (2017). Preemptive MACO (MACO-P) Algorithm for Reducing Travel Time in VANETs. Applied Artificial Intelligence, pp. 74-196. https://doi.org/10.1080/08839514.2017.1300017 [15] Lakshmanaprabu SK, Shankar K, Rani SS, Abdulhay E, Arunkumar N, Ramirez G, Uthayakumar J (2019). An effect of big data technology with ant colony optimization based routing in vehicular ad hoc networks: Towards smart cities. Journal of Cleaner Production, 2019, 217(APR.20), pp. 584-593. https://doi.org/10.1016/j.jclepro.2019.01.115 https://doi.org/10.31449/inf.v44i3.3280 Informatica 44 (2020) 367-366 361 Automatic Image Segmentation for Material Microstructure Characterization by Optical Microscopy Naim Ramou, Nabil Chetih, Yamina Boutiche and Abdelkader Rabah Research Center in Industrial Technologies CRTI, P.O. Box 64, Cheraga, 16014, Algiers, Algeria E-mail: n.ramou@crti.dz Keywords: level set, microstructure characterization, image segmentation Received: January 13, 2020 This work shows the microstructure characterization utility for the analysis of material properties. To achieve this purpose, digital image segmentation is used on microscopic images of materials to extract the number of phases and their proportion present in the material to obtain a quantitative description of material properties and to better control product quality. In this way, we present here an automated method for segmenting the phases present in microscopic scanning images of metallographic samples using a multiphase level set with Mumford Shah formulation. Experience shows that the proposed model successfully detects phase regions for a variety of real micrographic images Povzetek: Predstavljena je metoda za segmentiranje slik, pridobljenih z optičnim mikroskopom, za ugotavljanje lastnosti materiala.. 
1 Introduction

The rate of ferrite in steel has a direct effect on its functional properties (yield strength, toughness, hardness, corrosion resistance, weldability, mouldability, embrittlement, magnetism). Its control and measurement are consequently very important. For this purpose, microstructures are used to determine the properties of materials. A full description of microstructures means giving a description of the size, shape and distribution of grains and second-phase particles and of their composition. This is why it is essential to establish the link between phenomena occurring at the microstructural scale and the properties of the material [1-5].

The microstructures are made up of a set of elements organized at the microscopic scale. Their observation, and thus their characterization, requires the use of microscopic techniques. The phases differ from each other by their crystalline, semi-crystalline or amorphous structure. Morphologies are observed with an optical or electron microscope. The microstructures formed in a material depend not only on the composition or chemical structure of the material but also on the existence of temperature or concentration gradients inside it during its transformation. The microstructures are also strongly influenced by the energy needed for the creation of the new interfaces. Most of the microstructures formed during solidification are crystalline in nature. Glass is always less stable than the crystal when the latter can form. In a number of cases, however, an amorphous (vitreous) structure appears during fast cooling. Note that it is the absence of microstructure that gives lenses their transparency. Some materials have a very irregular molecular structure and are not capable of developing a stable crystalline structure; they exist in the solid state only in a glassy form, whatever the cooling conditions (atactic polymers). The microstructure of organic polymers is largely controlled by their chemical structure. If the macromolecules have a regular molecular structure, crystallization generally occurs. Note that the crystallization of polymeric materials is never complete (semi-crystalline structure) and forms spherulites. The kinetic characteristics of phase transformations make it possible to obtain, by adequate heat treatments (tempering, annealing), very different microstructures, which generally have a non-homogeneous composition and which are almost always metastable at the temperature of use. In the case of metals and their alloys, heat treatments combined with mechanical treatments, such as rolling, have reached a very high degree of sophistication. Ceramics are often obtained by sintering of powder, which explains the presence of pores that are an important part of their microstructure. A variation of the microstructure over time during use entails an important modification of its properties (aging phenomenon).

Several works have been carried out on this subject. A study has addressed the evolution of the microstructure and of the grain boundary character distribution at different temperatures [6]. Moreover, studies have addressed the segmentation of micrographic images of materials for the extraction of grains/phases, which is fundamental for microstructure description; an automated algorithm has been used for segmenting the phases present in scanning electron microscope images of dual-phase steel [7-9].
A level set segmentation model was proposed to analyze particle images and overcome the drawbacks of sensitivity to weak boundaries and to the initial position of the curve [10]. We thus propose a different approach for image segmentation, based on a multiphase Mumford-Shah level set, to segment the phases present in the scanning microscopic images of metallographic samples. To achieve this goal, we have to face several complex problems such as image quality and the presence of noise.

The paper is organized as follows: Section 2 presents the materials and methods; the first part is devoted to the description of the material studied, with all the necessary steps of preparation and acquisition of the images used in this work, and in the second part we describe and analyze the Mumford and Shah model used for image segmentation (multiphase case). Section 3 shows the main contribution of this work, which is the automatic distinction between phases, and illustrates the experimental results. Finally, conclusions are reported in Section 4.

2 Materials and methods

2.1 Sample preparation and image acquisition

The different micrographic images used in this work were acquired with a NIKON optical microscope (Eclipse LV 100 ND, Figure 1) with a camera and an acquisition system (LV-LH50PC 12V50W precentered lamphouse, bright/darkfield switch and linked centerable aperture stop); it is possible to visualize all types of surfaces with a magnification of up to 1500 times. The base metal used in this work is a two-phase austeno-ferritic (duplex) stainless steel of grade 2205. Image processing makes it possible to easily transform the raw data into exploitable information; microscopic studies can therefore be performed. Careful preparation, in several successive steps, is required for observation with the optical microscope: coating of the samples in a hot phenolic resin; mechanical polishing under water on increasingly fine abrasive paper (grades 320, 400, 500, 800, 1000, 1200, 2400, 4000); finish polishing on a felt sheet with diamond paste; and finally, after polishing, a chemical etch (H2O + HF) of the surface to reveal the microstructure. In our work, we used three micrographic images, at a magnification of 200, of 2205 duplex stainless steel heat-treated at 800 °C, 850 °C and 950 °C respectively.

After observing Figure 2, we notice that the three micrographic images of the steel contain three different zones: ferrite, austenite and a third zone known as the sigma phase [11-12], with 30 atoms per unit cell (Fig. 3). In addition to iron, the sigma phase contains chromium and molybdenum, which it draws from the matrix, thus causing a reduction in the corrosion resistance of Fe-Cr-Ni systems. Moreover, this phase, which forms in the temperature range between 600 and 1000 °C, causes a dramatic loss of toughness in stainless steels [13]. Precipitation of the sigma phase does not depend only on the chemical composition of the steel [14]; other factors, such as grain size, influence its formation.

Figure 2: Optical micrographs of 2205 duplex stainless steel after treatment at: (A) 800 °C (203 x 288 pixels); (B) 850 °C (236 x 315 pixels); (C) 950 °C (203 x 288 pixels).
Also, the sigma phase is more easily formed in high-energy regions such as grain boundaries and interfaces. The solution temperature also affects precipitation in two ways:
• High dissolution temperatures induce grain enlargement, which reduces the amount of sigma phase formed.
• On the other hand, at high temperatures the ferrite content is increased, which at first glance encourages the precipitation of the sigma phase during the aging treatments.
The sigma phase appears preferentially at the austenite/ferrite phase joints, but it can also appear at ferrite/ferrite and austenite/austenite grain boundaries. It nucleates at the ferrite/austenite interfaces and then grows inside the ferritic grains. An illustrative diagram of the precipitation of σ is given in Figs. 2 and 3. The formation of the σ phase, which is rich in ferrite-forming elements, causes the adjacent ferritic regions to become depleted in these elements, leading to the transformation of this ferrite into secondary austenite. In this case, formation of the sigma phase takes place in the vicinity of the chromium nitride particles in austeno-ferritic stainless steels having a high nitrogen content (about 1% by weight).

Figure 3: (a) Structure of the quadratic sigma phase with parameters a = 8.970 and c = 4.558; (b) diffusion of the atoms of alpha-forming elements from the ferritic grain (α) towards the austenite (γ) and σ; (c) illustrative schema of the nucleation of the sigma phase at the austenite/ferrite interface and its growth inside the ferrite [12].

2.2 Level set formulation

Mumford and Shah have proposed a variational model that defines image segmentation as a problem of joint detection of homogeneous zones and contours [15]. The Mumford-Shah model is based on the minimization of an energy functional E, where u represents our image, with values bounded in Ω, from R² to R, Γ is the set of contours, and on each region R_i the image is approximated by a function g_i. The functional to be minimized can be written as:

E(Γ, g_i, u) = λ Σ_i ∫∫_{R_i} (u − g_i)² dx dy + Σ_i ∫∫_{R_i} |∇g_i|² dx dy + μ ∫_Γ dl   (1)

where λ and μ are positive real parameters weighting the data fidelity term and the length of the contours respectively. In the simple case where the functions g_i are constants, the solution always exists. It can then be shown that the value of g_i on the region R_i is the average, denoted c_i, of u restricted to R_i. In this framework, the energy can therefore be rewritten as follows:

E(Γ, u) = λ Σ_i ∫∫_{R_i} (u − c_i)² dx dy + μ ∫_Γ dl   (2)

with c_i = ∫∫_{R_i} u dx dy / ∫∫_{R_i} dx dy.

In this formalization, the curve C is represented by the zero level line of a Lipschitz function φ such that:

φ = 0 on C, φ > 0 inside C, φ < 0 outside C.   (3)

So in this approach the unknown is no longer C but φ.
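As a minimal illustration of equations (2)-(3) (our sketch, not part of the original method description), the following Python fragment evaluates the piecewise-constant energy for a two-region partition defined by the sign of a level set function; the image u, the level set phi and the weights lam and mu are assumed inputs, and the contour length is approximated by counting sign changes between neighbouring pixels.

```python
import numpy as np

def piecewise_constant_energy(u, phi, lam=1.0, mu=1.0):
    """Evaluate the piecewise-constant Mumford-Shah energy (eq. 2)
    for the two-region partition {phi > 0} / {phi < 0} (eq. 3)."""
    inside = phi > 0
    outside = ~inside
    # Region means c_i = average of u restricted to each region R_i
    c1 = u[inside].mean() if inside.any() else 0.0
    c2 = u[outside].mean() if outside.any() else 0.0
    # Data fidelity term: lambda * sum_i of (u - c_i)^2 over R_i
    fidelity = lam * (((u - c1) ** 2)[inside].sum()
                      + ((u - c2) ** 2)[outside].sum())
    # Contour length approximated by the number of sign changes
    # between 4-connected neighbours of the {phi > 0} mask
    mask = inside.astype(float)
    length = np.abs(np.diff(mask, axis=0)).sum() + np.abs(np.diff(mask, axis=1)).sum()
    return fidelity + mu * length
```

Minimizing this energy over all partitions is what the level set evolution described next does implicitly.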
If we introduce the Heaviside function H and its derivative δ_0, in the sense of distributions, defined by

H(z) = 1 if z > 0, 0 if z < 0   (4)

δ_0(z) = (d/dz) H(z)   (5)

we then have:

length(C) = ∫_Ω |∇H(φ)| dx dy = ∫_Ω δ_0(φ) |∇φ| dx dy

and the data terms of the two-region energy can be written ∫_Ω |u − c_1|² H(φ) dx dy and ∫_Ω |u − c_2|² (1 − H(φ)) dx dy.

In the multiphase case, two level set functions φ_1 and φ_2 are used; their signs partition the image into four regions (Figure 4), each with its own mean value:

c_00 = mean(u_0) in {x: φ_1 > 0 and φ_2 > 0}
c_01 = mean(u_0) in {x: φ_1 > 0 and φ_2 < 0}
c_10 = mean(u_0) in {x: φ_1 < 0 and φ_2 > 0}
c_11 = mean(u_0) in {x: φ_1 < 0 and φ_2 < 0}   (10)

c_00(φ_1, φ_2) = ∫_Ω u_0 H(φ_1) H(φ_2) dx dy / ∫_Ω H(φ_1) H(φ_2) dx dy
c_01(φ_1, φ_2) = ∫_Ω u_0 H(φ_1)(1 − H(φ_2)) dx dy / ∫_Ω H(φ_1)(1 − H(φ_2)) dx dy
c_10(φ_1, φ_2) = ∫_Ω u_0 (1 − H(φ_1)) H(φ_2) dx dy / ∫_Ω (1 − H(φ_1)) H(φ_2) dx dy
c_11(φ_1, φ_2) = ∫_Ω u_0 (1 − H(φ_1))(1 − H(φ_2)) dx dy / ∫_Ω (1 − H(φ_1))(1 − H(φ_2)) dx dy

Figure 4: Two initial curves of evolution which divide the image into four regions.

The resolution by the associated Euler-Lagrange equations [17] leads to the evolution equations for φ_1 and φ_2:

∂φ_1/∂t = δ_ε(φ_1) { ν div(∇φ_1/|∇φ_1|) − (|u_0 − c_11|² − |u_0 − c_01|²) H(φ_2) + (|u_0 − c_10|² − |u_0 − c_00|²)(1 − H(φ_2)) }

∂φ_2/∂t = δ_ε(φ_2) { ν div(∇φ_2/|∇φ_2|) − (|u_0 − c_11|² − |u_0 − c_10|²) H(φ_1) + (|u_0 − c_01|² − |u_0 − c_00|²)(1 − H(φ_1)) }   (11)

Figures 5 and 6 present respectively the segmentation steps and the automatic detection of phases of our proposed method on a synthetic image.

Fig. 5: The segmentation steps of a synthetic image: (a) initial contour; (b) after 10 iterations; (c) after 30 iterations; (d) final contour.

Fig. 6: Automatic detection of phases of a synthetic image: (a) first phase; (b) second phase; (c) third phase; (d) segmented image.
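The evolution equations (10)-(11) can be prototyped directly. The sketch below is our illustrative Python rendering, assuming a smoothed Heaviside/Dirac pair H_ε and δ_ε and an explicit time step dt; the parameter names (nu, dt, eps) are ours and the finite-difference curvature is one of several possible discretizations.

```python
import numpy as np

def heaviside(z, eps=1.0):
    # Smoothed Heaviside H_eps
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(z / eps))

def delta(z, eps=1.0):
    # Smoothed Dirac delta_eps (derivative of H_eps)
    return (eps / np.pi) / (eps ** 2 + z ** 2)

def curvature(phi):
    # div(grad(phi)/|grad(phi)|) with centered finite differences
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    nyy, _ = np.gradient(gy / norm)
    _, nxx = np.gradient(gx / norm)
    return nxx + nyy

def multiphase_step(u0, phi1, phi2, nu=0.25e-5, dt=0.5, eps=1.0):
    """One explicit update of the two level set functions (eq. 11),
    with the four region means computed as in eq. (10)."""
    H1, H2 = heaviside(phi1, eps), heaviside(phi2, eps)

    def region_mean(w):
        return (u0 * w).sum() / (w.sum() + 1e-8)

    c00, c01 = region_mean(H1 * H2), region_mean(H1 * (1 - H2))
    c10, c11 = region_mean((1 - H1) * H2), region_mean((1 - H1) * (1 - H2))

    d1 = delta(phi1, eps) * (nu * curvature(phi1)
         - ((u0 - c11) ** 2 - (u0 - c01) ** 2) * H2
         + ((u0 - c10) ** 2 - (u0 - c00) ** 2) * (1 - H2))
    d2 = delta(phi2, eps) * (nu * curvature(phi2)
         - ((u0 - c11) ** 2 - (u0 - c10) ** 2) * H1
         + ((u0 - c01) ** 2 - (u0 - c00) ** 2) * (1 - H1))
    return phi1 + dt * d1, phi2 + dt * d2
```

Iterating this step until the contours stabilize produces the behaviour illustrated in Figs. 5 and 7.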
3 Results

3.1 The conventional micrographic measurement approach

The conventional approach to measuring ferrite in steel is by point counting on a micrographic section. The sample requires preparation prior to measurement. First of all, it must be small in size (cut out of the part). The sample must then be coated (to make it easier to hold and to eliminate edge effects). Then, the specimen must be polished (mirror polished and free of scratches that could hinder observation). Finally, the preparation requires acid etching (to reveal the ferritic and austenitic structures). Analyses are carried out according to the ISO 9042 standard. For this norm, the method for determining the volume fraction of a constituent consists first of all in choosing the grid of points (dimensions and number of points) according to the constituent to be studied. Then, the chosen grid of points is superimposed on the metallographic section. A magnification is chosen that shows the delimitation between the phases. Next, the number of points of the grid falling in the constituent whose rate must be determined is counted. Finally, its volume fraction is deduced (Table 1). When counting, the values found in the fields are entered in an Excel table (which includes the calculations of the norm). When a field has been analyzed, the sample must be moved to observe another area. Other values are also calculated, such as the 95% confidence interval. This method is relatively time-consuming because the sample has to be prepared before the analysis begins. In addition, it is destructive. It also requires special attention on the part of the user. Indeed, a parallax error can lead to a bad count. The choice of magnification can also influence the measured phase rate. These errors can be limited by increasing the number of observed fields. Phase rate analysis by image analysis [18] can be used.

Table 1: Calculation of the ferrite rate.
Sample | Temperature / 1 h | Ferrite rate
Sample 1 | 850 °C | 17%
Sample 2 | 950 °C | 20%

3.2 The proposed approach using the piecewise-constant model (multiphase case)

Figure 7 illustrates some of the steps of the segmentation of material microstructure images using a multi-circle initialization of the initial contour, with μ = 0.25 × 10⁻⁵, N = 50 iterations and Δt = 0.5. To confirm the role of the segmentation step in the information analysis for microstructure characterization, we have plotted the histogram of a micrographic image before and after the segmentation step (Figure 8). From this result we can say that, after segmentation, a simple thresholding can be used to separate the phases in the image, which is not the case before the segmentation step.

Fig. 7: The segmentation steps of a micrographic image: (A) initial contour; (B) after 10 iterations; (C) after 30 iterations; (D) final contour.

Fig. 8: Histogram of a micrographic image: (A) before segmentation; (B) after segmentation; the peaks correspond respectively to the phases austenite, ferrite and σ phase.

3.3 Phase rate calculation

For the characterization of compositions at the microstructure scale, it is difficult to distinguish between the phases in order to calculate the phase rate from a micrographic image. In this work, we propose an automatic thresholding, applied after segmentation, to detect the number of phases as well as their percentages. The idea consists in applying the multiphase segmentation followed by a histogram analysis in which the phases are separated by thresholds derived from the average value of every phase. To separate the phases we used an automatic thresholding based on the calculation of the average between the maximum values of each phase in Figure 9:

Phase 1 < Sm1
Sm1 < Phase 2 < Sm2   (12)
Phase 3 > Sm2

In the case of Fig. 9 we have two average values, Sm1 and Sm2, to separate the three phases in the image, and each phase rate is computed as:

rate of phase i = (number of pixels with Sm_i < phase < Sm_{i+1}) / (number of pixels of the image)   (13)

We present in Figures 9 and 10 the results obtained by applying our proposed segmentation method to different micrographic images with no a priori knowledge about the number of phases in each image. The method automatically detects the number of phases of each image and gives the percentage of the image surface occupied by each phase. Three colours, yellow, green and blue, represent respectively the phases austenite, ferrite and σ phase.

Fig. 9: Automatic detection of phases of a micrographic image: (A) original image; (B) histogram with the number of phases; (C) segmented image.

Figure 10: Results of the segmentation of micrographic images by the proposed method.
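Equations (12)-(13) can be read as a small post-processing routine. The following Python sketch is an assumption-laden illustration (the way the phase peak values are obtained here, from the distinct grey levels of the segmented image, is our simplification): thresholds Sm_i are placed halfway between the representative levels of consecutive phases, and each phase rate is the fraction of image pixels falling between two thresholds.

```python
import numpy as np

def phase_rates(segmented, n_phases=3):
    """Thresholds between phase levels (eq. 12) and surface
    fraction of each phase (eq. 13)."""
    # Assumed input: a segmented image whose grey levels cluster
    # around one value per phase (e.g. the region means of the level set).
    levels = np.sort(np.unique(segmented.astype(float)))
    # Representative level per phase: simple quantile split of the levels
    peaks = [np.quantile(levels, q) for q in np.linspace(0.0, 1.0, n_phases)]
    # Sm_i = average between the peak values of consecutive phases
    thresholds = [(peaks[i] + peaks[i + 1]) / 2.0 for i in range(n_phases - 1)]
    edges = [-np.inf] + thresholds + [np.inf]
    total = segmented.size
    return [float(((segmented > edges[i]) & (segmented <= edges[i + 1])).sum()) / total
            for i in range(n_phases)]
```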
4 Conclusion

Our contribution in this paper consists in automatically determining the number of phases and their proportions in a sample of metallic material from a micrographic image. To reach this goal we used variational approaches. The results obtained show that we are able to calculate the phase rates automatically, without the intervention of experts; we have applied the method to several images to validate the algorithm and to offer the expert a better micrographic image processing, which allows reliable and reproducible results.

References
[1] Moghimi, M.K., Mohanna, F. (2019) A joint adaptive evolutionary model towards optical image contrast enhancement and geometrical reconstruction approach in underwater remote sensing. SN Appl. Sci. 1, 1242. https://doi.org/10.1007/s42452-019-1255-0
[2] Joshua P. (2013) Mechanical Properties of Materials. Solid Mechanics and Its Applications.
[3] Lovell M.C., Avery A.J., Vernon M.W. (1976) Physical Properties of Materials. The Modern University Physics Series.
[4] Laszlo S., Totha B., Chengfan G.C. (2014) Ultrafine-grain metals by severe plastic deformation. Materials Characterization, 92, 1-14.
[4] Sarkhawas G., Arti B. (2015) Particle Analysis Using Improved Adaptive Level Set Method Based Image Segmentation. International Conference on Computing Communication Control and Automation, Pune, India, 747-751. https://doi.org/10.1109/iccubea.2015.149
[5] Mohammadi J., Behnamian Y., Mostafaei A., et al. (2015) Friction stir welding joint of dissimilar materials between AZ31B magnesium and 6061 aluminum alloys: Microstructure studies and mechanical characterizations. Materials Characterization, 101, 189-207. https://doi.org/10.1016/j.matchar.2015.01.008
[6] Huang W., Chai L., Li Z., Yang X., Guoc N., Song B. (2016) Evolution of microstructure and grain boundary character distribution of a tin bronze annealed at different temperatures. Materials Characterization, 114, 204-210. https://doi.org/10.1016/j.matchar.2016.02.022
[7] Chatterjee O., Das K., Dutta S., Datta S., Saha S.K. (2010) Phase Extraction and Boundary Removal in Dual Phase Steel Micrographs. IEEE India Conference (INDICON). https://doi.org/10.1109/indcon.2010.5712693
[8] Murase K., Sugal S. (2013) Segmentation of dual phase steel micrograph: An automated approach. Measurement, 46, 2435-2440.
[9] Choudhury, A., Pal, S., Naskar, R. and Basumallick, A. (2019) Computer vision approach for phase identification from steel microstructure. Engineering Computations, Vol. 36 No. 6, pp. 1913-1933. https://doi.org/10.1108/ec-11-2018-0498
[10] Halimi, M. & Ramou, N. (2013) Extraction of weld defects dimension from radiographic images using the level set segmentation without reinitialization. Russ J Nondestruct Test 49: 424. https://doi.org/10.1134/s1061830913070036
[11] Vanderschaeve E., Taillard R. and Foct J. (1994) Étude des phénomènes de précipitation dans un acier austénitique à 19% de chrome et 19% de manganèse, et à très forte teneur en azote. J. Phys. IV, Colloque C3, supplément au Journal de Physique III, 4. https://doi.org/10.1051/jp4:1994312
[12] Zucatto I., Moreira M.C., Machado I.F., and Lebrao S.M.G. (2002) Microstructural Characterization and the Effect of Phase Transformations on Toughness of the UNS S31803 Duplex Stainless Steel Aged at 850 °C. Materials Research, 5(3), 385-389. https://doi.org/10.1590/s1516-14392002000300026
[13] Lacombe P., Baroux B., Beranger G. (1990) Les aciers inoxydables.
[14] Chen T.H., Weng K.L. and Yang J.R. (2002) The Effect of High Temperature Exposure on the Microstructural Stability and Toughness Property in a 2205 Duplex Stainless Steel. Materials Science and Engineering A, 338, 259-270. https://doi.org/10.1016/s0921-5093(02)00093-x
[15] D. Mumford and J. Shah. (1989) Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, XLII, 577-685.
[16] M. Rousson and R. Deriche. (2002) A Variational Framework for Active and Adaptative Segmentation of Vector Valued Images. Technical Report
4515, INRIA, France. https://doi.org/10.1109/motion.2002.1182214
[17] Chan T.F. and Vese L.A. (2000) Image segmentation using level sets and the piecewise constant Mumford-Shah model. Tech. Rep. UCLA Dept. Math, CAM 00-14.
[18] Vese L.A. and Chan T.F. (2002) A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model. International Journal of Computer Vision, 50(3), 271-293.

https://doi.org/10.31449/inf.v44i3.3280 Informatica 44 (2020) 373-386

Smart Design for Resources Allocation in IoT Application Service Based on Multi-agent System and DCSP

Mouadh Bali
LIMED Laboratory, Faculty of Exact Sciences, University of Bejaia, Algeria
Dept. Computer Science, Faculty of Exact Sciences, University of El Oued, Algeria
E-mail: bali-mouadh@univ-eloued.dz

Abdelkamel Tari
LIMED Laboratory, Faculty of Exact Sciences, University of Bejaia, Algeria
E-mail: tarikamel59@gmail.com

Abdallah Almutawakel and Okba Kazar
LINFI laboratory, Computer science department, University of Biskra, Algeria
E-mail: aboud.aboud2012@gmail.com, o.kazar@univ-biskra.dz

Keywords: IoT, IoT service, resource allocation, cloud computing, distributed constraints satisfaction problems, multi agent system

Received: September 21, 2019

In the present paper, we aim at solving two problems. The first problem occurs in the transformation of IoT devices (sensors, actuators, ...) into cloud services; here we work on maintaining a smooth and efficient data transmission to the cloud and on supporting customer applications such as data sharing, storage and processing. The second problem has two dimensions. In the first dimension, the problem arises in the submission of cloudlets (customer-requested jobs) to Virtual Machines (VMs) in the hosts; to solve it, we propose a scheduling algorithm for resource allocation according to the lowest cost and load. In the second dimension, the problem lies in the hosting of new VMs in the hosts; to overcome it, we need to take the loads into account when hosting new VMs in the different datacenters. In this work, we suggest a resource allocation approach for service-oriented IoT applications. The architecture of this approach is based on two techniques: Multi-Agent Systems (MAS) and Distributed Constraint Satisfaction Problems (DCSP). The MAS manages the physical resources, decision making and the communication between datacenters, while DCSP is used to simplify the resource provisioning policy in the datacenters. Variables and constraints are distributed among multiple agents in different layers. The experimental results show the efficiency of our approach in terms of Average System Load, Cost Augmentation Rate and Available MIPS.

Povzetek: Predlagan je način dodeljevanja virov za storitve v IoT aplikacijah na osnovi večagentnih sistemov (MAS) in zadovoljevanja porazdeljenih omejitev (DCSP).

1 Introduction

Internet of Things (IoT) and Cloud Computing are two paradigm technologies used for a wide range of applications in our life. IoT is a smart system that connects physical objects with sensors to enable them to collect and share data via the internet [18]. The cloud is a type of parallel and distributed system; it is described as a model for application execution and data storage [19], [2]. Cloud infrastructure allows customers to use a large number of resources such as network, storage and applications [1].
The data centers have a large number of resources, and assigning them to customers is commonly known as resource allocation (RA) [20]. In cloud computing, RA is an issue due to several challenges such as complexity, heterogeneity of the resources residing in the datacenter, scheduling, virtualization and migration [2], [3]. The motivation for studying this problem comes from the limitations of IoT, including limited storage capacity and complicated processes (data analysis and a high heterogeneity of devices) [18]. As a result, we work on satisfying users' needs by providing resource allocation at a lower cost. This cost is computed on the basis of smart solutions in the datacenters (best host) according to the resource constraints [8]. We provide a distributed resource allocation approach based on two techniques: multi-agent systems (MAS) [17] and distributed constraint satisfaction problems (DCSP) [5], [10], [11], [23]. Overall, our main goal is to provide high-performance services and minimize resource operating costs.

In this paper, we study two problems related to the deployment of IoT applications in cloud computing. The first problem (Service Providing) occurs in the transformation of IoT devices (sensors, actuators, ...) into cloud services. Therefore, we work on a smooth and efficient data transmission to the cloud and on supporting customer applications like data sharing, storage and processing. We suggest a number of functionalities for service providing: service creation, service publishing and service search. The second problem (Service Consumption) lies in the selection and execution of the resource allocation service in the cloud computing infrastructure. It occurs at two levels. At the first level, the problem arises in the scheduling of tasks (service cloudlets) to assign (submit) the cloudlets to the appropriate VMs, taking into consideration the service's functional requirements and the minimization of the resource exploitation cost. At the second level, the problem lies in the hosting of new VMs in the hosts of the different datacenters according to their loads. The hosting of virtual machines has become a difficult issue in resource allocation systems because each virtual machine is associated with a physical host according to its available resources [6]. In order to solve the problem at both levels, we suggest smart solutions that depend on two techniques: the Multi-Agent System (MAS) and CSP. The MAS manages the physical resources, decision making and the communication between datacenters. On the other hand, DCSP is used to simplify the resource provisioning policy in the datacenters.

We organize the rest of the article as follows: Section 2 presents research works related to the subject of this paper. Section 3 offers background and basic concepts. The developed mechanism and system architecture are defined in Section 4. Section 5 presents the main scenarios of interactions in the proposed system. Section 6 provides an illustrative example to clarify our approach. The experimental results are shown in Section 7, and the last section concludes the paper and presents future perspectives.

2 Related works

Because of the increasing demand of customers in the field of IoT in cloud infrastructure, many researchers have developed a number of methods to meet customers' demands by taking into account the efficiency of resources and operating expenses. Here, we mention some of the work done in this regard. Ghanbari et al.
[9] proposed an analytical study of resource allocation mechanisms for IoT. The authors seek to provide a model for IoT resource allocation which aims at load balancing, minimizing operational cost and reducing power consumption. By reviewing and discussing the advantages and disadvantages of these mechanisms, they compared several parameters across different articles, such as availability, performance, bandwidth, cost, energy, QoS, SLA, throughput, etc. Besides, there are more service quality parameters to be studied, such as self-allocation features, self-adaptation, and modeling and learning from past and current behaviour. Ma et al. [13] suggest a model for workflow task scheduling in IoT infrastructure as a service (IaaS) based on deadline constraints and a cost-aware genetic optimization algorithm. Their approach is distributed at different levels according to the characteristics of the cloud infrastructure (on-demand acquisition, heterogeneous dynamics and performance variation of VMs), so that no dependency exists between tasks at the same level. To demonstrate the feasibility of this approach, the authors used HEFT to generate individuals with the minimum completion time and cost. Fayazi et al. [7] focus on two factors for resource allocation: reliability and rapid execution of the work. Therefore, they suggested cloud resource allocation based on an auction mechanism. The increase and decrease in reliability are determined by the success or failure of the execution. The solutions are checked by using an imperialist competitive algorithm and a cost function calculated from makespan and reliability values. Despite the diversity of the techniques used in this work, it needs more flexibility for heterogeneous resources. The work of Lu et al. [12] presents a model to allocate resources based on a fairness evaluation framework, using two sub-models (Dynamic Demand Model (DDM) and Dynamic Node Model (DNM)) to describe the resource demand. The authors employ several typical resource allocation algorithms, such as a utility-based algorithm, to prove their effectiveness. As a strong point, this model supports dynamic resource demands, but it does not take the response time into account. Mezache et al. [15] suggest a genetic algorithm for resource allocation with an energy constraint in cloud computing. They focus on two levels of resource allocation: cloudlets to virtual machines and virtual machines to hosts. These levels allow the resource allocation system to adapt and keep the cloud resources updated by taking into account the currently submitted cloudlets.

3 Background and basic concepts for IoT and cloud

In this section, we introduce some basic definitions and concepts as a background for our study.

3.1 Visions on the integration of the Internet of Things and cloud computing

The hybridization (combination) of IoT and Cloud Computing generates synergy for both technologies and brings many benefits. Cloud infrastructure offers a clear advantage to IoT systems since its datacenters are able to compute the users' resource allocation needs efficiently. It thus shortens the execution time, reduces cost and speeds up big data processing [16]. This combination of IoT and Cloud Computing provides a number of technical benefits to users (for example, storage, optimization of resource utilization and energy efficiency) [4], [22].
Figure 1 describes the combination between IoT and Cloud Computing.

Figure 1: The combination between IoT and Cloud Computing.

Figure 2: IoT service-oriented architecture.

3.2 IoT service-oriented architecture

The aim of the service-oriented architecture (Figure 2) is to take advantage of the infrastructure of things and the cloud resources to obtain a better quality of service (reducing computing costs and improving the overall performance) [24]. IoT services and devices are usually heterogeneous, and their resources are limited (e.g., memory, processing, bandwidth and energy). To manage such constrained environments, we need to build a flexible architecture that is capable of managing these resources.

3.3 Components of an IoT system

In Figure 3, we present four fundamental components of an IoT system (function and mechanism).
• IoT devices and sensors: a sensor is an IoT device that has the capability to detect, measure and collect data from the physical environment, such as light, motion, heat, pressure or similar entities [9], [21].
• IoT gateways: the IoT gateway is a bridge between sensor networks and cloud services. The role of the gateway is to process the data collected from the sensors and then send it to the cloud [21].
• Cloud function: the cloud facilitates advanced analytics and the monitoring of IoT devices in order to shorten the execution time, reduce costs and reduce energy consumption.
• User interfaces: user interfaces are the visible and tangible part of the IoT system. They enable users to access and monitor their activities in the services to which they have subscribed using the IoT system.

3.4 Deployed IoT applications

The deployment of IoT devices encounters a number of challenges such as heterogeneity, storage, bandwidth and the implementation of management protocols. To overcome these challenges, researchers turn to the combination of IoT and Cloud Computing. This type of combination contributes to the deployment of smarter applications for smarter homes and offices, smarter transportation systems, smarter hospitals, and smarter enterprises and factories [4], [25].

3.5 The Internet of Things and multi-agent systems

Thanks to its characteristics (intelligence, reactivity, autonomy, mobility and decision-making ability), the MAS allows efficient management of IoT applications in the physical cloud infrastructure, covering heterogeneity, distribution and data management in IoT applications. Briefly, MAS provides a decentralized smart solution to frame the new problems and their solutions in the resource allocation approach for service-oriented IoT applications [22].

3.6 Cloud infrastructure and constraint satisfaction problems

The Constraint Satisfaction Problem technique is used to formulate and solve several artificial-intelligence-related problems such as scheduling and optimization [14]. In the cloud infrastructure, we use DCSP to simplify the resource provisioning policy in the datacenters. A DCSP problem is formulated by distributing variables and constraints to multiple agents. In MAS, each agent makes its proposal plan (solution) by using distributed negotiation and satisfying its constraints.
The various variables and constraints are identified, and the scenario of computing is painted accordingly.

4 Developing a new approach for RA in IoT

At this stage, we propose a new RA approach for IoT services. We then discuss its system objectives, architecture, layers, DCSP modelling and system scenario.

4.1 System objectives

This paper is mainly interested in the field of the cloud of things. In particular, it shows the importance of resource allocation in data centers. The aim of our approach is to ensure optimal management of resource allocation for service-oriented IoT applications based on decentralized intelligence in distributed computing. To achieve the stated goals, i.e. load balancing (minimizing power consumption), efficiently exploiting resources and minimizing the execution time, we suggest:
1. Designing a system to manage the cloud infrastructure based on a multi-agent system for the allocation of resources in the cloud of things.
2. Developing a system to manage these resources by using two techniques: Multi-Agent Systems (MAS) and Distributed Constraint Satisfaction Problems (DCSP).
3. Implementing and simulating the proposed system through a scenario that demonstrates the effectiveness of the proposed approach for the management of resources in the cloud of things.

In this regard, we introduce a number of concepts and rules for the IoT service delivery system specification and the resource allocation process in cloud computing, as shown below (an illustrative data-model sketch in code follows the rules):

Concepts:
1- A cloud service contains a set of parameters (called nonfunctional parameters) such as latency, cost, data format and availability, each represented as a real number.
2- To execute a cloud service, a set of cloudlet resources (called functional parameters) is required. The cloudlet is represented in terms of RAM (MB), Storage (GB), CPU (MIPS) and Bandwidth (Gbit/s), each a real number.
3- Submission of a cloudlet to a VM: the selection of the virtual machine (VM) that has enough available resources to run the cloudlet according to its resource requirements.
4- Hosting of VMs in hosts: the process of selecting the host that provides the lowest price, a low load and the best available resources for the VM.

Rules:
1- Every object can be linked to many services.
2- Each service has one cloudlet request.
3- Every cloudlet should be submitted to one VM.
4- Every VM can be assigned more than one cloudlet.
5- Every host can host more than one VM.
6- Every datacenter has two types of hosts: ON hosts and OFF hosts.
7- Every host has its own price.
8- The price and the load of a host are directly related: an increase in the load causes an increase in the price.
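The following Python data model is an illustrative sketch of the concepts and rules above (the class and field names are ours, not identifiers from the paper):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Cloudlet:
    # Functional parameters of a service request (concept 2)
    ram_mb: float
    storage_gb: float
    cpu_mips: float
    bandwidth_gbps: float
    budget: float                      # customer budget in $

@dataclass
class VM:
    ram_mb: float
    storage_gb: float
    mips: float
    bandwidth_gbps: float
    cloudlets: List[Cloudlet] = field(default_factory=list)   # rule 4

@dataclass
class Host:
    ram_mb: float
    storage_gb: float
    mips: float
    bandwidth_gbps: float
    mips_price: float                  # rule 7: every host has its own price
    powered_on: bool = True            # rule 6: ON / OFF hosts
    vms: List[VM] = field(default_factory=list)               # rule 5
```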
4.2 Smart design for resources allocation in IoT applications

In this section, we are mainly interested in introducing a system architecture for IoT resource allocation, its functional aspects and its various layers, to provide a better understanding of how it works, how it stores data and how the cloud is accessed. Figure 4 describes the proposed smart design.

Layer 1 (Customer): In this layer, the system focuses on customers and their requests. The customer requests are presented in terms of service name and characteristics.

Layer 2 (IoT Service): This layer has a significant role as a mediator between the Customer layer and the Broker layer. It contains two agents:
1. Object agent (OA): a reactive agent that represents an IoT object (physical device). It makes it possible to control the device and to exchange and collect data from it in order to provide a set of services to customers.
2. Mediator agent (MA): a cognitive agent whose role is to manage the customers' requests and the provided services. The main components of this agent are given below:
• Service registry: allows the OA agents to publish information about their services in terms of performance and functionalities.
• Service selection: searches the registry for a set of services that meet the customer request.
• Service transfer: the MA creates a list of requested cloudlets from the performance characteristics of the selected services. It then sends this list of cloudlets to the Broker Agent in the next layer. The Broker Agent, in turn, arranges this list of cloudlets and sends it to the Resources layer for the selection of the best cloudlet from this list, taking into account the resource allocation strategy in that layer.
• Service bind: after the selection of the best cloudlet, the MA connects the customer with the provider of the service associated with the selected cloudlet. It also allows the OA to execute this service through this cloudlet.

Figure 4: Smart Design for Resources Allocation in IoT Applications.

Table 1: Example of Broker Agent components.
Cloudlet list: ID | CL | budget
1 | CPU, RAM, Bandwidth, Storage | $
2 | CPU, RAM, Bandwidth, Storage | $
Free VM list (units): Vm id | Size | RAM | Bandwidth | Mips/pe | Number of Pe
ID | GB | MB | Gbit/s | Mips | Real number

Layer 3 (Broker layer): The role of this layer is to manage the resources between the IoT Service layer and the Resources layer. The broker agent (BA) manages the list of cloudlet requests, the free VMs list, and the performance and delivery of cloud resources. The main role of this agent is to arrange the list of cloudlets and then send it to the Resources layer.

Layer 4 (Resources layer): This is the most important layer in the system due to its role in managing, processing and selecting the best RA for the cloudlet at two levels: the local level (between HA agents in the same datacenter) and the global level (between the DCA agents of the cloud). This layer contains three types of agents; we introduce these agents and clarify the relationships between them in Figure 5.
Datacenter agent (DCA): communicates with the BA and with the host agents in the same datacenter. It also negotiates with the other DCAs.
Host agent (HA): controls a host in the ON state.
Host off agent (HOffA): controls a host in the OFF state.

Figure 5: Relationships between DCA and HA agents.

4.3 Relationships between DCA and HA agents

A DCSP problem is formulated by distributing variables and constraints to multiple agents. In MAS, each agent makes its proposal plan (solution) by using distributed negotiation and satisfying its constraints.
The various variables and constraints are identified, and the scenario of computing is painted accordingly.

4.3.1 Defining of the variables

In this section, we show the most important variables and their definitions in Table 2.

Table 2: Definition of the variables.
Variable | Description | Domain
R | The request of the customer | {R1, ..., Rv, ..., Rs}
O | The abstract object; each object is connected to a physical device (gateway, sensor, actuator, ...) | {O1, ..., Ow, ..., Ot}
S | Sxw: the service x offered by the object w | {S11, ..., Sxw, ..., Sut}
CL | Clmxw: cloudlet m of the service x from the object w | {Cl111, ..., Clmxw, ..., Clcut}
R(Av) | The requested availability | Rate value (%)
R(Rep) | The requested reputation | Natural number
S(Av) | The availability of the service | Rate value (%)
S(Cap) | The capacity of the service: the number of requests that can be handled per unit of time | Natural number/time
S(Rep) | The reputation of the service | Natural number
S(hr) | The sum of requests currently handled by the service | Real number
Host | Physical host | {Host1, ..., Hostj, ..., Hosth}
VM | Virtual machine | {vm1, ..., vml, ..., vmv}
Host(pe) | Processor in the host | {Pe11, ..., Pekj, ..., Peph}
Host(ram) | Size of the host's RAM | Natural number
Host(bw) | Bandwidth of the host | Real number
Host(storage) | Size of the host's storage | Natural number
Host(mips) | Sum of the capacities of the processors in the host | Real number
Host(used_mips) | Sum of the capacities of the processors used by the virtual machines hosted in the host | Real number
Host(mips_load) | The energy (load) of the host: the capacity of the processors used relative to the total capacity of the processors in the host | Real number (%)
Host(mips_price) | Unit price of MIPS in the host | Real number, obtained from the proposed model for every host
Pe(mips) | Capacity of the processor | Real number
VM(size) | VM's hard disc size | Natural number
VM(ram) | Size of the VM's RAM | Natural number
VM(bw) | Bandwidth of the VM | Real number
VM(mips) | Sum of the capacities of the processors of the VM | Real number
VM(Costj) | The hosting cost of the VM in the host j | Real number (DA)
CL(length) | Size of the cloudlet CL | Real number
CL(file size) | Total size of the files of CL | Real number
CL(output size) | Size of the result of the execution of CL | Real number
CL(nbr_pe) | Maximum number of processors of CL | Natural number
CL(mips) | Capacity of the processors of CL | Real number
Cl(Cost_l,j) | The cost of resource exploitation of the cloudlet Cl submitted to the VM l hosted in the host j | Real number (DA)

4.3.2 Constraints

The aim of this section is to select the best solution for any task in DCSP systems. We thus need to define a set of constraints, using the previously defined variables, that correspond to the system requirements (an illustrative code sketch of two of these checks follows the list).

Constraint 1 (Service Usability): verifies that a service S meets a customer request R; the service should satisfy the nonfunctional characteristics of the customer request according to the following constraint:

S meets R: R(Av) ≤ S(Av) and R(Rep) ≤ S(Rep)   (1)

Constraint 2 (Service Capacity): allows the service to handle a new customer request. Before representing the customer request as a cloudlet, the service should respect its capacity limitation:

S(hr) + 1 ≤ S(Cap)   (2)

Constraint 3 (Cloudlet Submission ability): the virtual machine VM_l already has a set of M cloudlets. In order to submit a further cloudlet m', the following conditions must be satisfied:

Cl_m'(length) + Σ_{m=1..M} (Cl_m(length) + Cl_m(output size)) ≤ VM_l(ram)   (3)
Cl_m'(file size) + Σ_{m=1..M} Cl_m(file size) ≤ VM_l(storage)   (4)
Cl_m'(bw) + Σ_{m=1..M} Cl_m(bw) ≤ VM_l(bw)   (5)
Cl_m'(mips) + Σ_{m=1..M} Cl_m(mips) ≤ VM_l(mips)   (6)

where:

Cl_m(mips) = CL(length) × CL(nbr_pe)   (7)
VM_l(mips) = Σ_k PE_kl(mips)   (8)

and m' ∉ [1, M]   (9)

Constraint 4 (VM Hosting ability): to allow a host j to host a new virtual machine l' (free or migrated VM), the following conditions must be verified:

VM_l'(ram) + Σ_{l=1..V} VM_l(ram) ≤ Host_j(ram)   (10)
VM_l'(storage) + Σ_{l=1..V} VM_l(storage) ≤ Host_j(storage)   (11)
VM_l'(bw) + Σ_{l=1..V} VM_l(bw) ≤ Host_j(bw)   (12)
VM_l'(mips) + Σ_{l=1..V} VM_l(mips) ≤ Host_j(mips)   (13)

where:

Host_j(mips) = Σ_k PE_kj(mips)   (14)

and l' ∉ [1, V]   (15)

Constraint 5 (Ranking of Host Agents): the ranking algorithm is based on mipsPrice; if two hosts have the same price, mipsLoad is used:

Best Host = min(Host_j(mipsPrice), Host_j'(mipsPrice)), and min(Host_j(mipsLoad), Host_j'(mipsLoad)) in case Host_j(mipsPrice) = Host_j'(mipsPrice)   (16)

where:

Host_j(mipsLoad) = Host_j(usedMips) / Host_j(mips)   (17)
Host_j(usedMips) = Σ_{l=1..N} VM_l(mips)   (18)

Constraint 6 (Best VM Hosting Selection): the selection of the best host between two hosts j and j' for hosting a VM is organized on the basis of the hosting cost:

Best VM Hosting = min(VM(Cost_j), VM(Cost_j'))   (19)

Constraint 7 (Best Cloudlet Selection): the selection of the best cloudlet (service) for a customer request is based on the resource exploitation cost:

Best CL = min(Cl(Cost_l,j), Cl(Cost_l',j'))   (20)

where Cl(Cost_l,j) is the cost of resource exploitation of the cloudlet Cl submitted to the VM_l hosted in the host j, and Cl(Cost_l',j') is the cost of resource exploitation of the cloudlet Cl submitted to the VM_l' hosted in the host j'.
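As a hedged illustration of how an agent could evaluate these constraints (resource objects are shown as plain dictionaries; the helper names are ours), the sketch below tests constraint C3 for a candidate cloudlet on a VM and ranks ON hosts as in constraint C5:

```python
def can_submit(vm, submitted_cloudlets, new_cl):
    """Constraint C3: the VM must have enough remaining resources for new_cl."""
    def used(key):
        return sum(cl[key] for cl in submitted_cloudlets)
    return (new_cl["length"] + used("length") + used("output_size") <= vm["ram"]
            and new_cl["file_size"] + used("file_size") <= vm["storage"]
            and new_cl["bw"] + used("bw") <= vm["bw"]
            and new_cl["mips"] + used("mips") <= vm["mips"])

def rank_hosts(hosts):
    """Constraint C5: order ON hosts by MIPS price, then by MIPS load."""
    return sorted((h for h in hosts if h["on"]),
                  key=lambda h: (h["mips_price"], h["used_mips"] / h["mips"]))
```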
5 Scenario of interactions in the proposed system

In this section, we present the main scenarios for providing and selecting RA for an IoT service in the proposed system. We also illustrate the interactions between agents with sequence diagrams, in which there are two object agents (OA1, OA2) and two datacenter agents (DCA1, DCA2); every datacenter has two host agents (HA1, HA2).

5.1 Global interaction

In this section, we explain the global interactions in the proposed system at three main levels: IoT service request, cloudlet submission and hosting of virtual machines. The Search Algorithm and the diagram in Figure 6 present the detailed description of these interactions.

Figure 6: The global interactions in the system.

5.1.1 Search algorithm

1  Input:
2    Request: customer request containing the nonfunctional characteristics (Av and Rep).
3    SR: Service Registry containing the list of services and their characteristics.
4  Output:
5    SL: list of found services.
6  SL = ∅
7  for all S in SR do
8    if (Request, S) verify C1 then
9      if S verifies C2 then
10       add S to SL
11     end if
12   end if
13 end for
14 Return SL
15 end
Algorithm 1: Search Algorithm.

5.2 Cloudlets submission

The process of cloudlet submission to a datacenter and its hosts is illustrated in Figure 7. In addition, the Planning Algorithm (Algorithm 2) describes the process of cloudlet submission inside the hosts.

5.2.1 Planning algorithm

1  Input:
2    R: list of requested cloudlets.
3  Output:
4    BCL: best cloudlet.
5  BCL = ∅
6  for all VM in this host do
7    while ∃ CL ∈ R and (VM, CL) verify C3 do
8      remove CL from R
9      if (BCL = ∅ or (CL(cost), BCL(cost)) verify C7) then
10       BCL = CL   // the new CL is the best cloudlet
11     end if
12   end while
13 end for
14 Return BCL
15 end
Algorithm 2: Planning Algorithm.
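For readers who prefer an executable form, the fragment below is our hedged Python rendering of the Planning Algorithm; the callables fits (constraint C3) and cost (the resource exploitation cost used by constraint C7) are assumed to be supplied by the caller.

```python
def plan_best_cloudlet(host_vms, requests, fits, cost):
    """Planning Algorithm (Algorithm 2) sketched in Python.

    host_vms : VMs of this host
    requests : list R of requested cloudlets (consumed as they are planned)
    fits     : callable implementing constraint C3 for (vm, cloudlet)
    cost     : callable returning the exploitation cost of (vm, cloudlet)
    """
    best, best_cost = None, None          # BCL = empty
    for vm in host_vms:
        # while there exists a cloudlet in R that this VM can accept (C3)
        while True:
            candidates = [cl for cl in requests if fits(vm, cl)]
            if not candidates:
                break
            cl = candidates[0]
            requests.remove(cl)
            c = cost(vm, cl)
            # keep the cheapest cloudlet seen so far (constraint C7)
            if best is None or c < best_cost:
                best, best_cost = cl, c
    return best
```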
5.3 Hosting virtual machines

In the case where no VM resources are available, we launch the hosting of virtual machines in order to submit the requested cloudlets. The BA starts the process of hosting free virtual machines as illustrated in Figure 8.

Figure 7: Cloudlet submission between a datacenter and its hosts.

Figure 8: Hosting free virtual machines.

6 Illustrative example

To illustrate our approach, we consider an example and discuss a case study of an IoT application for a smart transport system. We discuss this case study along two dimensions.

1. IoT service deployment (first dimension): we focus on the definition, publishing and searching of services, in addition to the different characteristics of these services and of the customers' requests. We show a scenario of using this dimension through the following steps:

Step 1: A company has an IoT application for smart taxis. It provides the service of reserving autonomous cars and tracking the car during the trip (a monitoring program to be executed in the cloud).

Step 2: Each autonomous car (physical IoT device) is connected to an agent (object agent) in the cloud (IoT layer). This agent publishes information about its service in the MA service registry. Table 3 illustrates some functional and nonfunctional characteristics of the services.

Table 3: Services characteristics (Availability and Reputation are nonfunctional; RAM, Storage and CPU are functional).
Agent id | Availability | Reputation | RAM (MB) | Storage (MB) | CPU
OA1 | 80% | ***** | 300 | 500 | 2
OA2 | 65% | ** | 500 | 1024 | 3

Step 3: The customer requests a car (service) by introducing the nonfunctional characteristics: availability, reputation and the type of desired trip.

Step 4: First, the MA searches the registry for the available services that meet the customer request. In order to select the best service among those found, the MA converts these services into cloudlets using their resource requirements (from the functional characteristics) and sends them to the BA in the next layer.

2. Service selection in cloud computing (planning procedure, second dimension): after obtaining the output of the first dimension (services converted to cloudlets), we discuss the execution of the planning procedure in the cloud system. We consider a cloud infrastructure with two imaginary datacenters: datacenter 1 has four hosts and datacenter 2 has three hosts. In addition, eleven (11) virtual machines (VMs) are hosted on these hosts. These VMs already host thirty (30) cloudlets, and the BA needs to place seven (07) other requested cloudlets (CL31 ... CL37) on these VMs. In this case, the system looks for the best resource allocation for these cloudlets according to cost and energy consumption, as shown in the following steps.

Step 1 (Requests): The BA distributes the received list of cloudlets to all DCAs. As a result, every DCA informs its HA agents that are in the ON state to start the ranking process.

Table 4: Distributed cloudlets list.
Cloudlet id | Length | File size | Output size | Number of Pe
31 | 10 MB | 2 MB | 1 MB | 2
32 | 13 MB | 1 MB | 1 MB | 1
33 | 5 MB | 3 MB | 2 MB | 1
34 | 10 MB | 1 MB | 1 MB | 1
35 | 5 MB | 1 MB | 1 MB | 2
36 | 2 MB | 3 MB | 3 MB | 1
37 | 4 MB | 2 MB | 1 MB | 2

Step 2 (Internal Negotiation 1, "Ranking process"): The HA agents (in the ON state) share their prices and rank themselves in ascending order of price. As illustrated in Table 5, the ranking in Datacenter I is H2, H1, H4, where H2 has the lowest price (1.5 $); for Datacenter II it is H1, H2, where H1 has the lowest price (1.4 $). At the end of the ranking, each first HA informs its DCA of the result of the ranking and asks it to send back the list of cloudlets.

Table 5: Hosts ranking.
DCA1: rank | Host id | Price
1 | H2 | 1.5 $
2 | H1 | 4 $
3 | H4 | 7.8 $
DCA2: rank | Host id | Price
1 | H1 | 1.4 $
2 | H2 | 5.5 $
Step 3 (Internal Negotiation 2, "Planning"): After the ranking process, the first HA in each DCA gets the list of cloudlets from its DCA and starts the planning procedure by checking the available resources in the hosted VMs of its host and verifying the constraint C1. If there are cloudlets and VMs that verify C1, then the first HA selects the best cloudlet, i.e. the one that satisfies the constraint C7. The first HA sends the selected cloudlet to the DCA in the form (cloudlet, VM, host, cost) as a reply. At the end of its procedure, it sends the remaining cloudlets (those that do not satisfy C1) to the next HA in the ranking list, which considers them in its own planning procedure. This process is repeated until the last HA in the ranking list is reached or no cloudlet remains. Otherwise, if there is no cloudlet that satisfies C1 in any HA, the HA retransmits the whole list of cloudlets to the next HA in the ranking list to consider in its planning procedure.

Step 4 (Local solution building): After the planning, every DCA receives the solutions from its HA agents and selects the best solution, which satisfies C7, and considers it as its local solution. Table 6 illustrates the local solutions in DC1 and DC2 for CL31 to CL37.

Table 6: Local solution for every datacenter.
Local solution of DCA1: Cl id | Cost | Host ID | Vm ID
31 | 39$ | H2 | Vm6
32 | 43$ | | Vm3
33 | 41$ | | Vm1
34 | 78$ | H1 | Vm2
35 | 78$ | | Vm11
36 | 46$ | | Vm3
37 | 40$ | | Vm2
Local solution of DCA2: Cl id | Cost | Host ID | Vm ID
31 | 80$ | | Vm10
32 | 72$ | H1 | Vm10
33 | 37$ | | Vm8
34 | 44$ | | Vm7
35 | 54$ | H2 | Vm7
36 | 50$ | | Vm8
37 | 93$ | | Vm7

Step 5 (External Negotiation): The DCA agents share their solutions and negotiate to select the best solution using the best price (to satisfy C7). The DCA that owns the best solution sends it to the BA to build the global solution, as illustrated in Table 7.

Table 7: Global solution for the Broker Agent.
Cloudlet id | Price | DCA id | Host id | Vm id
33 | 37$ | DCA2 | H1 | Vm8

Step 6 (Show solutions and confirmation): After building the global solution, the BA agent sends the cloudlet to the MA. As a result, the MA sends the service associated with the cloudlet as the response to the customer request, enables (confirms) the OA to launch the tracking device of the car, and allows the customer to use the car at the lowest cost.

7 Simulation experiments

To evaluate the performance of our approach, we used CloudSim [15], a Java-based and extensible simulation framework for resource allocation algorithms. In this section, we discuss the experimental configuration and the results obtained with our approach.

7.1 Experimental configuration

We define the different parameters of our experiments (datacenters, hosts, virtual machines, processors and cloudlets) as shown in Table 8.

Table 8: Values of the experiment parameters.
Parameters | Values
Max length of cloudlet | 50
Total number of cloudlets | 500 - 3000
Total number of VMs | 530
VM memory (RAM) | 100 - 1000
Number of PEs requirements | 500 - 1500
Number of datacenters | 3
Number of hosts | 47

7.2 Simulation results

In this section, we present the experimental results and show the efficiency of our proposed approach by comparing three solutions: the First Fit algorithm (FF), the Genetic Algorithm (GA) of Mezache et al. [15] and our algorithm (MD). MD is built on MAS with DCSP. In addition, we have defined performance metrics for the evaluation of the three proposed solutions.
These solutions are evaluated on common characteristics (metrics): Average System Load (ASL), Cost Augmentation Rate (CR) and Available MIPS (AM). In the experiments, the customer request (service request) is submitted to the IoT system for processing. The proposed system converts this request into a list of cloudlets (network bandwidth, storage, CPU and consumed load) in order to fulfill the request at the lowest cost using our algorithm (MD). The main goal of our algorithm (MD) is to balance the cost and the energy of the datacenter hosts. The obtained results show that this goal is achieved, as reflected by the following metrics.

a) Average System Load (ASL). This metric represents the energy consumption. Its importance lies in characterising the datacenters' status and reducing the energy consumption of their hosts. Usually, an ideal average system load gives a balance between the different hosts inside their datacenters. Figure 9 presents the average load by the number of requested cloudlets (FF, GA, MD). The obtained results show the efficiency of our algorithm (MD) in obtaining lower Average System Load (ASL) values compared to the FF and GA algorithms; with our algorithm (MD) the ASL does not exceed 50%.

Figure 9: Average load by number of requested cloudlets (series: Load FF, Load GA, Load MD).

b) Cost Augmentation Rate (CR). This metric represents the cost augmentation rate by number of cloudlets. Its importance lies in reducing the costs of resource exploitation. The values of CR in Figure 10 demonstrate the positive contribution of our algorithm (MD) to reducing cost for almost all groups. Our algorithm (MD) maintains the augmentation rate (CR) between 105% and 190%, except for the first groups (500 and 1000) where GA has lower CR values. This is due to the efficiency of our algorithm (MD) for groups with a large number of cloudlets (more than 1000).

Figure 10: Cost augmentation rate by number of cloudlets (series: FF, GA, MD).

c) Available MIPS (AM). This metric represents the available MIPS by number of cloudlets. Its importance lies in measuring the computing performance and increasing the available MIPS in datacenters; the more MIPS available in the datacenter, the lower the resource exploitation cost. In Figure 11, we observe that the AM values obtained by GA are larger than those of the other algorithms for groups with fewer than 1000 cloudlets, while our algorithm (MD) achieves better AM values when the number of cloudlets increases beyond 1000.

Figure 11: Available MIPS by number of cloudlets.

8 Conclusions and future work

In this paper, we addressed a new approach for Resource Allocation (RA) in the Internet of Things. Our approach is based on decentralized intelligence in distributed computing, using two techniques: MAS and DCSP. In this hybridization, variables are used to represent the resources, while the rules and policies are represented by constraints; they are distributed among multiple agents in the different layers of the system. The experiments show that the use of DCSP alongside MAS paves the way for new efficient paradigms for solving problems related not only to resource allocation but also to providing smart solutions that help synchronize IoT application services with computing devices. The obtained results show that the efficiency of our approach is manifested in: (1) reducing energy consumption in datacenters by about
8 Conclusions and future work

In this paper, we addressed a new approach for Resource Allocation (RA) in the Internet of Things. Our approach brings decentralized intelligence into distributed computing by combining two techniques: MAS and DCSP. In this hybridization, variables represent the resources, while the rules and policies are expressed as constraints; both are distributed among multiple agents in the different layers of the system. The experiments show that using DCSP alongside MAS paves the way for new efficient paradigms, not only for solving resource allocation problems but also for providing smart solutions that help synchronize IoT application services with computing devices. The obtained results show that the efficiency of our approach is manifested in: (1) reducing the energy consumption in the datacenters by about 50%, (2) keeping the cost augmentation rate between 105% and 190%, and (3) increasing the available MIPS in the datacenters. Despite the advantages of our approach, we highlight the need to extend its architecture to support other specific cases of IoT applications. Big data are generated by the system day after day, raising challenges such as heterogeneity, scalability and simultaneous accessibility. In future research, we plan to enhance our approach with further resource management techniques for IoT application services and to extend the procedures by exploiting other approaches such as search approximation algorithms, artificial intelligence and Fog environments.

References

[1] Anithakumari, S., Chandrasekaran, K., (2017). Interoperability based resource management in cloud computing by adaptive dimensional search, in: IEEE International Conference on Cloud Computing in Emerging Markets, CCEM. pp. 77-84. https://doi.org/10.1109/CCEM.2017.23
[2] Artan, M., Minarolli, D., Bernd, F., (2017). Distributed Resource Allocation in Cloud Computing Using Multi-Agent Systems. Telfor Journal 9, 110-115. https://doi.org/10.5937/telfor1702110M
[3] Bajo, J., De la Prieta, F., Corchado, J.M., Rodriguez, S., (2016). A low-level resource allocation in an agent-based Cloud Computing platform. Appl. Soft Comput. 48, 716-728. https://doi.org/10.1016/j.asoc.2016.05.056
[4] Botta, A., De Donato, W., Persico, V., Pescape, A., (2016). Integration of Cloud computing and Internet of Things: A survey. Futur. Gener. Comput. Syst. 56, 684-700. https://doi.org/10.1016/j.future.2015.09.021
[5] Chen, J., Han, X., Jiang, G., (2014). A Negotiation Model Based on Multi-agent System under Cloud Computing, in: The Ninth International Multi-Conference on Computing in the Global Information Technology. pp. 157-164.
[6] Ezugwu, A.E., Buhari, S.M., Junaidu, S.B., (2013). Virtual Machine Allocation in Cloud Computing Environment. Int. J. Cloud Appl. Comput. 3, 47-60. https://doi.org/10.4018/ijcac.2013040105
[7] Fayazi, M., Reza, M., Enayatollah, S., (2016). Resource Allocation in Cloud Computing Using Imperialist Competitive Algorithm with Reliability Approach. Int. J. Adv. Comput. Sci. Appl. 7, 323-331. https://doi.org/10.14569/IJACSA.2016.070346
[8] Gawanmeh, A., April, A., (2016). A Novel Algorithm for Optimizing Multiple Services Resource Allocation. Int. J. Adv. Comput. Sci. Appl. 7, 428-434. https://doi.org/10.14569/IJACSA.2016.070655
[9] Ghanbari, Z., Jafari Navimipour, N., Hosseinzadeh, M., Darwesh, A., (2019). Resource allocation mechanisms and approaches on the Internet of Things. Cluster Comput. 22, 1253-1282. https://doi.org/10.1007/s10586-019-02910-8
[10] Gutierrez-Garcia, J.O., Sim, K.M., (2011). Agents for cloud resource allocation: An amazon EC2 case study. Commun. Comput. Inf. Sci. 261 CCIS, 544-553. https://doi.org/10.1007/978-3-642-27180-9_66
[11] Jing, L., Weicai, Z., Licheng, J., (2006). A multiagent evolutionary algorithm for constraint satisfaction problems. IEEE Trans. Syst. Man Cybern. Part B 36, 54-73. https://doi.org/10.1109/TSMCB.2005.852980
[12] Lu, D., Ma, J., Xi, N., (2015). A universal fairness evaluation framework for resource allocation in cloud computing. China Commun. 12, 113-122.
https://doi.org/10.1109/CC.2015.7112034 [13] Ma, X., Gao, H., Xu, H., Bian, M., (2019). An IoT-based task scheduling optimization scheme considering the deadline and cost-aware scientific workflow for cloud computing. Eurasip J. Wirel. Commun. Netw. 2019. https://doi.org/10.1186/s13638-019-1557-3 [14] Mataoui, M., Sebbak, F., Beghdad Bey, K., Benhammadi, F., (2015). CSP formulation for scheduling independent jobs in cloud computing. CLOSER 2015 - 5th Int. Conf. Cloud Comput. Serv. Sci. Proc. 105-112. https://doi.org/10.5220/0005438801050112 [15] Mezache, C., Kazar, O., Bourekkache, S., (2016). A Genetic Algorithm for Resource Allocation with Energy Constraint in Cloud Computing, in: International Conference on Image Processing, M. Bali et al. Production and Computer Science (ICIPCS'2016) London (UK), March 26-27, 2016 Pp.62-69 A. pp. 62-69. https://doi.org/10.17758/UR.U0316020 [16] Mora, H., Signes-Pont, M.T., Gil, D., Johnsson, M., (2018). Collaborative working architecture for IoT-based applications. Sensors (Switzerland) 18. https://doi.org/10.3390/s18061676 [17] Nair, A.S., Hossen, T., Campion, M., Selvaraj, D.F., Goveas, N., Kaabouch, N., Ranganathan, P., (2018). Multi-Agent Systems for Resource Allocation and Scheduling in a Smart Grid. Technol. Econ. Smart Grids Sustain. Energy 3, 1-15. https://doi.org/10.1007/s40866-018-0052-y [18] Rivera, W., (2017). Sustainable cloud and energy services: Principles and practice. Sustain. Cloud Energy Serv. Princ. Pract. 1-268. https://doi.org/10.1007/978-3-319-62238-5 [19] Roogi, R.H., (2015). Big Data Solution by Divide and Conquer technique in Parallel Distribution System using Cloud Computing. Orient. J. Comput. Sci. Technol. 8, 9-12. [20] Shrimali, B., Bhadka, H., Patel, H., (2018). A fuzzy-based approach to evaluate multi-objective optimization for resource allocation in cloud. Int. J. Adv. Technol. Eng. Explor. 5, 140-150. https://doi.org/10.19101/IJATEE.2018.542020 [21] Singh, A., Viniotis, Y., (2017). Resource allocation for IoT applications in cloud environments. 2017 Int. Conf. Comput. Netw. Commun. ICNC 2017 719723. https://doi.org/10.1109/ICCNC.2017.7876218 [22] Singh, M.P., Chopra, A.K., (2017). The Internet of Things and Multiagent Systems: Decentralized Intelligence in Distributed Computing. Proc. - Int. Conf. Distrib. Comput. Syst. 1738-1747. https://doi.org/10.1109/ICDCS.2017.304 [23] Son, S., Sim, K.M., (2012). A price-and-time-slot-negotiation mechanism for cloud service reservations. IEEE Trans. Syst. Man, Cybern. Part B Cybern. 42, 713-728. https://doi.org/10.1109/TSMCB.2011.2174355 [24] Suciu, G., Suciu, V., Martian, A., Craciunescu, R., Vulpe, A., Marcu, I., Halunga, S., Fratu, O., (2015). Big Data, Internet of Things and Cloud Convergence - An Architecture for Secure E-Health Applications. J. Med. Syst. 39. https://doi.org/10.1007/s10916-015-0327-y [25] Zahoor, S., Mir, R.N., (2018). Resource management in pervasive Internet of Things: A survey. J. King Saud Univ. Inf. Sci. https://doi.org/10.1016/i.iksuci.2018.08.014 https://doi.org/10.31449/inf.v44i3.3280 Informatica 44 (2020) 387-366 361 How to Define Co-occurrence in a Multidisciplinary Context? Mathieu Roche CIRAD, TETIS, F-34398 Montpellier, France TETIS, Univ. 
Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France E-mail: mathieu.roche@cirad.fr, http://textmining.biz/Staff/Roche Position paper Keywords: co-occurrence, collocation, phrase, n-gram, skyp-n-gram, association rule, sequential pattern Received: October 28, 2019 This position paper presents a comparative study of co-occurrences. Some similarities and differences in the definition exist depending on the research domain (e.g. linguistics, natural language processing, computer science). This paper discusses these points and deals with the methodological aspects in order to identify co-occurrences in a multidisciplinary paradigm. Povzetek: Predstavljena je analiza sočasnosti. 1 Introduction Determining co-occurrences in corpora is challenging for different applications such as classification, translation, terminology building, etc. More generally, co-occurrences can be identified with all types of data, e.g. databases [8], texts [30], images [38], music [15], video [19], etc. The co-occurrence concept has different definitions depending on the research domain (i.e. linguistics, natural language processing (NLP), computer science, biology, etc.). This position paper reviews the main definitions in the literature and discusses similarities and differences according to the domains. This type of study can be crucial in the context of data science, which is geared towards developing a multidisciplinary paradigm for data processing and analysis, especially textual data. Here the co-occurrence concept related to textual data is discussed. Note that before their validation by an expert, co-occurrences of words are often considered as candidate terms. First, Section 2 of this paper details the different definitions of co-occurrence according to the studied domains. Section 3 discusses and compares these different aspects based on their intrinsic definition but also on the associated methodologies in order to identify them. Finally, Section 4 lists some perspectives. 2 Co-occurrence in a multidisciplinary context 2.1 Linguistic viewpoint In linguistics, one notion that is broadly used to define the term is called lexical unit [23] and polylexical expression [16]. The latter represents a set of words having an au- tonomous existence, which is also called multi-word expression [33]. In addition, several linguistics studies use the collocation notion. [10] gives two properties defining a collocation. First, collocation is defined as a group of words having an overall meaning that is deducible from the units (words). For example, climate change is considered as a collocation because the overall meaning of this group of words can be deduced from both words climate and change. On the other hand, the expression to rain cats and dogs is not a collocation because its meaning cannot be deduced from each of the words; this is called a fixed expression or an idiom. A second property is added by [10] to define a collocation. The meaning of the words that make up the collocation must be limited. For example, buy a dog is not a collocation because the meaning of buy is not limited. 2.2 NLP viewpoint In the natural language processing (NLP) domain, the cooccurrence notion refers to the general phenomenon where words are present together in the same context. More precisely, several principles are used that take contextual criteria into account. First, the terms or phrases [6, 11] can respect syntactic patterns (e.g. adjective noun, noun noun, noun preposition noun, etc.). 
Some examples of extracted phrases (i.e. syntactic co-occurrences) are given in Table 1. In addition, methods without linguistic filtering are also conventionally used in the NLP domain by extracting n-grams of words (i.e. lexical co-occurrences) [25, 35]. n-grams are contiguous sequences of n words extracted from a given sequence of text (e.g. the bi-grams, i.e. n-grams with n = 2, x y and y z are associated with the text x y z). n-grams that allow gaps are called skip-n-grams (e.g. the skip-bi-grams x y, x z, y z are related to the text x y z). The skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships [27]. Some examples of n-grams and skip-n-grams are given in Table 1. After summarizing the term notion in the NLP domain, the following section discusses these aspects in the computer science context, particularly in data mining. Note that the NLP domain may be considered as being located at the interface between linguistics and computer science.

2.3 Computer science viewpoint

In the data mining domain, co-occurring items are called association rules [1, 39] and they could be candidates for the construction or enrichment of terminologies [12]. In the data mining context, the list of items corresponds to the set of available articles. With textual data, items may represent the words present in sentences, paragraphs, or documents [2, 29]. A transaction is a set of items. A set of transactions is a learning set used to determine association rules. Some extensions of association rules are called sequential patterns. They take into account a certain order of the extracted elements [18, 34], with an enriched representation related to textual data as follows:
- objects represent texts or pieces of texts,
- items are the words of a text,
- itemsets represent sets of words present together within a sentence, paragraph or document,
- dates highlight the order of sentences within a text.
There are several algorithms for discovering association rules and sequential patterns. One of the most popular is Apriori, which is used to extract frequent itemsets from large databases. The Apriori algorithm [1] finds frequent itemsets where k-itemsets are used to generate (k+1)-itemsets. Association rules and sequential patterns of words are often used in text mining for different applications, e.g. terminology enrichment [12], association of concept instances [5, 29], classification [18, 34], etc.

3 Discussion: comparative study of definitions and approaches

This section proposes a comparison of: (i) co-occurrence definitions (see Section 3.1), (ii) automatic methods for identifying them (see Section 3.2). It highlights some similarities and differences between domains.

3.1 Co-occurrence extraction

The general definition of co-occurrence is finally close to association rules in the data mining domain. Note that the integration of windows (e.g. Association Rules with Time-Windows, ARTW [39]) in the association rule or sequential pattern extraction process gives a similarity with skip-n-gram extraction. The integration of syntactic criteria makes it possible to extract more relevant candidate terms (see Table 1). Such information is typically taken into account in NLP to extract terms from general or specialized domains [20, 24, 28, 32]. Table 1 highlights relevant terms extracted using linguistic patterns (e.g. climate change, water cycle, significant change).
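As a small illustration of the difference between lexical co-occurrences with and without gaps, the following sketch extracts bi-grams and 2-skip-bi-grams from the sentence used in Table 1. It is a straightforward reading of the definitions above, with helper names chosen for this example.

```python
# Bi-grams vs. 2-skip-bi-grams, following the definitions above (illustrative helper names).

def bigrams(tokens):
    """Contiguous word pairs: n-grams with n = 2."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

def skip_bigrams(tokens, max_skip=2):
    """Word pairs allowing up to `max_skip` intervening words (k-skip-bi-grams)."""
    pairs = []
    for i in range(len(tokens) - 1):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs

sentence = "With climate change the water cycle is expected to undergo significant change"
tokens = sentence.split()
print(bigrams(tokens)[:3])       # [('With', 'climate'), ('climate', 'change'), ('change', 'the')]
print(skip_bigrams(tokens)[:5])  # [('With', 'climate'), ('With', 'change'), ('With', 'the'),
                                 #  ('climate', 'change'), ('climate', 'the')]
```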
The use of linguistic patterns tends to improve precision values. Other methods, such as skip-bi-grams, generally return lower precision, i.e. many extracted candidates are irrelevant (e.g. climate the). But this kind of method enables the extraction of some relevant terms not found with linguistic patterns (e.g. cycle expected); the recall can then be improved.

Table 1: Examples of candidates extracted with different NLP techniques.
Sentence (input): With climate change the water cycle is expected to undergo significant change.
Candidates (output):
- Phrases (noun noun, adjective noun): climate change, water cycle, significant change
- Bi-grams of words: With climate, climate change, change the, the water, water cycle, cycle is, is expected, expected to, to undergo, undergo significant, significant change
- 2-skip-bi-grams: With climate, With change, With the, climate change, climate the, climate water, change the, change water, change cycle, the water, the cycle, the is, water cycle, water is, water expected, cycle is, cycle expected, cycle to, is expected, is to, is undergo, expected to, expected undergo, expected significant, to undergo, to significant, to change, undergo significant, undergo change, significant change

Table 2 presents the research domains related to the different types of candidates, i.e. collocations, polylexical expressions, phrases, n-grams, association rules, sequential patterns.

Table 2: Summary of the main domains associated with expressions (L: linguistics, NLP: natural language processing, CS: computer science).
Definitions                Domains
Collocations               L
Polylexical expressions    L + NLP
Phrases                    NLP
n-grams                    NLP + CS
Association rules          CS
Sequential patterns        CS

Table 3 summarizes the main criteria described in the literature. Note that the extraction is more flexible and automatic when there are fewer criteria. In this table, two types of information are associated with the different criteria. The first one (marked with /) designates the characteristics given by the co-occurrence definitions. The second type of information (marked with ★) represents characteristics that are implemented in many extensions of the state-of-the-art. Table 3 shows that the semantic criterion is seldom associated with co-occurrence definitions. This criterion is however taken into account in linguistics. For example, semantic aspects are considered in several studies [17, 22, 26]. In this context, [26] introduced lexical functions that rely on semantic criteria to define the relationships between collocation units. For instance, a given relation can be expressed in various ways between the arguments and their values, like Centr (the center, culmination of) that returns different meanings (examples from http://people.brandeis.edu/~smalamud/ling130/lex_functions.pdf):
- Centr(crisis) = the peak
- Centr(desert) = the heart
- Centr(forest) = the thick
- Centr(glory) = summit
- Centr(life) = prime
In the data mining domain, semantic information is used in two main directions. The first one involves filtering the results if they respect certain semantic information (e.g. phrases or patterns where a word is an instance of a semantic resource). Other methods involve semantic resources in the knowledge discovery process, i.e. the extraction is driven by semantic information [5].
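As an illustration of the syntactic filtering summarized in Table 1, the sketch below keeps only the candidate pairs whose part-of-speech tags match the adjective-noun or noun-noun patterns. The hand-written tags and helper names are assumptions of this example; in practice a POS tagger would supply the tags.

```python
# Filtering candidate bi-grams with syntactic patterns (adjective noun, noun noun).
# Tags are written by hand for this toy example; a POS tagger would normally provide them.
tagged = [("With", "PREP"), ("climate", "NOUN"), ("change", "NOUN"), ("the", "DET"),
          ("water", "NOUN"), ("cycle", "NOUN"), ("is", "VERB"), ("expected", "VERB"),
          ("to", "PREP"), ("undergo", "VERB"), ("significant", "ADJ"), ("change", "NOUN")]

PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}

candidates = [f"{w1} {w2}"
              for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
              if (t1, t2) in PATTERNS]
print(candidates)   # ['climate change', 'water cycle', 'significant change']
```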
In recent studies in the NLP domain, the semantic aspects are based on word embedding, which provides a dense representation of words and their relative meanings [14, 40]. Finally, note that several types of co-occurrence are often used in different domains. For example, polylexical expressions are commonly used in NLP and also in linguistics. In addition, n-grams are currently used in the NLP and computer science domains. For example, n-grams of words are often used to build terminologies (NLP domain) but also as features for machine learning algorithms (computer science domain) [35]. Table 4 summarizes the main types of criteria (i.e. statistic, morpho-syntactic, and semantic) used for extracting co-occurrences according to the research domains considered in this paper.

Table 3: Summary of the main criteria associated with co-occurrence identification (criteria: ordered sequences; sequences with gaps; morpho-syntactic information; semantic information). / represents the respect of the criterion by definition; ★ is present when extensions are currently used in the state-of-the-art. Collocations: / / ★. Polylexical expressions: / /. Phrases: / /. n-grams: / ★. Association rules: /. Sequential patterns: / /.

Table 4: Summary of the main criteria associated with research domains (criteria: statistic information; morpho-syntactic information; semantic information). / represents the respect of the criterion for extracting co-occurrences from textual data; ★ is present when extensions are currently used in the state-of-the-art. Linguistics: / ★. NLP: / / ★. Data mining: / ★ ★.

After presenting the characteristics associated with the co-occurrence notion in a multidisciplinary context, the following section compares the methodological viewpoints used to identify these elements according to the domains.

3.2 Ranking of co-occurrences

Co-occurrence identification by automatic systems is generally based on the use of quality measures and/or algorithms. This section provides two illustrative examples that show similarities between approaches according to the domains.

3.2.1 Mutual Information and Lift measure

Firstly, the use of specific statistical measures from different domains is highlighted. This subsection focuses on the study of Mutual Information (MI). This measure is often used in the NLP domain to measure the association between words [9]. MI (see formula (3.1)) compares the probability of observing x and y together (joint probability) with the probability of observing x and y independently (chance) [9]:

I(x, y) = log2( P(x, y) / (P(x) P(y)) )    (3.1)

In general, the word probabilities P(x) and P(y) correspond to the number of observations of x and y in a corpus, normalized by the size of the corpus.
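To make formula (3.1) concrete, the sketch below estimates the probabilities by simple counting over a toy corpus, as described in the paragraph above. The corpus, the adjacency window and the guard for unseen pairs are assumptions of this example.

```python
import math
from collections import Counter

# Counting-based estimate of formula (3.1): P(x) and P(y) from unigram counts,
# P(x, y) from adjacent-pair counts. The toy corpus and the adjacency window are
# assumptions of this example.
corpus = "climate change affects the water cycle and climate change drives sea level change".split()
unigrams = Counter(corpus)
pairs = Counter(zip(corpus, corpus[1:]))
n_tokens, n_pairs = len(corpus), len(corpus) - 1

def mutual_information(x, y):
    if pairs[(x, y)] == 0:
        return float("-inf")          # pair never observed together in this corpus
    p_xy = pairs[(x, y)] / n_pairs
    p_x, p_y = unigrams[x] / n_tokens, unigrams[y] / n_tokens
    return math.log2(p_xy / (p_x * p_y))

print(round(mutual_information("climate", "change"), 2))   # ~2.23: strong association
print(mutual_information("water", "climate"))               # -inf: never adjacent here
```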
Some extensions of MI have also been proposed. The PMI-IR algorithm (Pointwise Mutual Information and Information Retrieval) described in [36] queries the Web via the AltaVista search engine to determine appropriate synonyms for a given query. For a given word, denoted x, PMI-IR chooses a synonym among a given list. These selected terms, denoted yi, i ∈ [1, n], correspond to TOEFL questions, and the aim is to find the synonym yi that obtains the best score. To compute the scores, PMI-IR uses several measures based on the proportion of documents in which both terms are present. Turney's formula, one of the basic measures used in [36] and inspired by the MI measure of [9], is given below:

score(yi) = nb(x NEAR yi) / nb(yi)    (3.2)

With formula (3.2), the proportion of documents containing both x and yi (within a 10-word window) is calculated and compared with the number of documents containing the word yi. The higher this proportion, the more x and yi are considered synonyms. Here nb(x) is the number of documents containing the word x (i.e. nb corresponds to the number of web pages returned by the search engine), and NEAR (available in the 'advanced search' field of AltaVista) is an operator that tests whether two words are present within a 10-word window. This kind of web mining approach is also used in many NLP applications, e.g. (i) computing the relationship between hosts and clinical signs for an epidemiological surveillance system [3], and (ii) computing the dependency between the words of acronym definitions for word-sense disambiguation tasks [31]. The probabilities are generally symmetric (i.e. P(x, y) = P(y, x)), and the original MI measure is symmetric as well. The association ratio applied in the NLP domain, however, is not symmetric: the numbers of occurrences of the word pairs "x y" and "y x" generally differ. Moreover, the meaning and relevance of phrases may differ according to the word order in a text, e.g. first lady and lady first. Finally, MI is very close to the lift measure [7, 37, 4] in data mining. This measure identifies relevant association rules (see formula (3.3)). The lift measure evaluates the relevance of co-occurrences only (not implication), i.e. the degree to which x and y are independent [4]:

lift(x → y) = conf(x → y) / sup(y)    (3.3)

This measure is based on both the confidence and support criteria, which in turn are based on the identification of the association rule (x → y). Support indicates how frequently the itemset appears in the dataset. Confidence is a standard measure that estimates the probability of observing y given x (see formula (3.4)):

conf(x → y) = sup(x ∪ y) / sup(x)    (3.4)

Note that other quality measures from the data mining domain, such as Least contradiction or Conviction [21], could
In addition, this paper highlights some similarities in the methodologies used in order to identify co-occurrences in different domains. We could extend the discussion to other domains. For example, methodological transfers are currently applied between bioinformatics and NLP. For example, the use of edition measures (e.g. Levenshtein distance) for sequence alignment tasks (bioinformatics) v.s. string comparison (NLP). Acknowledgments This work is funded by the SONGES project (Occitanie and FEDER) - Heterogeneous Data Science (http:// textmining.biz/Projects/Songes). References [1] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94, 4A superset is defined with respect to another itemset, for example {M1, M2, M3} is a superset of {M1, M2}. B is superset of A if card(A) < card(B) and A C B. pages 487-499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc. http://dl.acm.org/citation.cfm?id= 645920.672836. [2] Amihood Amir, Yonatan Aumann, Ronen Feldman, and Moshe Fresko. Maximal association rules: A tool for mining associations in text. Journal of Intelligent Information Systems, 25(3):333-345, Nov 2005. https://doi.org/10.10 07/ s10844-005-0196-9. [3] Elena Arsevska, Mathieu Roche, Pascal Hendrikx, David Chavernac, Sylvain Falala, Renaud Lancelot, and Barbara Dufour. Identification of associations between clinical signs and hosts to monitor the web for detection of animal disease outbreaks. International Journal of Agricultural and Environmental Information Systems, 7(3):1-20, 2016. https://doi.org/10.4 018/IJAEIS. 2016070101. [4] Paulo J. Azevedo and Alipio M. Jorge. Comparing rule measures for predictive association rules. In Proceedings of the 18th European Conference on Machine Learning, ECML '07, pages 510-517, Berlin, Heidelberg, 2007. Springer-Verlag. http://dx.doi.org/10.1007/ 97 8-3-54 0-7 4 95 8-5_4 7. [5] Soumia Lilia Berrahou, Patrice Buche, Juliette Dibie, and Mathieu Roche. Xart: Discovery of correlated arguments of n-ary relations in text. Expert Systems with Applications, 73(Supplement C):115 - 124, 2017. https://doi.org/10.1016/j.eswa. 2016.12.028. [6] Didier Bourigault. Surface grammatical analysis for the extraction of terminological noun phrases. In Proceedings of the 14th Conference on Computational Linguistics - Volume 3, COLING '92, pages 977-981, Stroudsburg, PA, USA, 1992. Association for Computational Linguistics. http://dx.doi.org/10.3115/992383. 992415. [7] Sergey Brin, Rajeev Motwani, and Craig Silverstein. Beyond market baskets: Generalizing association rules to correlations. In Proceedings of the 1997ACM SIGMOD International Conference on Management of Data, SIGMOD '97, pages 265-276, New York, NY, USA, 1997. ACM. http://doi.acm.org/10.114 5/2532 60. 253327. [8] Hui Cao, George Hripcsak, and Marianthi Marka-tou. A statistical methodology for analyzing co-occurrence data from a large sample. Journal of 392 Informatica 44 (2020) 387-393 M. Roche Biomedical Informatics, 40(3):343 - 352, 2007. https://doi.org/10.1016/j.jbi.2006. 11.003. [9] Kenneth Ward Church and Patrick Hanks. Word association norms, mutual information, and lexicography. Comput. Linguist., 16(1):22-29, March 1990. http://dl.acm.org/citation.cfm?id= 89086.89095. [10] André Clas. Collocations et langues de spécialité. Meta, 39(4):576-580, 1994. https://doi.org/10.72 02/0 02327ar. [11] Béatrice Daille, Éric Gaussier, and Jean-Marc Langé. 
Towards automatic extraction of monolingual and bilingual terminology. In Proceedings of the 15th Conference on Computational Linguistics - Volume 1, COLING '94, pages 515-521, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics. https://doi.org/10.3115/991886. 991975. [12] Lisa Di-Jorio, Sandra Bringay, Céline Fiot, Anne Laurent, and Maguelonne Teisseire. Sequential patterns for maintaining ontologies over time. In On the Move to Meaningful Internet Systems: OTM 2008, OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008, Monterrey, Mexico, November 9-14, 2008, Proceedings, Part II, pages 1385-1403, 2008. https://doi.org/10.10 07/ 97 8-3-54 0-8 8 87 3-4_32. [13] Katerina Frantzi, Sophia Ananiadou, and Hideki Mima. Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries, 3(2):115-130, Aug 2000. https://doi.org/10.10 07/ s007999900023. [14] Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, and Gareth J.F. Jones. Word embedding based generalized language model for information retrieval. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, pages 795-798, New York, NY, USA, 2015. ACM. http://doi.acm.org/10.114 5/27 664 62. 2767780. [15] Arijit Ghosal, Rudrasis Chakraborty, Bibhas Chandra Dhara, and Sanjoy Kumar Saha. Song Classification: Classical and Non-classical Discrimination Using MFCC Co-occurrence Based Features, pages 179- 185. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011. https://doi.org/10.10 07/ 97 8-3-64 2-27183-0_19. [16] Gaston Gross. Les expressions figées en français. Ophrys, 1996. [17] Ulrich Heid. Towards a corpus-based dictionary of german noun-verb collocations. In Proceedings of the Euralex International Congress, pages 301-312, 1998. [18] Simon Jaillet, Anne Laurent, and Maguelonne Teisseire. Sequential patterns for text categorization. Intelligent Data Analysis, 10(3):199-214, May 2006. https://doi.org/10.32 33/ IDA-2006-10302. [19] Hyun-Ho Jeon, Andrea Basso, and Peter F. Driessen. Camera Motion Detection in Video Sequences Using Motion Cooccurrences, pages 524-534. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005. https://doi.org/10.1007/11581772_4 6. [20] Min Jiang, Joshua C. Denny, Buzhou Tang, Hongxin Cao, and Hua Xu. Extracting semantic lexicons from discharge summaries using machine learning and the c-value method. In AMIA 2012, American Medical Informatics Association Annual Symposium, Chicago, Illinois, USA, November 3-7, 2012, 2012. https://www.ncbi.nlm.nih.gov/pmc/ articles/PMC354 05 81/. [21] Stephane Lallich, Olivier Teytaud, and Elie Prud-homme. Association Rule Interestingness: Measure and Statistical Validation, pages 251-275. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007. https://doi.org/10.10 07/ 97 8-3-54 0-4 4 918-8_11. [22] Marleen Laurens. La description des collocations et leur traitement dans les dictionnaires. Romaneske, 4:44-51, 1999. http://www.vlrom.be/pdf/994colloc. pdf. [23] Carmen Lederer. La notion d'unité lexicale et l'enseignement du lexique. The French Review, 43(1):96-98, 1969. https://www.jstor.org/stable/38 67 36. [24] Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche, and Maguelonne Teisseire. Biomedical term extraction: Overview and a new methodology. Information Retrieval Journal, 19(1-2):59-99, April 2016. http://dx.doi.org/10.1007/ s10791-015-9262-2. [25] Sean Massung and Chengxiang Zhai. 
Non-native text analysis: A survey. Natural Language Engineering, 22(2):163-186, 2016. https://doi.org/10.1017/ S1351324915000303. How to Define Co-occurrence in... Informatica 44 (2020) 387-393 393 [26] Igor A. Mel'cuk, Nadia Arbatchewsky-Jumarie, Léo Elnitsky, and Adèle Lessard. Dictionnaire explicatif et combinatoire du francais contemporain. Presses de l'Université de Montréal, Montréal, Canada, 1984,1988,1992,1999. Volume 1, 2, 3, 4. [27] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Cor-rado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13, pages 3111-3119, USA, 2013. Curran Associates Inc. http://dl.acm.org/citation.cfm?id= 2999792.2999959. [28] Goran Nenadic, Irena Spasic, and Sophia Ananiadou. Terminology-driven mining of biomedical literature. In Proceedings of the 2003 ACM Symposium on Applied Computing, SAC '03, pages 83-87, New York, NY, USA, 2003. ACM. http://doi.acm.org/10.114 5/952532. 952553. [29] Julien Rabatel, Yuan Lin, Yoann Pitarch, Hassan Saneifar, Claire Serp, Mathieu Roche, and Anne Laurent. Visualisation des motifs séquentiels extraits à partir d'un corpus en ancien français. In Extraction et gestion des connaissances (EGC'2008), pages 237-238, 2008. https://editions-rnti.fr/?inprocid= 1000605. [30] Mathieu Roche, Jérôme Azé, Oriane Matte-Tailliez, and Yves Kodratoff. Mining texts by association rules discovery in a technical corpus. In Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM'04 Conference held in Zakopane, Poland, May 17-20, 2004, pages 89-98, 2004. https://link.springer.com/chapter/ 10.1007/97 8-3-540-39985-8_10. [31] Mathieu Roche and Violaine Prince. A web-mining approach to disambiguate biomedical acronym expansions. Informatica (Slovenia), 34(2):243-253, 2010. http://www.informatica.si/index. php/informatica/article/view/296. [32] Mathieu Roche, Maguelonne Teisseire, and Gaurav Shrivastava. Valorcarn-TETIS: Terms extracted with Biotex [dataset]. CIRAD Dataverse, 2017. http://dx.doi.org/10.18167/DVN1/ PGQGQL. [33] Ivan A. Sag, Timothy Baldwin, Francis Bond, Ann A. Copestake, and Dan Flickinger. Multiword expressions: A pain in the neck for NLP. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing, CICLing '02, pages 1-15, London, UK, UK, 2002. Springer-Verlag. http://dl.acm.org/citation.cfm?id= 647344.724004. [34] Claire Serp, Anne Laurent, Mathieu Roche, and Maguelonne Teisseire. La quête du graal et la réalité numérique. Corpus, 7, 2008. https://doi.org/10.4 000/corpus.1512. [35] Piyoros Tungthamthiti, Kiyoaki Shirai, and Masnizah Mohd. Recognition of sarcasm in tweets based on concept level sentiment analysis and supervised learning approaches, pages 404-413. Faculty of Pharmaceutical Sciences, Chulalongkorn University, 2014. https://www.aclweb.org/anthology/ Y14-1047. [36] Peter D. Turney. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European Conference on Machine Learning, ECML '01, pages 491-502, London, UK, UK, 2001. Springer-Verlag. http://dl.acm.org/citation.cfm?id= 645328.650004. [37] Sebastián Ventura and José María Luna. Quality Measures in Pattern Mining, pages 27-44. Springer International Publishing, Cham, 2016. https://doi.org/10.10 07/ 97 8-3-319-3385 8-3_2. [38] Manisha Verma, Balasubramanian Raman, and Sub-rahmanyam Murala. 
Local extrema co-occurrence pattern for color and texture image retrieval. Neuro-comput., 165(C):255-269, October 2015. http://dx.doi.org/10.1016/j.neucom. 2015.03.015. [39] Yong Yin, Ikou Kaku, Jiafu Tang, and JianMing Zhu. Association Rules Mining in Inventory Database, pages 9-23. Springer London, London, 2011. https://doi.org/10.10 07/ 97 8-1-84 996-338-1_2. [40] Hamed Zamani and W. Bruce Croft. Relevance-based word embedding. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pages 505-514, New York, NY, USA, 2017. ACM. http://doi.acm.org/10.114 5/3077136. 3080831. 394 Informática 44 (2020) 387-393 M. Roche https://doi.org/10.31449/inf.v44i3.3280 Informatica 44 (2020) 395-366 361 Association Rule Model of On-demand Lending Recommendation for University Library Shixin Xu Huaiyin Institute of Technology, Huaian, Jiangsu 223003, China E-mail: xusx@hyit.edu.cn Student paper Keywords: library, recommendation, association rules, Bayes Received: August 31, 2020 University library that is connected to the Internet is more convenient to search, but the huge amount of data is not convenient for users who lack a precise target. In this study, the traditional association rule algorithm was improved by a Bayesian algorithm, and then simulation experiment was carried out taking borrowing records of 1000 students as examples. In order to verify the effectiveness of the improved algorithm, it was compared with the traditional association rule algorithm and collaborative filtering algorithm. The results showed that the recommendation results of the improved association rule recommendation algorithm were more relevant to students' majors, and the coincidence degree of different students was low. In the objective evaluation of the performance of the algorithm, the accuracy, recall rate and F value showed that the personalized recommendation performance of the improved association rule algorithm was better and the improved association rule algorithm could recommend users with the book type that they need. Povzetek: Opisan je asociativni algoritem z dodanim Bayesovim klasifikatorjem za iskanje po univerzitetni knjižnici. 1 Introduction The arrival of the Internet era has made great changes in our lives, the most intuitive expression of which is that the amount of information that can be obtained far exceeds the era before the emergence of the Internet [1]. Although the Internet with a large amount of information greatly facilitates people's lives, the huge amount of information also greatly increases the difficulty of people's retrieval of effective information. The same is true in the university libraries. Generally, the number of books in a university library is very large. In order to meet the needs of university teachers and students, the selection of books is often very rich [2]. According to the method of field search, it takes time and effort to browse the bookshelves one by one. When the Internet is combined with the university library, the book information in the university library is uploaded to the Internet, and the university teachers and students can simply retrieve the desired book information by using the Internet [3]. However, similar to the Internet described before, although the amount of book information in university library cannot be compared with the amount of data in the whole Internet, it is still a huge amount of data for university teachers and students. 
If there is a clear goal, it can be accurately retrieved, but if there is only a vague demand range, it is difficult to accurately retrieve the required information. Zhang [4] proposed a personalized book recommendation algorithm based on time-series collaborative filtering and found through experiments that the algorithm met the professional learning needs of college students. Sohail et al. [5] put forward an opinion-mining-based recommendation technology that provides college students with promising books in the syllabus and found through experiments that the accuracy of this method improved by 55% and that it could be applied to the recommendation of other products. Chahinez et al. [6] proposed a book recommendation method based on complex user queries, and the experimental results showed that the combination with a retrieval model could significantly improve the standard ranked retrieval metrics. In this study, the traditional association rule algorithm was improved by a Bayesian algorithm, and a simulation experiment was carried out taking the borrowing records of 1000 students as an example. In order to verify the effectiveness of the improved algorithm, it was compared with the traditional association rule algorithm and a collaborative filtering algorithm.

2 Book recommendation algorithm based on association rules

2.1 Association rule algorithm

The association rule recommendation algorithm [7] finds connections between different items in a large data set and regards a connection whose strength exceeds a set threshold as a strong association rule to guide book recommendation. The key point of the association rule recommendation algorithm is to find the strong rules in the database. The algorithm is generally divided into two steps: (1) search frequent sets in the database; (2) search strong rules in the frequent sets.

Figure 1: The basic diagram of the association rule algorithm flow. Example data: user 1 borrows {A, B, C}, user 2 {B, C, D}, user 3 {B, D, E}, user 4 {A, B, C, D}; candidate item set 1: A (1/2), B (1), C (3/4), D (3/4); frequent item set 1: B (1), C (3/4), D (3/4); candidate item set 2: {B, C} (3/4), {B, D} (3/4), {C, D} (1/2); candidate item set 3: {B, C, D} (1/2).

For convenience, as shown in Figure 1, numbers represent users, letters represent the books borrowed by the users, and the number of records is reduced to 4 users and 4 books. Firstly, single books are taken as the candidate items, and the support degree of each item in candidate item set 1 is calculated [8] using the following formula:

SUP = n / N    (1)

where SUP is the support degree of the item, N is the total number of records in the database (for example, there are 4 borrowing records in Figure 1), and n is the number of records containing the item. According to the set support threshold, frequent set 1 is filtered out; the frequent set items are then combined to form a new candidate set, and the new frequent set is filtered out; this operation is repeated until no candidate set can be obtained. In addition to the support degree, the confidence degree should also be calculated in order to search for the strong rules in the frequent item sets. Taking the item {B, C} in frequent item set 2 of Figure 1 as an example, a strong association rule may be produced between a non-empty subset and the set of the remaining elements, so the possible strong rules are {B} → {C} and {C} → {B}.
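A compact way to check the Figure 1 walkthrough is to count supports directly. The sketch below does this for the four example users, following formula (1); it is an illustrative implementation of the first step (frequent sets), not the paper's code, and the support threshold of 0.6 is an assumption chosen to reproduce the pruning shown in the figure.

```python
from itertools import combinations

# Support counting for the 4-user example of Figure 1, following formula (1): SUP = n / N.
# min_sup = 0.6 is an assumed threshold that reproduces the pruning shown in the figure.
records = [{"A", "B", "C"}, {"B", "C", "D"}, {"B", "D", "E"}, {"A", "B", "C", "D"}]
N = len(records)

def support(itemset):
    """Fraction of borrowing records that contain every item of `itemset`."""
    return sum(itemset <= r for r in records) / N

min_sup = 0.6
items = sorted(set().union(*records))
print({i: support({i}) for i in items})          # A: 0.5, B: 1.0, C: 0.75, D: 0.75, E: 0.25
frequent1 = [i for i in items if support({i}) >= min_sup]      # ['B', 'C', 'D']

candidates2 = [set(p) for p in combinations(frequent1, 2)]
print([(sorted(c), support(c)) for c in candidates2])
# [(['B', 'C'], 0.75), (['B', 'D'], 0.75), (['C', 'D'], 0.5)] -> frequent item set 2 keeps {B,C} and {B,D}
```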
For the association item set X → Y, the confidence degree [9] is calculated as:

CON = |X ∩ Y| / |X|    (2)

where CON stands for the confidence level of the association item set, |X ∩ Y| is the number of records containing the two items at the same time, and |X| is the number of records containing the item X. If the confidence level of the association item set exceeds the set threshold, the association item set is considered a strong rule. The confidence degrees of all the items in the frequent item set are calculated as above, and the strong association rules are selected as the reference for book recommendation.

2.2 Improvement of the association rule algorithm by a Bayesian network

In order to make up for the shortcomings of the association rule algorithm, it is improved by a Bayesian algorithm [10]. The basic steps are as follows. Firstly, a training set is established, and the conditional probabilities of the different characteristic attributes of the items to be classified are estimated for every class. Secondly, the probability of an item belonging to a class is calculated from its characteristic attributes [11]:

P(Yi|X) = P(X|Yi) P(Yi) / P(X)    (3)

where P(Yi|X) stands for the probability that the item X to be classified belongs to Yi; X represents a set of books borrowed in the historical records of a borrower; Yi indicates a set of recommended books obtained according to X, so P(Yi|X) is the probability of X being classified to Yi, i.e., the establishment probability of the association item set after Bayesian calibration; and P(X|Yi) stands for the distribution probability of X in Yi, whose value is obtained by estimating the conditional probability of X from the training set. Thirdly, the probability of X belonging to each Yi is calculated using equation (3), and the set with the largest probability is the most likely association item set. The association rule set produced by the association rule algorithm is thus optimized by the Bayesian algorithm, and the book recommendation result is obtained according to this probability. The basic flow is shown in Figure 2.

Figure 2: The process of the association rule recommendation algorithm improved by the Bayesian network (book borrowing records → association rule algorithm → association rule set → personalized pruning based on the historical data of the borrower → Bayesian optimization based on borrowing records reflecting reader interest).

(1) Firstly, the book borrowing records of the library are input, and the association rule set is extracted from them using the association rule algorithm described above. (2) After obtaining the association rule set, in order to obtain personalized book recommendations, the association rule set is pruned based on the historical data of the borrower [12]: the items in the association rule set are compared with the items in the historical data, and a record is deleted if the difference is smaller than a set threshold. The threshold is calculated as:

N = count(Si) / count(Huser)    (4)

where N is the threshold and count(Si) and count(Huser) are the number of items in the frequent item set and the number of borrowing records, respectively. (3) Through practical investigation, the interest tendencies of borrowers towards the different books in their borrowing records are confirmed, so as to build a borrowing record database [13] that reflects the interest of readers, i.e., the training set of the Bayesian algorithm.
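The sketch below is a minimal illustration of this flow: rule confidence in the spirit of formula (2) and a Bayesian re-scoring of rules in the spirit of formula (3). The toy data, the probability estimates and the helper names are assumptions made for this example, not the paper's implementation.

```python
# Toy sketch: rule confidence (formula 2) and Bayesian re-scoring of rules (formula 3).
# Data, probability estimates and helper names are illustrative assumptions.

history = [{"D90", "D92"}, {"D92", "D923"}, {"D90", "D923"}]         # one borrower's records
rules = [({"D92"}, "D923"), ({"D92"}, "F05"), ({"D90"}, "D924")]     # (antecedent, recommended class)

def confidence(antecedent, consequent, records):
    both = sum((antecedent | {consequent}) <= r for r in records)
    return both / max(1, sum(antecedent <= r for r in records))      # formula (2)

def bayes_score(antecedent, consequent, training):
    """P(Y|X) proportional to P(X|Y) P(Y), estimated by counting over (records, liked class) pairs."""
    liked = [recs for recs, y in training if y == consequent]
    p_y = len(liked) / len(training)
    p_x_given_y = sum(antecedent <= r for r in liked) / max(1, len(liked))
    return p_x_given_y * p_y                                         # formula (3), unnormalized

training = [({"D92", "D923"}, "D923"), ({"D90", "D92"}, "D923"), ({"F03", "F05"}, "F05")]
for antecedent, consequent in rules:
    print(sorted(antecedent), "->", consequent,
          "confidence:", round(confidence(antecedent, consequent, history), 2),
          "bayes:", round(bayes_score(antecedent, consequent, training), 2))
```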
After the training of Association Rule Model of On-demand. Informatica 44 (2020) 395-399 397 Bayesian algorithm, the association rule set is calibrated after personalized pruning. Finally, the book is recommended according to the probability obtained after the calibration of Bayesian algorithm. 3 Simulation experiment 3.1 Experimental environment In this study, the above recommended algorithm was simulated using MATLAB software [14]. The experiment was carried out in a laboratory server. The configuration of the server was Windows 7 operating system, 16 G memory and Core i7 processor. 3.2 Experimental setup First of all, the experimental data used for the simulation experiment came from the book borrowing management system of a university library. Taking 1000 students as subjects, the borrowing records of them from freshmen year to senior year were collected, and then the preliminary processing was carried out, including deleting the records with less than 7 books borrowed (it will reduce the amount of samples, leading to a large contingency in the association rule summarized by the algorithm, deleting the useless fields in the records, such as name and gender of borrows, book author, etc., deleting the invalid data records. There were 26525 borrowing records after final processing, and some records after pretreatment are shown in Table 1. Library card No. Borrowing grade Major disciplines Book type A12045 Freshman Law D90; D92; D923; D924;...... B21541 Sophomor e Economics F03; F05; G114; G411;...... B22548 Junior Mathemati cs O1; O4; P3; Q2;...... A12365 Senior Medicine R4; R75; Q3;...... final number of recommended books was set as 5; the data set which was used for training Bayesian algorithm the in improved association rule algorithm was the borrowing record which was constructed after investigation and could reflect the interest of readers. 3.3 Evaluating indicator In this study, the recommendation effect of recommendation algorithm was evaluated by accuracy, recall and F value, and their formulas are: , L< the the P = VM M=1' M-N M R=m M-Pi (5) F = 2RP R+P Table 1: Some book borrowing records after pretreatment. Some of the borrowing records after pre-processing are shown in Table 1. Only the library card number which represents the identity of the borrower, the borrowing grade which represents the borrowing time, the major of the borrower and the type of books borrowed by the borrower were left in the borrowing records. Taking the record in the first row of Table 1 as an example, a student whose library card number was A12045 and who was major in law borrowed "D90;D92;D923;D924" books, and most of the books was about law. In the process of iterative induction of frequent itemsets, the support and confidence degrees of the traditional and improved association rule recommendation algorithms were set as 0.1 and 0.5 respectively, and the where L is the number of recommended books in line with readers' interests, M is the numb er of readers, N is the total number of recommended books, and p is the number of books that the reader is interested in. 3.4 Experimental results The borrowing records obtained after pre-processing were calculated using three algorithms, and finally the book recommendation results of different people were obtained. Limited by the length, this paper only shows some recommendation results, as shown in Table 2. It was seen from Table 2 that the recommendation results of the three algorithms were different for the same person. 
According to the book classification number, it was found that the books recommended by the collaborative filtering recommendation algorithm were mostly irrelevant, although there were books related to the major; only one or two books recommended by the traditional association rule algorithm were irrelevant; the books recommended by the improved association rule algorithm were basically relevant to the major. The vertical comparison of the recommendation results of different people under the same algorithm showed that the result types under the collaborative filtering algorithm were messy and nearly involved all the majors; the results under the traditional association rule algorithm had overlapping, i.e., high similarity; the results under the improved association rule algorithm involved different types, but different from the collaborative filtering algorithm, they were relevant to the major of the borrower. The recommendation results of the three recommended algorithms were counted and checked with the corresponding borrower to see if the book was what he was interested in or needed. The final results of the performance of the algorithms are shown in Figure 3. The accuracy of the collaborative filtering algorithm was 67.3%, the recall rate was 72.1%, and the F value was 69.6%; the accuracy of the traditional association rule algorithm was 89.6%, the recall rate was 89.1%, and the F value was 89.3%; the accuracy of the improved association rule algorithm was 98.2%, the recall was 98.3%, and the F value was 98.2%. It was seen from Figure 3 that the improved association rule algorithm had the highest accuracy rate and recall rate, followed by the L 1 398 Informatica 44 (2020) 395-399 S. Xu traditional association rule algorithm and the collaborative filtering algorithm, indicating that the improved association rule algorithm could provide users with more accurate recommended books; the improved association rule algorithm also had the largest F value, followed by the traditional association rule algorithm and collaborative filtering algorithm. F value is the combination of accuracy and recall rate, which can reflect the personalized recommendation level of the algorithm to different users. The traditional association rule algorithm started from the connection between different book items and used the connection to speculate users' needs. Although personalized pruning was applied, the traditional association rule algorithm was also based on the whole borrowing record, and the strong rule still reflected the overall trend; the improved association rule algorithm used the trained Bayesian algorithm for calibration and optimization to further reflect the demand tendency of different people, therefore the accuracy, recall rate and F value of its personalized recommendation results were larger. 100 i-95 • 90 ■ 0 85 ■ 1 80 ■ I 75" I 70 " 65 • 60 • 50 - Accuracy Recall rate F value ■ Collaborative filtering algorithm ■ Traditional association rule algorithm B Improved association rule algorithm Figure 3: The performance of three recommended algorithms. 4 Conclusion This paper introduced a recommendation algorithm which mined association rules in borrowing records and improved it with a Bayesian algorithm. Then, borrowing records of 1000 students in the library management system of a university were simulated using MATLAB software. The results are as follows. 
(1) For the same person, the recommendation results of three algorithms were different: there were many kinds of recommendation results under the collaborative filtering algorithm, only one or two of which were related to the major; there were many kinds of recommendation results under the traditional association rule algorithm, but most of them were related to the major; the recommendation results under the improved association rule algorithm were basically related to the major. (2) Under the same algorithm, the recommendation results for different people were also different: under the collaborative filtering algorithm, the types of recommendation books for different people were diverse; under the traditional association rule algorithm, the types of recommendation books for different people overlapped to a certain extent; under the improved association rule algorithm, the Librar Recommend Results of Results of y card ation results traditional improved No. of association association collaborative rule rule filtering A120 D92;D923;D D92;D923;D D923;D924; 45 924;O1; 924; D923.6; I253.1 I253.1;H1 D99;D90 B215 F05;D99;G2 F05;G114;D F03;F05;G1 41 0;F12; 923;D924; 14;G411; G114 F12 F12 B225 O1;O4;P3;F O4;P3;G114; O1;O4;P3; 48 12;H1 D923; Q2;P2 D92 A123 G20;F12;D9 R4;R75;D92; R4;R75;Q3; 65 23;G114; D923; R8;Q5 I253.1 G114 Table 2: Some recommendation results of three algorithms. recommendation books for different people were related to their respective majors, with a low degree of overlap. (3) The results of the objective evaluation showed that the improved association rule algorithm had the largest accuracy, recall rate and F value, followed by the traditional association rule algorithm. 5 References [1] Zhou Y (2020). Design and Implementation of Book Recommendation Management System Based on Improved Apriori Algorithm. Intelligent Information Management, 12(3), pp. 75-87. https://doi.org/10.4236/iim.2020.123006 [2] Zhang FL (2016). A Personalized Time-Sequence-based Book Recommendation Algorithm for Digital Libraries. IEEE Access, pp. 1-1. https://doi.org/10.1109/ACCESS.2016.2564997 [3] Kim JY (2015). A Comparative Study of Pre-service Teachers with Korean Language Education Majors in Book Recommendation Criteria for the Middle and High School Students. journal of research in reading, 36, pp. 201-234. [4] Zhang F (2016). A Personalized Time-Sequence-Based Book Recommendation Algorithm for Digital Libraries. IEEE Access, 4, pp. 1-1. https://doi.org/10.1109/ACCESS.2016.2564997 [5] Sohail SS, Siddiqui J, Ali R (2018). Feature-Based Opinion Mining Approach (FOMA) for Improved Book Recommendation. Arabian Journal for Science & Engineering, (2), pp. 1-20. https://doi.org/10.1007/s13369-018-3282-3 [6] Chahinez B, Patrice B (2015). Information Retrieval and Graph Analysis Approaches for Book Recommendation. Scientific World Journal, 2015, pp. 1-8. https://doi.org/10.1155/2015/926418 [7] Jooa JH, Bangb SW, Parka GD (2016). Implementation of a Recommendation System Using Association Rules and Collaborative Filtering. Procedia Computer Science, 91, pp. 944-952. https://doi.org/10.1016/j.procs.2016.07.115 Association Rule Model of On-demand. Informatica 44 (2020) 395-399 399 [8] Ping H (2015). The Research on Personalized Recommendation Algorithm of Library Based on Big Data and Association Rules. Open Cybernetics & Systemics Journal, 9(1), pp. 2554-2558. https://doi.org/10.2174/1874110X01509012554 [9] Gabroveanu M (2015). 
Recommendation System Based On Association Rules For Distributed E-Learning Management Systems. Acta Universitatis Cibiniensis, 67(1). https://doi.org/10.1515/aucts-2015-0072 [10] dos Santos FF, Domingues MA, Sundermann CV, de Carvalho VO, Moura MF, Rezende SO (2018). Latent association rule cluster based model to extract topics for classification and recommendation applications. Expert Systems with Application, 112(DEC.), pp. 3460. https://doi.org/10.1016/j.eswa.2018.06.021 [11] Gao Y, Xu A, Hu JH, Cheng TH (2017). Incorporating association rule networks in feature category-weighted naive Bayes model to support weaning decision making. Decision Support Systems, 96, pp. 27-38. [12] Xiao S, Hu Y, Han J, Zhou R, Wen JQ (2016). Bayesian Networks-based Association Rules and Knowledge Reuse in Maintenance Decision-Making of Industrial Product-Service Systems. Procedia CIRP, 47, pp. 198-203. https://doi.org/10.1016/j.procir.2016.03.046 [13] Rao W, Zhu L, Pan S, Yang P, Qiao J (2019). Bayesian Network and association rules-based transformer oil temperature prediction. Journal of Physics Conference, 1314, pp. 012066. https://doi.org/10.1088/1742-6596/1314/1Z012066 [14] Siddiquee MR, Rahman S, Chowdhuy SUI, Rahman MR (2016). Association rule mining and audio signal processing for music discovery and recommendation. International Journal of Software Innovation, 4(2), pp. 71-87. https://doi.org/10.4018/IJSI.2016040105 400 Informatica 44 (2020) 395-399 S. Xu https://doi.org/10.31449/inf.v44i3.3280 Informatica 44 (2020) 401-366 361 Designing Hybrid Intelligence Based Recommendation Algorithms: An Experience Through Machine Learning Metaphor Arup Roy Birla Institute of Technology, Mesra, India E-mail: aruproy.cse@gmail.com Thesis summary Keywords: recommendation system, hybrid intelligent system, optimization, machine learning Received: August 12, 2019 This article presents a summarization of the doctoral thesis, which proposes efficient hybrid intelligent algorithms in recommendation systems. The development of effective recommendation algorithms for ensuring quality recommendation in a timely manner is a tricky task. Moreover, the traditional recommendation system is inadequate to cope up with the new technological trends. To overcome these issues, a batch of sophisticated recommendation systems has been discovered e.g. contextual recommendation, group recommendation, and social recommendation. The research work investigates and analyzes new genres of recommenders using nature-inspired algorithms, evolutionary algorithms, swarm intelligence algorithms, and machine learning techniques. The algorithms resolve some crucial problems of these recommenders. As a result, the more personalized recommendation is ensured. Povzetek: Povzetek doktorske disertacije, ki predlaga učinkovite hibridne inteligentne algoritme v priporočenih sistemih, raziskuje in analizira nove zvrsti priporočil z uporabo algoritmov po naravnih vzorih, evolucijskih algoritmov, algoritmov z roji in tehnik strojnega učenja. 1 Introduction Recommendation from known sources assist to achieve unknown tasks, e.g. purchasing of products, making plans for vacation, etc. However, verbal assurance often lacks real-time information and consequences contradicting opinions. Consequently, users are overwhelmed by the voluminous information, and the possibility of opting wrong products could increase. Recommendation System (RS) becomes functional in such situations, e.g. 
2 Methodology

The thesis deals with designing hybrid intelligent algorithms using soft computing techniques, bio-inspired algorithms, and probabilistic models. Specifically, the research introduces:
(a) Crowd-Sourcing based Group Recommendation Framework: a modified termite colony based hybrid movie recommendation framework that reduces the scalability problem, recommends high-quality products, and minimizes the recommendation time [5];
(b) Trusted Contextual Recommendation Framework: a fish school search based model that ensures recommendations come from reputed users, with recommendation hazards reduced by an artificial bee colony based simulated annealing algorithm [6];
(c) Functional Retail Recommendation Framework: a termite colony based optimized model for product recommendation, prediction of stock based on product consumption patterns, and prediction aimed at increasing overall sales [7];
(d) New Collaborative Filtering Framework: a rough-dragonfly hybrid that finds the optimal neighbors of the active user, predicts ratings accurately, and removes the data sparsity issue [8];
(e) New Vista in Demographic Filtering Framework: a K-means-ant colony hybrid that recommends the best partners on matrimonial sites, removes noisy data prior to recommendation, and classifies the significant attributes intelligently [9].

3 Results

In (a), the Crowd-Sourcing based Group Recommendation Framework, the well-known MovieLens dataset is used for experimentation. The metrics Mean Absolute Error and Root Mean Squared Error are used to measure the error in the predicted ratings; an illustrative sketch of these metrics is given at the end of this section. Moreover, the proposed content based filtering is compared with the Jaccard, Tanimoto, and Binary Cosine techniques, with promising experimental results.

In (b), the Trusted Contextual Recommendation Framework, the Irish Trip-Advisor dataset is used for experimentation, and the AOL dataset is used to verify proper access operations. In particular, location and time are considered as contextual features. The parameters reputation of a user, recommendations to a user, degree of impact, and fitness are used to demonstrate the effectiveness of the proposed algorithm.

In (c), the Functional Retail Recommendation Framework, a transactional dataset (13 distinct values) from a UK-based online retail store is used for experimentation. The parameters frequency of selling, repeat purchase, number of purchases, and total selling frequency assist in predicting the pattern of the stock in the near future.

In (d), the New Collaborative Filtering Framework, the model is trained on the Restaurant and Consumer data of the Recommender Systems Domain. Metrics such as Coverage, Root Mean Squared Error, Precision, F-Measure, and Reliability demonstrate the effectiveness of the proposed model.

In (e), the New Vista in Demographic Filtering Framework, the first 100 demographic profiles of prospective brides and grooms from the popular Indian matchmaking website SimplyMarry.com are considered for validation. The metrics Success Rate and Recall indicate the efficiency of the proposed algorithm.
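The following sketch, referenced in (a) above, illustrates how the rating-error metrics and the binary similarity measures named there can be computed. It is a hedged illustration on made-up data, not code from the thesis; all values and function names are hypothetical.

```python
# Illustrative sketch (not from the thesis): MAE, RMSE and binary similarity
# measures (Jaccard, Tanimoto, binary cosine) on hypothetical data. Note that
# for binary preference sets the Tanimoto coefficient coincides with Jaccard.
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def jaccard(a, b):           # |A ∩ B| / |A ∪ B| on sets of liked items
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def binary_cosine(a, b):     # |A ∩ B| / sqrt(|A| * |B|)
    a, b = set(a), set(b)
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

# Hypothetical predicted vs. actual ratings for a few MovieLens-style items.
actual    = [4.0, 3.5, 5.0, 2.0]
predicted = [3.5, 4.0, 4.5, 2.5]
print(mae(actual, predicted), rmse(actual, predicted))          # 0.5 0.5

user_a = {"m1", "m2", "m3"}
user_b = {"m2", "m3", "m4"}
print(jaccard(user_a, user_b), binary_cosine(user_a, user_b))   # 0.5 0.666...
```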
4 Conclusion and Future Work

The thesis proposes some novel ideas and implementations for generating recommendations. To achieve this, the research realizes intelligent recommendation through learning. It successfully addresses some inherent limitations such as detection of intruders, recommendation generation, rating prediction, neighbor selection, and matching of user profiles. The frameworks have been proposed with a view to real-life applications such as movies, e-commerce, restaurants, hotels, and matrimonial sites. As a result, the research becomes more vibrant and exciting. Moreover, the models could easily be plugged into commercial recommenders. Although the results are promising, several improvements remain to be addressed, such as utility-based recommendations, good consensus functions for group recommenders, management of big data, and robust algorithms that deal efficiently with fuzzy, ambiguous, and non-deterministic information.

5 References

[1] Will Hill, Larry Stead, Mark Rosenstein, and George Furnas, Recommending and evaluating choices in a virtual community of use, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 194-201, 1995. https://doi.org/10.1145/223904.223929
[2] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl, GroupLens: An open architecture for collaborative filtering of netnews, in: Proceedings of the ACM Conference on Computer Supported Cooperative Work, 175-186, 1994. https://doi.org/10.1145/192844.192905
[3] David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry, Using collaborative filtering to weave an information tapestry, Communications of the ACM, 35(12): 61-70, 1992. https://dl.acm.org/doi/10.1145/138859.138867
[4] Keunho Choi, Donghee Yoo, Gunwoo Kim, and Yongmoo Suh, A hybrid online-product recommendation system: Combining implicit rating-based collaborative filtering and sequential pattern analysis, Electronic Commerce Research and Applications, 11(4): 309-317, 2012. https://doi.org/10.1016/j.elerap.2012.02.004
[5] Arup Roy, Soumya Banerjee, Chintan Bhatt, Youakim Badr, Sourav Mallik, Hybrid group recommendation using modified termite colony algorithm: A context towards big data, Journal of Information and Knowledge Management, 17(2), 2018. https://doi.org/10.1142/S0219649218500193
[6] Arup Roy, Madjid Tavana, Soumya Banerjee, Debora Di Caprio, A secured context aware tourism recommender system using artificial bee colony and simulated annealing, International Journal of Applied Management Science, 8(2): 93-113, 2016. https://dx.doi.org/10.1504/IJAMS.2016.077014
[7] Soumya Banerjee, Neveen I. Ghali, Arup Roy, Aboul Ella Hassanein, A bio-inspired perspective towards retail recommender system: Investigating optimization in retail inventory, in: Proceedings of the IEEE International Conference on Intelligent Systems Design and Applications, 161-165, 2012. https://doi.org/10.1109/ISDA.2012.6416530
[8] Arup Roy, Soumya Banerjee, Manash Sarkar, Ashraf Darwish, Mohamed Elhoseny, Aboul Ella Hassanein, Exploring new vista of intelligent collaborative filtering: A restaurant recommendation paradigm, Journal of Computational Science, 27(1): 168-182, 2018. https://doi.org/10.1016/j.jocs.2018.05.012
[9] Arup Roy, Soumya Banerjee, Who will be my dearest one? An expert decision, International Journal of Advanced Intelligence Paradigms (in press).

JOŽEF STEFAN INSTITUTE

Jožef Stefan (1835-1893) was one of the most prominent physicists of the 19th century. Born to Slovene parents, he obtained his Ph.D. at Vienna University, where he was later Director of the Physics Institute, Vice-President of the Vienna Academy of Sciences and a member of several scientific institutions in Europe. Stefan explored many areas in hydrodynamics, optics, acoustics, electricity, magnetism and the kinetic theory of gases. Among other things, he originated the law that the total radiation from a black body is proportional to the 4th power of its absolute temperature, known as the Stefan-Boltzmann law.

The Jožef Stefan Institute (JSI) is the leading independent scientific research institution in Slovenia, covering a broad spectrum of fundamental and applied research in the fields of physics, chemistry and biochemistry, electronics and information science, nuclear science technology, energy research and environmental science.

The Jožef Stefan Institute (JSI) is a research organisation for pure and applied research in the natural sciences and technology. Both are closely interconnected in research departments composed of different task teams. Emphasis in basic research is given to the development and education of young scientists, while applied research and development serve for the transfer of advanced knowledge, contributing to the development of the national economy and society in general.

At present the Institute, with a total of about 900 staff, has 700 researchers, about 250 of whom are postgraduates, around 500 of whom have doctorates (Ph.D.), and around 200 of whom have permanent professorships or temporary teaching assignments at the Universities. In view of its activities and status, the JSI plays the role of a national institute, complementing the role of the universities and bridging the gap between basic science and applications.

Research at the JSI includes the following major fields: physics; chemistry; electronics, informatics and computer sciences; biochemistry; ecology; reactor technology; applied mathematics. Most of the activities are more or less closely connected to information sciences, in particular computer sciences, artificial intelligence, language and speech technologies, computer-aided design, computer architectures, biocybernetics and robotics, computer automation and control, professional electronics, digital communications and networks, and applied mathematics.

The Institute is located in Ljubljana, the capital of the independent state of Slovenia (or S♥nia). The capital today is considered a crossroad between East, West and Mediterranean Europe, offering excellent productive capabilities and solid business opportunities, with strong international connections. Ljubljana is connected to important centers such as Prague, Budapest, Vienna, Zagreb, Milan, Rome, Monaco, Nice, Bern and Munich, all within a radius of 600 km.
From the Jožef Stefan Institute, the Technology park "Ljubljana" has been proposed as part of the national strategy for technological development to foster synergies between research and industry, to promote joint ventures between university bodies, research institutes and innovative industry, to act as an incubator for high-tech initiatives and to accelerate the development cycle of innovative products.

Part of the Institute was reorganized into several high-tech units supported by and connected within the Technology park at the Jožef Stefan Institute, established as the beginning of a regional Technology park "Ljubljana". The project was developed at a particularly historical moment, characterized by the process of state reorganisation, privatisation and private initiative. The national Technology Park is a shareholding company hosting an independent venture-capital institution.

The promoters and operational entities of the project are the Republic of Slovenia, Ministry of Higher Education, Science and Technology and the Jožef Stefan Institute. The framework of the operation also includes the University of Ljubljana, the National Institute of Chemistry, the Institute for Electronics and Vacuum Technology and the Institute for Materials and Construction Research among others. In addition, the project is supported by the Ministry of the Economy, the National Chamber of Economy and the City of Ljubljana.

Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Tel.: +386 1 4773 900, Fax: +386 1 251 93 85
WWW: http://www.ijs.si
E-mail: matjaz.gams@ijs.si
Public relations: Polona Strnad

INFORMATICA
AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS

INVITATION, COOPERATION

Submissions and Refereeing

Please register as an author and submit a manuscript at: http://www.informatica.si. At least two referees outside the author's country will examine it, and they are invited to make as many remarks as possible from typing errors to global philosophical disagreements. The chosen editor will send the author the obtained reviews. If the paper is accepted, the editor will also send an email to the managing editor. The executive board will inform the author that the paper has been accepted, and the author will send the paper to the managing editor. The paper will be published within one year of receipt of email with the text in Informatica MS Word format or Informatica LaTeX format and figures in .eps format. Style and examples of papers can be obtained from http://www.informatica.si. Opinions, news, calls for conferences, calls for papers, etc. should be sent directly to the managing editor.

SUBSCRIPTION

Please complete the order form and send it to Dr. Drago Torkar, Informatica, Institut Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: drago.torkar@ijs.si

Since 1977, Informatica has been a major Slovenian scientific journal of computing and informatics, including telecommunications, automation and other related areas. In its 16th year (more than twenty-six years ago) it became truly international, although it still remains connected to Central Europe. The basic aim of Informatica is to impose intellectual values (science, engineering) in a distributed organisation. Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial.
Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations. Editing and refereeing are distributed. Each editor can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the Refereeing Board. Informatica web edition is free of charge and accessible at http://www.informatica.si. Informatica print edition is free of charge for major scientific, educational and governmental institutions. Others should subscribe. Informatica WWW: http://www.informatica.si/ Referees from 2008 on: A. Abraham, S. Abraham, R. Accornero, A. Adhikari, R. Ahmad, G. Alvarez, N. Anciaux, R. Arora, I. Awan, J. Azimi, C. Badica, Z. Balogh, S. Banerjee, G. Barbier, A. Baruzzo, B. Batagelj, T. Beaubouef, N. Beaulieu, M. ter Beek, P. Bellavista, K. Bilal, S. Bishop, J. Bodlaj, M. Bohanec, D. Bolme, Z. Bonikowski, B. Boškovic, M. Botta, P. Brazdil, J. Brest, J. Brichau, A. Brodnik, D. Brown, I. Bruha, M. Bruynooghe, W. Buntine, D.D. Burdescu, J. Buys, X. Cai, Y. Cai, J.C. Cano, T. Cao, J.-V. Capella-Hernändez, N. Carver, M. Cavazza, R. Ceylan, A. Chebotko, I. Chekalov, J. Chen, L.-M. Cheng, G. Chiola, Y.-C. Chiou, I. Chorbev, S.R. Choudhary, S.S.M. Chow, K.R. Chowdhury, V. Christlein, W. Chu, L. Chung, M. Ciglaric, J.-N. Colin, V. Cortellessa, J. Cui, P. Cui, Z. Cui, D. Cutting, A. Cuzzocrea, V. Cvjetkovic, J. Cypryjanski, L. Cehovin, D. Cerepnalkoski, I. Cosic, G. Daniele, G. Danoy, M. Dash, S. Datt, A. Datta, M.-Y. Day, F. Debili, C.J. Debono, J. Dedic, P. Degano, A. Dekdouk, H. Demirel, B. Demoen, S. Dendamrongvit, T. Deng, A. Derezinska, J. Dezert, G. Dias, I. Dimitrovski, S. Dobrišek, Q. Dou, J. Doumen, E. Dovgan, B. Dragovich, D. Drajic, O. Drbohlav, M. Drole, J. Dujmovic, O. Ebers, J. Eder, S. Elaluf-Calderwood, E. Engström, U. riza Erturk, A. Farago, C. Fei, L. Feng, Y.X. Feng, B. Filipic, I. Fister, I. Fister Jr., D. Fišer, A. Flores, V.A. Fomichov, S. Forli, A. Freitas, J. Fridrich, S. Friedman, C. Fu, X. Fu, T. Fujimoto, G. Fung, S. Gabrielli, D. Galindo, A. Gambarara, M. Gams, M. Ganzha, J. Garbajosa, R. Gennari, G. Georgeson, N. Gligoric, S. Goel, G.H. Gonnet, D.S. Goodsell, S. Gordillo, J. Gore, M. Grcar, M. Grgurovic, D. Grosse, Z.-H. Guan, D. Gubiani, M. Guid, C. Guo, B. Gupta, M. Gusev, M. Hahsler, Z. Haiping, A. Hameed, C. Hamzagebi, Q.-L. Han, H. Hanping, T. Härder, J.N. Hatzopoulos, S. Hazelhurst, K. Hempstalk, J.M.G. Hidalgo, J. Hodgson, M. Holbl, M.P. Hong, G. Howells, M. Hu, J. Hyvärinen, D. Ienco, B. Ionescu, R. Irfan, N. Jaisankar, D. Jakobovic, K. Jassem, I. Jawhar, Y. Jia, T. Jin, I. Jureta, D. Juricic, S. K, S. Kalajdziski, Y. Kalantidis, B. Kaluža, D. Kanellopoulos, R. Kapoor, D. Karapetyan, A. Kassler, D.S. Katz, A. Kaveh, S.U. Khan, M. Khattak, V. Khomenko, E.S. Khorasani, I. Kitanovski, D. Kocev, J. Kocijan, J. Kollär, A. Kontostathis, P. Korošec, A. Koschmider, D. Košir, J. Kovac, A. Krajnc, M. Krevs, J. Krogstie, P. Krsek, M. Kubat, M. Kukar, A. 
Kulis, A.P.S. Kumar, H. Kwašnicka, W.K. Lai, C.-S. Laih, K.-Y. Lam, N. Landwehr, J. Lanir, A. Lavrov, M. Layouni, G. Leban, A. Lee, Y.-C. Lee, U. Legat, A. Leonardis, G. Li, G.-Z. Li, J. Li, X. Li, X. Li, Y. Li, Y. Li, S. Lian, L. Liao, C. Lim, J.-C. Lin, H. Liu, J. Liu, P. Liu, X. Liu, X. Liu, F. Logist, S. Loskovska, H. Lu, Z. Lu, X. Luo, M. Luštrek, I.V. Lyustig, S.A. Madani, M. Mahoney, S.U.R. Malik, Y. Marinakis, D. Marincic, J. Marques-Silva, A. Martin, D. Marwede, M. Matijaševic, T. Matsui, L. McMillan, A. McPherson, A. McPherson, Z. Meng, M.C. Mihaescu, V. Milea, N. Min-Allah, E. Minisci, V. Mišic, A.-H. Mogos, P. Mohapatra, D.D. Monica, A. Montanari, A. Moroni, J. Mosegaard, M. Moškon, L. de M. Mourelle, H. Moustafa, M. Možina, M. Mrak, Y. Mu, J. Mula, D. Nagamalai, M. Di Natale, A. Navarra, P. Navrat, N. Nedjah, R. Nejabati, W. Ng, Z. Ni, E.S. Nielsen, O. Nouali, F. Novak, B. Novikov, P. Nurmi, D. Obrul, B. Oliboni, X. Pan, M. Pancur, W. Pang, G. Papa, M. Paprzycki, M. Paralic, B.-K. Park, P. Patel, T.B. Pedersen, Z. Peng, R.G. Pensa, J. Perš, D. Petcu, B. Petelin, M. Petkovšek, D. Pevec, M. Piculin, R. Piltaver, E. Pirogova, V. Podpecan, M. Polo, V. Pomponiu, E. Popescu, D. Poshyvanyk, B. Potočnik, R.J. Povinelli, S.R.M. Prasanna, K. Pripužic, G. Puppis, H. Qian, Y. Qian, L. Qiao, C. Qin, J. Que, J.-J. Quisquater, C. Rafe, S. Rahimi, V. Rajkovic, D. Rakovic, J. Ramaekers, J. Ramon, R. Ravnik, Y. Reddy, W. Reimche, H. Rezankova, D. Rispoli, B. Ristevski, B. Robic, J.A. Rodriguez-Aguilar, P. Rohatgi, W. Rossak, I. Rožanc, J. Rupnik, S.B. Sadkhan, K. Saeed, M. Saeki, K.S.M. Sahari, C. Sakharwade, E. Sakkopoulos, P. Sala, M.H. Samadzadeh, J.S. Sandhu, P. Scaglioso, V. Schau, W. Schempp, J. Seberry, A. Senanayake, M. Senobari, T.C. Seong, S. Shamala, c. shi, Z. Shi, L. Shiguo, N. Shilov, Z.-E.H. Slimane, F. Smith, H. Sneed, P. Sokolowski, T. Song, A. Soppera, A. Sorniotti, M. Stajdohar, L. Stanescu, D. Strnad, X. Sun, L. Šajn, R. Šenkerik, M.R. Šikonja, J. Šilc, I. Škrjanc, T. Štajner, B. Šter, V. Štruc, H. Takizawa, C. Talcott, N. Tomasev, D. Torkar, S. Torrente, M. Trampuš, C. Tranoris, K. Trojacanec, M. Tschierschke, F. De Turck, J. Twycross, N. Tziritas, W. Vanhoof, P. Vateekul, L.A. Vese, A. Visconti, B. Vlaovic, V. Vojisavljevic, M. Vozalis, P. Vracar, V. Vranic, C.-H. Wang, H. Wang, H. Wang, H. Wang, S. Wang, X.-F. Wang, X. Wang, Y. Wang, A. Wasilewska, S. Wenzel, V. Wickramasinghe, J. Wong, S. Wrobel, K. Wrona, B. Wu, L. Xiang, Y. Xiang, D. Xiao, F. Xie, L. Xie, Z. Xing, H. Yang, X. Yang, N.Y. Yen, C. Yong-Sheng, J.J. You, G. Yu, X. Zabulis, A. Zainal, A. Zamuda, M. Zand, Z. Zhang, Z. Zhao, D. Zheng, J. Zheng, X. Zheng, Z.-H. Zhou, F. Zhuang, A. Zimmermann, M.J. Zuo, B. Zupan, M. Zuqiang, B. Žalik, J. Žižka, Informática An International Journal of Computing and Informatics Web edition of Informatica may be accessed at: http://www.informatica.si. Subscription Information Informatica (ISSN 0350-5596) is published four times a year in Spring, Summer, Autumn, and Winter (4 issues per year) by the Slovene Society Informatika, Litostrojska cesta 54, 1000 Ljubljana, Slovenia. The subscription rate for 2020 (Volume 44) is - 60 EUR for institutions, -30 EUR for individuals, and - 15 EUR for students Claims for missing issues will be honored free of charge within six months after the publication date of the issue. Typesetting: Borut Žnidar, borut.znidar@gmail.com. Printing: ABO grafika d.o.o., Ob železnici 16, 1000 Ljubljana. 
Orders may be placed by email (drago.torkar@ijs.si), telephone (+386 1 477 3900) or fax (+386 1 251 93 85). The payment should be made to our bank account no.: 02083-0013014662 at NLB d.d., 1520 Ljubljana, Trg republike 2, Slovenija, IBAN no.: SI56020830013014662, SWIFT Code: LJBASI2X.

Informatica is published by Slovene Society Informatika (president Niko Schlamberger) in cooperation with the following societies (and contact persons):
Slovene Society for Pattern Recognition (Vitomir Struc)
Slovenian Artificial Intelligence Society (Saso Dzeroski)
Cognitive Science Society (Olga Markic)
Slovenian Society of Mathematicians, Physicists and Astronomers (Dragan Mihailovic)
Automatic Control Society of Slovenia (Giovanni Godena)
Slovenian Association of Technical and Natural Sciences / Engineering Academy of Slovenia (Mark Plesko)
ACM Slovenia (Nikolaj Zimic)

Informatica is financially supported by the Slovenian research agency from the Call for co-financing of scientific periodical publications.

Informatica is surveyed by: ACM Digital Library, Citeseer, COBISS, Compendex, Computer & Information Systems Abstracts, Computer Database, Computer Science Index, Current Mathematical Publications, DBLP Computer Science Bibliography, Directory of Open Access Journals, InfoTrac OneFile, Inspec, Linguistic and Language Behaviour Abstracts, Mathematical Reviews, MatSciNet, MatSci on SilverPlatter, Scopus, Zentralblatt Math

Volume 44 Number 3 September 2020 ISSN 0350-5596
Informatica
An International Journal of Computing and Informatics

Title | Authors | Page
Reminder of the First Paper on Transfer Learning in Neural Networks, 1976 | S. Bozinovski | 291
Minimum Flows in Parametric Dynamic Networks: the Static Approach | N. Grigoras | 303
Investigating Algorithmic Stock Market Trading Using Ensemble Machine Learning Methods | R. Saifan, K. Sharif, M. Abu-ghazaleh, M. Abdel-majeed | 311
Increasing the Engagement Level in Algorithms and Data Structures Course by Driving Algorithm Visualizations | S. Simonak | 327
Similarity Measure of Multiple Sets and its Application to Pattern Recognition | V. Shijina, U. Adithya, J.J. Sunil | 335
Performance Assessment of a Set of Multi-Objective Optimization Algorithms for Solution of Economic Emission Dispatch Problem | S. Mishra, S.K. Mishra | 349
Research on Data Transmission Optimization of Communication Network Based on Reliability Analysis | H. Wang | 361
Automatic Image Segmentation for Material Microstructure Characterization by Optical Microscopy | N. Ramou, N. Chetih, Y. Boutiche, R. Abdelkader | 367
Smart Design for Resources Allocation in IoT Application Service Based on Multi-agent System and CSP | M. Bali, A. Tari, A. Almutawakel, O. Kazar | 373
How to Define Co-occurrence in a Multidisciplinary Context? | M. Roche | 387
Association Rule Model of On-demand Lending Recommendation for University Library | S. Xu | 395
Designing Hybrid Intelligence Based Recommendation Algorithms: An Experience Through Machine Learning Metaphor | A. Roy | 401

Informatica 44 (2020) Number 3, pp. 291-403