Informatica 39 (2015) 63-69 63 Swarm Intelligence and its Application in Abnormal Data Detection Bo Liu, Mei Cai and Jiazong Yu College of Information Science and Technology, Jinan University, Guangzhou, China E-mail: ddxllb@163.com Keywords: swarm intelligence, abnormal data, detection, ACO, PSO, BCO Received: June 18, 2014 This study addresses swarm intelligence-based approaches in data quality detection. First, three typical swarm intelligence models and their applications in abnormity detection are introduced, including Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Bee Colony Optimization (BCO). Then, it presents three approaches based on ACO, PSO and BCO for detection of attribute outliers in datasets. These approaches use different search strategies on the data items; however, they choose the same fitness function (i.e. the O-measure) to evaluate the solutions, and they make use of swarms of the fittest agents and random moving agents to obtain superior solutions by changing the searching paths or positions of agents. Three algorithms are described and explained, which are efficient by heuristic principles. Povzetek: Opisane so tri metode z roji za analizo kvalitetpodatkov: na osnovi mravelj, delcev in cebel. 1 Introduction Swarm Intelligence (SI) is a branch of Artificial Intelligence that is used to model the collective behaviour of social swarms (groups of agents) in nature, such as ant colonies, honey bees and bird flocks[1]. Each agent can not only interact with its local environment and other agents, but also respond to environmental change immediately, such that it can always find its objective. Swarm-based algorithms have emerged as a family of nature-inspired, population-based algorithms that are capable of producing low-cost, fast and robust solutions to several complex problems[2][3]. [4][5][6] introduce a slice of new developments made in the theory and applications of bio-inspired computation. For instance, Cui et al.[7] proposed the artificial plant optimization algorithm with detailed case studies, and Yang et al.[8] edited a special issue on meta-heuristics and swarm intelligence in engineering and industry. Data quality mining (DQR) is a new and promising application of data mining techniques for the detection, quantification and correction of data quality deficiencies in very large databases. Typical data mining techniques, such as clustering, classification, association analysis, have been used to discover abnormal data in databases[9][10]. Recently swarm-based DQR approaches have also been developed. For example, Alam et al.[11] proposed Si-based clustering approaches for outlier detection, and Jindal et al.[12] surveyed of SI in intrusion detection. The motivation of this study is dedicated to the application of SI in data quality mining, which explores new approaches for abnormal data discovery, avoiding searching the whole data space, such that improves efficiency and accuracy in abnormal data detection. The paper introduces typical SI models, and investigate their applications in abnormal data or behaviour detection. It discusses the inadequacy of current SI-based solutions in abnormal data detection, and proposes new approaches utilizing three SI models for detection of abnormal attributes. The rest of this paper is organized as follows: in Section 2, typical SI models are introduced. We give a survey of SI-based methods for outlier detection in Section 3. New approaches based on SI for abnormal attribute detection are presented in Section 4, and the paper concludes with our final remarks in Section 5. 2 Typical SI models SI was first introduced by G. Beni and J. Wang in the global optimization framework as a set of algorithms for controlling robotic swarm [13], which has become one of popular natural computing[14] areas. The most popular models of SI include: Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO) and Bee Colony Optimization (BCO). The main ideas of the three models are discussed below. There is no single 'Ant Model', rather there is a family of models, each inspired by a different aspect of ant behaviour[15]. These models include those inspired by ant-foraging behaviour, brood-sorting behaviour, cemetery formation behaviour and cooperative transport. One typical behaviour is ant foraging. Each ant deposits a chemical substance (called a pheromone) on the ground while walking[16], and the pheromone encourages the following ants to stay close to the previous path. Meanwhile the pheromone evaporates over time, allowing search exploration. Dorigo and Maniezzo illustrated the complex behaviour of ant colonies through a number of experiments in [17], which showed that the ants following the shorter path will make more visits to the source than those following the longer path. ACO is inspired by observation of the behaviour of real ants, 64 Informatica 39 (2015) 63-69 B. Liu et al. which was proposed by Dorigo and colleagues[18] as a method for solving combinatorial optimization problems. PSO was introduced by J. Kennedy et al.[19]. This is a population-based optimization technique inspired by the social behaviour of bird flocking and fish schooling. It was observed that large numbers of birds often change direction suddenly, and scatter and regroup. The synchrony of flocking behaviour is thought to be a function of the birds' efforts to maintain an optimum distance between themselves and their neighbours. PSO researchers have implemented such natural processes to solve optimization problems in which each single solution, called a particle, joins other individuals to make up a swarm (population) for exploring within the search space. Each particle has a fitness value calculated by a fitness function, and a velocity of moving towards the optimum[20]. The PSO process starts by creating random particles and initializing the best positions for each particle and for the whole population. It then iterates the computation of the positions and velocities of each particle movement and the update of the local best position for each particle and the global best position for the entire swarm. PSO has been continually modified trying to improve its convergence properties, and the PSO variants have been used to solve a wide range of optimization and inverse problems [21]. BCO was proposed by D. Karabago as a new member of the family of SI algorithms[22]. In nature, bees reach a nectar source by following a nest-mate who has already discovered a patch of flowers. When a forager bee (recruiter) decides to attract more bee mates to a newly-discovered good food source, it returns to the hive and starts performing what is known as the waggle dance to communicate spatial and profitability information about the discovered food source, and recruit more honey bees (dancer followers) to exploit it [1]. BCO is based on a swarm of artificial bees cooperating together to solve a problem. First, a bee, named InitBee, settles to find a solution with good features. From this first solution a set of other solutions called SearchArea is determined by using a certain strategy. Then, every bee will consider a solution from SearchArea as its starting point in the search, and communicates the best visited solution to all its neighbours. However, if after a period of time the swarm observes that the solution is not improved, it introduces a criterion of diversification, preventing it from being trapped in a local optimum [23]. 3 Survey of SI-based methods for outlier detection An abnormity is an object that does not conform to the normal behaviour of a dataset. Because abnormal data are very much in the minority in a dataset, they are also called outliers or noises. Although abnormal data or outliers may not be errors, detecting them is the most valid way to eliminate errors. There are two kinds of outliers: class outliers and attribute outliers[24]. Class outliers are records or tuples belonging to rare categories or labelled with wrong classes, and attribute outliers are attribute values deviating from the normal distribution or error attribute values in dataset records. In other words, they are at two levels: the record level and the attribute value level. Many studies have used traditional data mining or machine learning techniques to discover outliers, which need training on classifying rules, data patterns, or clustering instances of the dataset. There are also several approaches based on SI without training for detecting outliers, which are simple and heuristic. We focus on SI-based solutions in the rest of this study. In [12], an ACO-based approach of clustering was proposed as a data preprocessing procedure, during which outliers can be detected. In this process, the continuity of ants for similar data points and dissimilar data points is collected into the next nodes. Each ant of the data points compares their property values according to the initial data point set, checks the importance of the data points and iteratively updates the values of the pheromones laid by other ants. Lastly, the data point selection matrix for obtaining final clustering results using another algorithm is generated; at the same time, those unselected data points are outliers. Without setting the number of clusters and initial centre points, the ACO-based clustering method can obtain the high clustering correctness rate and outlier filtering rate. Alam et al.[11] proposed a SI-based clustering technique called Hierarchical PSO Based Clustering (HPSO-clustering) for outlier detection. HPSO-clustering uses a partitional approach to generate a hierarchy of clusters. The initial generation consists of the entire swarm. The swarm is then evolved towards a single cluster by merging two clusters of the swarm in each successive generation. For each particle of the swarm, updating of the position is continued until the particle comes to its most suitable position inside the cluster and becomes the true representative of the centroid. The cluster-based approach helps to find outliers based on their distance to the centroids. Less dense data falling at considerable distances from the centroid of the nearest cluster are considered as potential outliers. Intrusion detection in computer networks is one of the common areas of application of outlier detection. There are some intrusion detection approaches based on ACO, PSO and BCO, and most of them categorize the detected behaviour into normal or abnormal, thus, in a sense, the intrusion detection problem is reduced to a classification or clustering problem[25]. Several examples are given below. Tsai et al. [26] described an intrusive analysis model based on the design of honeypots and an ant colony; the ACO is applied to trace the trail of attack and analyse the habits and interests of aggressors. Soroush et al.[27] presented one of the pioneering studies, in which ACO is used for intrusion detection, unlike previous approaches in which it was used for intrusion response. Ramos and Abraham were two of the first researchers who attempted to introduce the LF algorithm[28] into the intrusion detection realm[29]. Their model, called ANTIDS, was based on a number of ant-like agents that pick up and drop items with a certain probability to form clusters, distinctively, the cluster with abnormal attributes is Swarm Intelligence and its Application in... Informatica 39 (2015) 63-69 69 typically much smaller in size. Most of the PSO-based intrusion detection systems (IDS) are hybrid anomaly detection systems, Kolias et al. [25] classified them into: (a) hybrid PSO-Neural Network Systems, (b) hybrid PSO-Support Vector Machine Systems, (c) hybrid PSO-K-means Systems and another category containing intrusion detection systems that employ PSO for the extraction of classification rules. In addition, Osama et al.[30] proposed an approach using BCO as a search strategy for subset generation, and using Support Vector Machine (SVM) as the classifier based on the selected feature subset, in which the feature selection is one of the prominent factors influencing the quality of intrusion detection systems. In this approach, first, an initial scout bee population is randomly generated, then the fitness of all the bees in the population is measured iteratively. The bees that have the highest fitness are selected as elite bees and the sites visited by them are chosen for neighbourhood search. At the end, SVM is trained based on the best feature subset. As a whole, SI-based approaches have several advantages over other techniques. More specifically, SI techniques aim to solve complex problems by employing multiple but simple agents without a need for any form of supervision. Every agent collaborates with others toward finding the optimal solution via direct or indirect communication (interactions). These agents are used for discovering classification rules, clusters in anomaly detection[25]. However, to our knowledge, existing SI-based approaches have focused on discovering class level abnormities; and no SI-based approaches for discovering attribute level abnormities in a dataset have been studied. In reality, attribute abnormities happen more frequently, and correcting abnormal attributes rather than eliminating the tuple outliers has advantage of retaining information. Koh et al.[31] gave some statements: class outliers are often the result of one or more attribute outliers, and even when attribute outliers do not affect class memberships, they may still interfere with data analysis; in addition, many real world datasets do not contain class attributes or distinct clusters, and it is meaningful to identify attribute outliers which may be source errors. Thus we will study new approaches based on SI for attribute outlier detection in the following. 4 New ideas on SI approaches for attribute outlier detection Some studies[9][24] have used attribute correlation analysis to detect attribute outliers in a dataset, but the time complexity of these algorithms is very high. A distribution-based approach[32] eliminates attribute values that do not fit into the distribution models of the dataset, but the accuracy largely depends on the best-fit distribution models used[31]. In this section, we will discuss SI approaches for attribute outlier detection. The main motivation for using SI in the discovery of outliers is that they perform a global search and cope better with attribute interaction, and the time complexity will be reduced. ID Name Gender City Province Country 1 L-199 F Guangzhou Guangdong China 2 L-200 M Guangzhou Guangdong China 3 L-201 M Nanning Guangxi China 4 L-202 F Guangzhou Guandon China 5 L-203 M Nanchang Jiangxi China Table 1: An example of a dataset R. 4.1 Outlier measurement Rarity is not an indicator of attribute outliers, and observations should be drawn from the correlation behaviour of attributes. Three measures, namely, the O-measure, the P-measure and the Q-measure, are provided in [31]. We use the O-measure in our approach. Related definitions referred to by [31] are described as follows. Definition 1 (support) Let R be a relation with m attributes AbA2,......,Am. Let S be a projection on R over attributes Au,..., Av, i.e. S = nAu,..., Av (R). The support of a tuple s in S, denoted by sup(s), is the count of the tuples in R that have the same values for attributes Au,..Av as item set s. For example, in Table 1, let S = ncity, Province (R), sup(Guangzhou, Guangdong) = 2, sup(Nanning, Guangxi) = 1. Definition 2 (Neighbourhood) Let tuple s = . Suppose Av is the target attribute, the extent of deviation of which we are interested in determining. The neighbourhood of Av w.r.t s is defined as N(Av, s) = < au,......, av - i >. The support of N(Av, s) is the count of tuples in R with the same values au,......, av -1 for Au,....Av - i. For example, in Table 1, let S = ncity, Provmce(R), consider tuple s = , sup(N(Province, s)) = 3, sup(N(City, s)) = 2. Definition 3 (O-measure) The O-measure of target attribute Av w.r.t s is defined as X supN(4, s)) O-measure(Av, s) = j=u_ sup(N(Av, s)) (1) For example, let s = be a tuple of S = ncity, Province, Country (R). The support of N(Province, s) is 3 while sup(N(Country, s)) and sup(N(City, s)) are both 1. The O-measure of the Province attribute w.r.t. s is (1 + 1)/3 = 0.6, i.e. O-measure(Province, s) = 0.6. For comparison, we also compute the O-measure of the Province attribute in tuple t = . We have O-measure(Province, t) = (2 + 2)/3 = 1.33. The lower the O-measure score, the more likely it is that attribute Av is an attribute outlier in a tuple. In Table 1, O-measure(Province, s) is relatively lower than O-measure(Province, t), so Guangdon is more likely an to be attribute outlier. 64 Informatica 39 (2015) 63-69 B. Liu et al. Consider a relation R with m attributes and n tuples. In the worst case, scanning all data subspaces (or projected relations) in R requires 0(n/2">) searches, where 2™ is the total number of projected relations on R. Therefore, computing the O-measure scores for each attribute w.r.t every projected relation requires 0(2"'/ n/m) time complexity [31], Obviously, the approach of searching all data subspaces of a relation for detecting outliers is highly inefficient. Thus, we make use of a swarm of agents to search heuristically and randomly. Although different kinds of agents use different search strategies, if they use the same fitness function to evaluate the solutions, they tend to obtain a similar solution set. After stopping searching, the tuples having lower O-measure values (below a threshold) in the solution set include attribute outliers. This does not mean that they are all errors. 4.2 Overview of ACO for detecting attribute outlier In a relational dataset, each tuple or its projection over some attributes can be represented as a path in a dataset graph (DG). A DG for a dataset with the attribute number m is defined as G = (V, E), where V is a set of all data items, i.e. V = {A = v| A is an attribute name, and v is a value in the domain of A}, and V, is a subset of V, which only includes data items related to the attribute A,. LT V] = V: E is a set of directed edges between two data items in each tuple, i.e. E = {| \ e V,. y e Vj, For example, a dataset is shown in Table 1, and a DG for its projection S=r[province, city, count^R) is given in Figure 1, where V = Vj^Vz^ V3 = {Vii, v12, V13} u {v2i, v22, V23, v24} u {v31 } = {Guangzhou, Naning, Nanchang} >..J {Guangdong, Guangxi, Guangdon, Jiangxi} '-J {China}} T,(f = 0) = 1 Edge _ number (2) Figure 1: A DG of the dataset projection in Table 1. For a group of ants, the main steps in detecting attribute outliers by the constructed DG are shown as Algorithm 1. Some explanations of Algorithm 1 are as follows: •In step 2), the pheromone of each edge f, (\ Ve^P (6) j In Equation (5), p in [0,1] is the pheromone evaporation rate, which controls how fast the old visited path evaporates; Q = £Sup(ej), c, e P, such that the smaller Q is, the more the pheromone of the edge in P increases. In Equation (6), is the sum of the j pheromones of all edges. •In step 6), for the newest selected path P, let the tuple t consist of vertices on P, A is attribute set. Compute the O-measure of each attribute from P, i.e. O-mcasurc(/1,. t), /1,e A, and obtain the minimum value O-measure(/4i, /) among them; then search the tuple s in table S where N(/l,, s[T]) = N(/l,, /) and s[A] = Ak and sj V|^/|/1/;|. and add or modify outlier information into table S by the following rule (call it Rule A). Rule A: if sexists then if 0-mcasurc(/1A, i)