Special Issue: Bioinspired Optimization

Guest Editors: Jurij Šilc, Aleš Zamuda
Editors' Introduction to the Special Issue on "Bioinspired Optimization"

The possibly changing and uncertain environment attracts and retains the fittest members of biological populations, which accumulate experience and improve by adapting and competing among themselves. The experience they gather is exchanged and propagated from iteration to iteration according to the laws of nature. Relying on elementary activities of individuals, societies of these biological populations exhibit complex emergent behaviors. Assemblies of genes, insects, bird flocks, and many other fascinating natural phenomena have been a rich source of inspiration in the design of computer algorithms for decades. Optimization, specifically, is an area where these techniques are studied and exercised with particular practical success. As a result, bioinspired optimization algorithms (evolutionary algorithms, genetic algorithms, evolution strategies, evolutionary programming, genetic programming, ant colony optimization, particle swarm optimization, artificial immune systems, etc.) were designed to overcome the drawbacks of traditional algorithms in demanding application scenarios, including those where little, if any, information is available to assist problem solving. Emerging challenges continue to inspire new methods and adaptations of existing ones for specific tasks.

This special issue of Informatica – an International Journal of Computing and Informatics includes selected extended versions of student papers presented during:
– the Fifth International Conference on Bioinspired Optimization Methods and their Applications (BIOMA 2012), held in Bohinj, Slovenia, on 24–25 May 2012, and
– the Student Workshop on Bioinspired Optimization Methods and their Applications (BIOMA 2014), held in Ljubljana, Slovenia, on 13 September 2014.

After the selection and approval by the reviewing committee, this special issue presents seven valuable contributions. They were contributed by 21 co-authors coming from five countries (Germany, Romania, Slovenia, Turkey, and the United Kingdom).

The first paper is entitled Differential Evolution Control Parameters Study for Self-Adaptive Triangular Brushstrokes and is contributed by Aleš Zamuda and Uroš Mlakar. This work describes a lossy image representation in which a reference image is approximated by an evolved image composed of a variable number of triangular brushstrokes. Experimental results show the viability of the proposed encoding, and statistical tests confirm the improved performance of the self-adapted control parameters over fixed control parameters.
The second paper, Parallel Implementation of Desirability Function-Based Scalarization Approach for Multiobjective Optimization Problems, contributed by Okkes Tolga Altinoz, Eren Akca, Asim Egemen Yilmaz, Anton Duca, and Gabriela Ciuprina, presents the results obtained for a parallel CUDA implementation of the previously proposed desirability-based scalarization approach for the solution of multi-objective optimization problems. The CUDA implementation achieves roughly a 20-fold speedup over the sequential implementation, provided a suitably large number of solutions is to be evaluated.

The third paper is Using a Genetic Algorithm to Produce Slogans, by Polona Tomašič, Gregor Papa, and Martin Žnidaršič. This paper describes a new solution for the invention of slogans, based on the use of linguistic resources and evolutionary computing. The approach utilizes a tool that checks trial solutions for grammatical mistakes. A real-world case is also studied, in which slogans for the Sentinel company are evolved.

The fourth paper, entitled Comparing Evolutionary Operators and Search Spaces in the Construction of Facial Composites, by Joseph James Mist, Stuart James Gibson, and Christopher John Solomon, addresses three experiments concerning the use of interactive evolutionary algorithms in the creation of facial composites. The approach was validated by roughly 20 participants who used and assessed it, thereby generating face-spaces, applying different search algorithms, and providing a basis for comparing the algorithms.

The fifth paper in this special issue is Heuristics for Optimization of LED Spatial Light Distribution Model, by David Kaljun, Darja Rupnik Poklukar, and Janez Žerovnik. This work presents a genetic algorithm and several versions of local search heuristics for optimization of a model of an LED and secondary lens combination with symmetric spatial light distribution. The authors present a study of combinations of parameters and mechanisms on the lighting task addressed. The resulting hybrid approach outperformed the standard genetic algorithm and, on closer inspection, also outperformed local search.

The sixth paper is entitled Implicit and Explicit Averaging Strategies for Simulation-Based Optimization of a Real-World Production Planning Problem and is contributed by Juan Esteban Diaz and Julia Handl. This paper explores the impact of noise handling strategies on optimization performance in the context of a real-world production planning problem. Since the stochastic nature of the fitness values may affect optimization performance, the authors propose explicit and implicit averaging strategies to address this issue. They show that under increased levels of fitness variability, a hybrid strategy starts to outperform pure implicit and explicit averaging strategies in the evaluation of a real-world production planning problem.

Finally, the seventh paper, Data Mining-Assisted Parameter Tuning of a Search Algorithm, contributed by Jurij Šilc, Katerina Taškova, and Peter Korošec, deals with the problem of tuning the performance of a metaheuristic search algorithm with respect to its parameters. The principal challenge here is how to provide meaningful settings for an algorithm, obtained as a result of better insight into its behavior. They apply their approach to learning a model of the DASA algorithm and draw conclusions on the suggested parameter tuning based on the knowledge obtained, such as the number of ants and the evaporation factor.
We would like to thank the authors of the papers for their individual contributions, and all the dedicated anonymous reviewers for their criticism and the time they invested to help us make the final decisions. Without their valuable and strong support, we could not have made this special issue successful. As Guest Editors, we hope the readers will find the Special Issue interesting and informative, and that the papers will stimulate further progress in the field of "Bioinspired Optimization".

Jurij Šilc
Aleš Zamuda
Guest Editors

Differential Evolution Control Parameters Study for Self-Adaptive Triangular Brushstrokes

Aleš Zamuda and Uroš Mlakar
Faculty of Electrical Engineering and Computer Science, University of Maribor
Smetanova ulica 17, SI-2000 Maribor, Slovenia
E-mail: ales.zamuda@um.si, uros.mlakar@um.si

Keywords: differential evolution, evolutionary computer vision, evolutionary art, image-based modeling, self-adaptation, triangular brushstrokes

Received: December 1, 2014

This paper proposes a lossy image representation where a reference image is approximated by an evolved image, constituted of a variable number of triangular brushstrokes. The parameters of each triangle brush are evolved using differential evolution, which self-adapts the triangles to the reference image, and also self-adapts some of the control parameters of the optimization algorithm, including the number of triangles. Experimental results show the viability of the proposed encoding and optimization results on a few sample reference images. The results of the self-adapting control parameters for crossover and mutation in differential evolution are also compared to results with keeping these parameters constant, like in a basic differential evolution algorithm. Statistical tests are furthermore included to confirm the improved performance with the self-adaptation of the control parameters over the fixed control parameters.

Povzetek: V članku je predlagana izgubna predstavitev slike, kjer je referenčna slika aproksimirana z evoluirano sliko, ki je sestavljena iz spremenljivega števila potez trikotniškega čopiča. Parametre vsake poteze čopiča optimiramo s pomočjo diferencialne evolucije, ki samoprilagaja trikotniške poteze na referenčno sliko in prav tako samoprilagaja nekatere krmilne parametre samega optimizacijskega algoritma, vključno s številom trikotnikov. Rezultati poizkusov kažejo primernost predlagane metode in rezultati optimizacije so prikazani za več izbranih referenčnih slik. Rezultati samoprilagodljivih krmilnih parametrov za diferencialno evolucijo so primerjani tudi z rezultati, kjer so ti parametri nespremenljivi, kot je to primer pri osnovnem algoritmu diferencialne evolucije. Dodatno so podani še statistični testi, ki nadalje potrjujejo izboljšanje kakovosti pristopa ob samoprilagajanju krmilnih parametrov v primerjavi s pristopom z nespremenljivimi krmilnimi parametri.

1 Introduction

In this paper, an evolvable lossy image representation, in which an image is compared to its evolved, generated counterpart image, is proposed. The image is represented using a variable number of triangular brushstrokes [7], each consisting of triangle vertex coordinates and color parameters. These parameters for each triangle brush are evolved using differential evolution [13, 4], which self-adapts the control parameters, including the proposed self-adaptation of the number of triangles to be used.
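Such a chromosome can be pictured as a flat real-valued vector holding a variable number of triangles. The following is a minimal sketch of one possible encoding, assuming an RGBA color per triangle and a self-adapted count of active triangles; the gene layout, names, and sizes are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical flat encoding: each triangle uses 10 genes
# (3 vertices x 2 coordinates + R, G, B, A), all scaled to [0, 1].
GENES_PER_TRIANGLE = 10

def random_genome(max_triangles):
    """Fixed-size vector; only the first n_active triangles are drawn."""
    genes = [random.random() for _ in range(max_triangles * GENES_PER_TRIANGLE)]
    n_active = random.randint(1, max_triangles)   # self-adapted cut-off
    return {"genes": genes, "n_active": n_active}

def decode(genome, width, height):
    """Turn the active part of the genome into drawable triangles."""
    triangles = []
    for t in range(genome["n_active"]):
        g = genome["genes"][t * GENES_PER_TRIANGLE:(t + 1) * GENES_PER_TRIANGLE]
        vertices = [(g[0] * width, g[1] * height),
                    (g[2] * width, g[3] * height),
                    (g[4] * width, g[5] * height)]
        rgba = tuple(g[6:10])
        triangles.append((vertices, rgba))
    return triangles

if __name__ == "__main__":
    genome = random_genome(max_triangles=30)
    print(len(decode(genome, 640, 480)), "active triangles")
```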
Experimental results show the viability of the proposed encoding and evolution convergence for lossy compression of sample images. Since this paper is an extended version of [8], new additional results are included, where experiments with fixed control parameters for differential evolution are added to check and demonstrate the influence of the self-adaptation mechanism on the results. The results show clear superiority of the proposed approach with self-adaptive control parameters over the approach where the control parameters are fixed.

The approach presented is built upon and compared with [7], by addressing and also extending the original challenge. Namely, the challenge introduced in [7] uses triangles in trying to build an approximate model of an image [7]. The triangle is an efficient brush shape for this challenge, since it covers more pixels than a single point, and also allows overlaying and blending of colors over several regional surface pixels, which lines can not. Also, an arbitrary triangle shape is less constrained than any further point-approximated shape, and other shapes can be built by combining several triangles. Instead of the genetic programming in [7], in this paper differential evolution is used with a fixed-size tree-like chromosome vector, which is cut off self-adaptively to form codon and anti-codon parts of the chromosome. Also, our approach uses a modified challenge, where we can reconstruct the model for the reference image solely using the evolved model, without using the reference image, whereas [7] needs the reference image when drawing pixels to the canvas in deciding which pixels match the reference image for accepting them into the evolved canvas. Also, in this paper the triangle brushstroke encoding differs and is especially designed for an efficient DE encoding.

In the following section, related work is presented, then the proposed approach is defined. In Section 4, the experimental results are reported. Section 5 concludes the paper with propositions for future work.

2 Related Work

In this section, related work on evolutionary computer vision, evolutionary art, image representation, and evolutionary optimization using differential evolution is presented. These topics are used in the proposed method, defined in the next section.

2.1 Image-Based Modeling, Evolutionary Computer Vision, and Evolutionary Art

Image-based approaches to modeling include processing of images, e.g., two-dimensional, from which after segmentation certain features are extracted and used to represent a geometrical model [10]. For art drawings modeling, automatic evolutionary rendering has been applied [2, 12]. Heijer and Eiben evolved pop art two-dimensional scalable vector graphics (SVG) images [6] and defined genetic operators on SVG to evolve representational images using SVG, and also to evolve new images, different from source images, leading to new and surprising images for pop art. Bergen and Ross [3] interactively evolved vector graphics images using a genetic algorithm, where solid-coloured opaque or translucent geometric objects or mosaic tile effects with bitmap textures were utilized; they considered the art aspect of the evolved image and multiple possible outcomes due to evolution stochastics, and concluded to investigate vector animation of the vectorized image. In [14] animated artwork is evolved using an evolutionary algorithm. Then, Izadi et al. [7] evolved the triangular brushstrokes challenge using genetic programming for two-dimensional images, using unguided and guided searches on a three or four branch genetic program, where roughly 5% similarity with reference images was obtained on average per pixel. In this paper, we build upon and compare our new approach with [7], by addressing and also extending this challenge. After extending the challenge, we optimize it using DE, which is described in the next section.

2.2 Evolutionary Optimization Using Differential Evolution

Differential evolution (DE) [13] is a floating-point encoding evolutionary algorithm for continuous global optimization. It has been modified and extended several times, with various versions being proposed [5]. DE has also been applied to remote sensing image subpixel mapping [18], image thresholding [11], and for image-based modeling using evolutionary computer vision to reconstruct a spatial procedural tree model from a limited set of two-dimensional images [16, 15]. DE mechanisms were also compared to other algorithms in several studies [17]. Neri and Tirronen in their survey on DE [9] concluded that, compared to the other algorithms, a DE extension called jDE [4] is superior to the compared algorithms in terms of robustness and versatility over a diverse benchmark set used in the survey. Therefore, we choose to apply jDE in this approach.

The original DE has a main evolutionary loop where a population of vectors is computed within each generation. For one generation, counted as g, each vector x_i, for all i in {1, ..., NP} in the current population of size NP, undergoes the DE evolutionary operators, namely mutation, crossover, and selection. Using these operators, a trial vector (offspring) is produced and the vector with the best fitness value is selected for the next generation. For each corresponding population vector, mutation creates a mutant vector v_{i,g+1} ('rand/1' [13]):

v_{i,g+1} = x_{r_1,g} + F \, (x_{r_2,g} - x_{r_3,g}),   (1)

where the indexes r_1, r_2, and r_3 are random and mutually different integers generated from the set {1, ..., NP}, which are also different from i. F is an amplification factor of the difference vector, mostly within the interval [0, 1]. The term x_{r_2,g} - x_{r_3,g} denotes a difference vector, which is named the amplified difference vector after multiplication with F. The mutant vector v_{i,g+1} is then used for recombination, where with the target vector x_{i,g} a trial vector u_{i,j,g+1} is created, e.g., using binary crossover:

u_{i,j,g+1} = \begin{cases} v_{i,j,g+1}, & \text{if } \operatorname{rand}(0,1) \le CR \text{ or } j = j_{\mathrm{rand}}, \\ x_{i,j,g}, & \text{otherwise}, \end{cases}   (2)

where CR denotes the crossover rate, j in {1, ..., D} denotes the j-th search parameter of the D-dimensional search space, rand(0, 1) in [0, 1] is a uniformly distributed random number, and j_rand is a uniformly randomly chosen index of the search parameter, which is always exchanged to prevent cloning of target vectors.

The original DE [13] keeps the control parameters fixed, such as F = 0.5 and CR = 0.9, throughout optimization. However, the jDE algorithm, which is a modification of the original DE, self-adapts the F and CR control parameters; to generate the vectors v_{i,g+1} and u_{i,g+1}, the corresponding values F_i and CR_i, for all i in {1, ..., NP}, are updated prior to their use in the mutation and crossover mechanisms:

F_{i,g+1} = \begin{cases} F_l + \operatorname{rand}_1 \times F_u, & \text{if } \operatorname{rand}_2 < \tau_1, \\ F_{i,g}, & \text{otherwise}, \end{cases}   (3)

CR_{i,g+1} = \begin{cases} \operatorname{rand}_3, & \text{if } \operatorname{rand}_4 < \tau_2, \\ CR_{i,g}, & \text{otherwise}, \end{cases}   (4)

where rand_1, ..., rand_4 in [0, 1] are uniform random floating-point numbers and \tau_1 = \tau_2 = 0.1. Finally, the selection operator evaluates and compares the trial vector to the current vector and propagates the fittest:

x_{i,g+1} = \begin{cases} u_{i,g+1}, & \text{if } f(u_{i,g+1}) \le f(x_{i,g}), \\ x_{i,g}, & \text{otherwise}. \end{cases}
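To make the operators concrete, the sketch below implements 'rand/1/bin' with the parameter self-adaptation of Eqs. (3)-(4) for a generic minimization problem. It is a minimal illustration only: the sphere fitness function, bound handling, and the customary jDE values F_l = 0.1 and F_u = 0.9 are assumptions, not the image-approximation setup used in this paper.

```python
import random

def jde_minimize(f, dim, bounds, np_=50, generations=200,
                 f_l=0.1, f_u=0.9, tau1=0.1, tau2=0.1):
    """Minimal jDE ('rand/1/bin' with self-adaptive F and CR)."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    F = [0.5] * np_          # per-individual amplification factors
    CR = [0.9] * np_         # per-individual crossover rates
    fit = [f(x) for x in pop]

    for _ in range(generations):
        for i in range(np_):
            # Self-adapt control parameters before they are used (Eqs. 3-4).
            Fi = f_l + random.random() * f_u if random.random() < tau1 else F[i]
            CRi = random.random() if random.random() < tau2 else CR[i]

            # Mutation 'rand/1' (Eq. 1) with three mutually different indices.
            r1, r2, r3 = random.sample([k for k in range(np_) if k != i], 3)
            v = [pop[r1][j] + Fi * (pop[r2][j] - pop[r3][j]) for j in range(dim)]

            # Binary crossover (Eq. 2); j_rand guarantees one mutant gene.
            j_rand = random.randrange(dim)
            u = [v[j] if random.random() <= CRi or j == j_rand else pop[i][j]
                 for j in range(dim)]
            u = [min(max(x, lo), hi) for x in u]   # keep within bounds

            # Selection: keep the trial (and its parameters) if not worse.
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i], F[i], CR[i] = u, fu, Fi, CRi

    best = min(range(np_), key=lambda k: fit[k])
    return pop[best], fit[best]

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best, value = jde_minimize(sphere, dim=10, bounds=(-5.0, 5.0))
    print(round(value, 6))
```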
The two-sided desirability function d3(y) is defined piecewise over its tolerance interval, with d3(y) = 0 for y > hmax. The desirability level d(y) = 1 is the state for fully desirable, and d(y) = 0 is for a not-desired case. In this respect, the d1 one-sided desirability function is useful for a minimization problem. The curve parameters are r, t and s. They are used in order to plot an arc instead of a solid line, when desired. The curves plotted in Figure 1 demonstrate the effects of the curve parameters and the graphs of the desirability functions.

2.2 Method of Desirability Function-Based Scalarization

The main idea beneath the desirability functions is as follows:
– The desirability function is a mapping from the domain of real numbers to the range set [0, 1].
– The domain of each desirability function is one of the objective functions; it maps the values of the relevant objective function to the interval [0, 1].
– The overall desirability value is defined as the geometric mean of all desirability functions; this value is to be maximized.

The relevant desirability functions d1(f1) and d2(f2) can be defined as in Figure 2. The desirability functions are not necessarily defined to be linear; certainly, non-linear definitions can also be used. In [6], a multiobjective optimization approach was proposed by altering the shapes of the desirability functions in a systematical manner. Particularly by:
– fixing the parameters f1max_tol and f2max_tol seen in Figure 2 at infinity, and
– varying the parameters f1min_tol and f2min_tol systematically,
it is possible to find the Pareto front regardless of its convexity or concavity. This claim can be illustrated for the bi-objective case as follows: as seen in Figure 3, the parameters f1min_tol and f2min_tol determine the sector which is traced throughout the solution. The obtained solution corresponds to a point which maximizes the geometric mean of the two desirability values. As seen in Figure 4, even in the case of a concave Pareto front, the solution can be found without loss of generality. In other words, unlike the weighted-sum approach, the method proposed in [6] does not suffer from concave Pareto fronts.

Figure 4: The solution via the desirability-function based approach for a concave Pareto front.

In [6], the applicability and the efficiency of the proposed scalarization approach was demonstrated via some multi-objective benchmark functions. Each single-objective problem (i.e., the scalarization scheme) was solved with Particle Swarm Optimization. Despite no explicit demonstration or proof, it was claimed that:
– There were no limitations about the usage of Particle Swarm Optimization; i.e., any other heuristic algorithm could be incorporated and implemented.
– The proposed method can be easily parallelized.
In this study, we demonstrate the validity of these claims by performing a parallel implementation on GPUs via the CUDA framework. The next section is devoted to the implementation details.

3 Parallel Multiobjective Optimization with GPU

This section is dedicated to explaining the steps and the idea of parallelizing the desirability function-based scalarization approach with the aid of the CUDA library.

3.1 Fundamentals of CUDA Parallel Implementation

Researchers familiar with programming languages have long desired a programming language or framework letting them write parallel code easily. For this purpose, in 2007 NVidia [8] introduced a software framework called CUDA.
By means of this, a sequential function code can be converted to a parallel kernel by using the libraries and some pre.x expressions. By this way, the programmers do not need to learn a new programming language. They are able to use their previous know-how related to C/C++, and enhance this knowledge with some basic expressions intro­duced by CUDA. However, without the knowledge about the CUDA software and the parallel architecture hardware, it is not possible to write ef.cient codes. CUDA programming begins with the division of the ar­chitectures. It de.nes the CPU as host and GPU as de­vice. The parallel programming actually is the assignment of duties to parallel structure and collection of the results by CPU. In summary, the codes are written for CPU on C/C++ environment, and these codes include some paral­lel structures. These codes are executed by the host. Host commands device for code executed. When the code is exe­cuted by the device, the host waits until the job is .nished, then a new parallel duty can be assigned, or results from the .nished job can be collected by the host. Thus, the de­vice becomes a parallel computation unit. Hence, parallel computing relies on the data movement between host and device. Eventhough both host and device are very fast com­putation units, the data bus is slower. Therefore, in order to write an ef.cient program, the programmer must keep his/her code for minimum data transfer between the host and the device. The GPU has stream multiprocessors (SMs). Each SM has 8 stream processors (SPs), also known as cores, and each core has a number of threads. In tesla architecture there are 240 SPs, and on each SP has 128 threads, which is the kernel execution unit. The bodies of threads are called groups. The groups are performed collaterally with respect to the core size. If the GPU architecture has two cores, then two blocks of threads are executed simultaneously. If it has four cores, then four blocks are executed collaterally. Host and device communicate via data movement. The host moves data to the memory of the GPU board. This memory is called global memory which is accessed from all threads and the host. The host has also access to con­stant and texture memories. However, it cannot access the shared memory, which is a divided structure assigned for every block. The threads within the block can access their own shared memory. The communication of the shared memory is faster than the global memory. Hence, a par­allel code must contain data transfers to shared memory more often, instead of global memory. In this study, random numbers are needed to execute the algorithm. Hence, instead of the rand() function of the C/C++ environment, CURAND library of the CUDA pack has been employed. In addition, the CUDA Event is pre­ferred for accurate measurement of the execution time. In the next section, the parallel implementation of desirability function-based scalarization was explained in detailed. Informatica 39 (2015) 115–123 119 3.2 Parallel Implementation of Desirability Function-Based Scalarization The main idea of our parallel implementation throughout this study is illustrated in Figure 5. Each scalarization scheme is handled in a separate thread; after the relevant solutions are obtained, they are gathered in a centralized manner to constitute the Pareto front from which the human decision maker picks a solu­tion according to his/her needs. 
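The per-thread scalarization idea can be imitated on a CPU for illustration. The sketch below is a Python analogue of the scheme in Figure 5, assuming the simple bi-objective test problem f1(x) = x^2, f2(x) = (x - 2)^2 used later in Section 3.2. Each worker process maximizes the geometric mean of two one-sided desirabilities for one tolerance setting, with a crude random search standing in for the per-thread PSO of the paper; only the f1 tolerance is swept and a large finite max_tol stands in for the "fixed at infinity" tolerance, so this is a simplified sketch rather than the authors' CUDA implementation.

```python
import math
import random
from multiprocessing import Pool

def f1(x): return x * x
def f2(x): return (x - 2.0) ** 2

def desirability(value, min_tol, max_tol):
    """Linear one-sided desirability: 1 below min_tol, 0 above max_tol."""
    if value <= min_tol:
        return 1.0
    if value >= max_tol:
        return 0.0
    return (max_tol - value) / (max_tol - min_tol)

def solve_one_scalarization(min_tol_f1, max_tol=100.0, evaluations=2000):
    """One scalarization scheme = one worker (stand-in for one GPU thread)."""
    best_x, best_score = None, -1.0
    for _ in range(evaluations):
        x = random.uniform(-2.0, 4.0)
        d1 = desirability(f1(x), min_tol_f1, max_tol)
        d2 = desirability(f2(x), 0.0, max_tol)
        score = math.sqrt(d1 * d2)          # geometric mean of desirabilities
        if score > best_score:
            best_x, best_score = x, score
    return best_x, f1(best_x), f2(best_x)

if __name__ == "__main__":
    sweep = [i * 0.1 for i in range(1, 41)]      # swept f1 tolerance levels
    with Pool() as pool:                         # one scalarization per worker
        front = pool.map(solve_one_scalarization, sweep)
    for x, v1, v2 in front[:5]:
        print(f"x={x:.3f}  f1={v1:.3f}  f2={v2:.3f}")
```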
This approach ensures that the number of solutions found that can be found in parallel is limited by the capability of the GPU card used. As stated before, we implemented the Particle Swarm Optimization Algorithm for veri.cation of the aforemen­tioned claims. The parallel CUDA implementation was compared to the sequential implementation on various GPUs and CPUs. Figure 5: The parallel CUDA implementation of the desirability-function based approach. It was seen that both implementations (sequential and parallel CUDA) were able to .nd the same solutions but in different elapsed times. As seen in Figure 6, if the num­ber of Pareto front solutions increase, the advantage of the parallel CUDA increases dramatically. Figure 6 presents parallel implementation of scalariza­tion approach for the weighted sum method. The simple convex problem is selected and de.ned in (4) and (5) as a test bed for present the performance of the parallelization method for scalarization. 2 f1(x)= x (4) f2(x)=(x - 2)2 (5) According to Figure 6, the performance of high and mid-level GPU cards are approximately 10-times faster than se­quential implementation. The results obtained in Figure 6 yields the following conclusions: – For a small number of Pareto solutions, CPU performs better against GPU – After 64 solutions, parallel implementation presents better results than sequential code – An old-fashion mobile GPU performs almost same as a relatively high level CPU. – As the number of solution increases, the professional high level GPU devices perform more stable than gen­eral purpose GPUs. 4 Implementation, Results, and Discussion The parallel desirability function-based scalarization ap­proach was applied to solve seven benchmark problems. These problems are selected based on the complexity against execution time on computation unit. Since the av­erage number of execution time is considered in the study, problems from simple calculation to problems with more branch and complex functions. In this section the bench­mark problems and the results with respect to execution time is presented. 4.1 Benchmark Problems In this study, ten benchmark problems [9] with different complexity and Pareto shape are selected to present the per­formance of the method. Table 1 gives the mathematical formulations of the problems. The performance compar­ison is performed not only on the accuracy of the results, but more importantly on the execution time. As given in Ta­ble 1 the complexity of the benchmark problems are given from simple to more complex problems. The reason be­hind is that as the complexity of the function is increased, the single processors have to accomplish much more cal­culations, and since the single processors on a GPU has lower capacity than CPU, it will be a good comparison for not only the number of solutions in solution space but also the problem complexity. Table 1 presents as three columns. The .rst column gives the known-names of benchmark problems. The reader can be access amount of information about the function by searching by selecting keyword as function name. The sec­ond column is the mathematical formulation of the func­tion. As the order of row increases the complexity of the O. T. Altinoz et. al. function also increases. The last column is for the de.nes of the range of the decision variables. 
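As a concrete instance of the entries in Table 1, the snippet below evaluates the first benchmark, ZDT1, in plain Python. The standard ZDT1 definition is assumed here (the table's rendering of the formula is partly garbled), and the timing loop only illustrates the kind of per-evaluation cost that the CPU/GPU comparison in the next subsection measures.

```python
import math
import random
import time

def zdt1(x):
    """Standard ZDT1: two objectives over x_i in [0, 1]."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

if __name__ == "__main__":
    solutions = [[random.random() for _ in range(30)] for _ in range(10_000)]
    start = time.perf_counter()
    results = [zdt1(x) for x in solutions]
    print(f"{len(results)} evaluations in {time.perf_counter() - start:.3f} s")
```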
4.2 Implementation Results Table 2 presents the execution time comparison of CPU (Xeon E2620) and GPU (Tesla K20) for various numbers of levels from 8×8 to 100×100, number of single objective evaluations are 64 and 104 respectively. For low complex problems, until 225 numbers of levels (400 levels need for hard problems), the CPU outperforms GPU implementa­tion with respect to execution time. It is reasonable since only small portion of cores on GPU can be used. But lower number of relatively very fast cores are .nished the exe­cutions earlier than GPU. From 400 to 6, 400 levels, GPU computation time of parallel codes exceeds CPU time. At 6, 400 levels, the difference between CPU and GPU is at the peak grade. After that level, the advantage of GPU re­duces. In other words, the GPU implementation acts more sequentially, since there are not any empty resources to ex­ecute parallel implementation. Among all of the problems, UF1 is the hardest for GPU implementation since the com­putation time is the longest for this problem. The main reasons are that: a) checking mechanism for even and odd parts that adds branch to the code, b) square of the trigono­metric function. for GPU implementation branch are the time consuming programming codes such that in an if-else, both parts are evaluated by the architecture, that reduces the resources. The average execution time of CPU is 8.25-times slower than average GPU execution time. The following results are obtained for comparison the execution time: – For a small number of solutions, CPU outperforms GPU – The increase on CPU execution time is proportional to the number of solutions. Hence, the execution time on CPU increases. – The GPU implementations are much bene.cial for overall comparison. – For a very high number of solutions, the improve­ments obtained in GPU slowly decreases since GPU contains limited number of stream (multi)processors. At some point the improvements are not lower than . 10-times on average. 5 Conclusion In this study, desirability function-based scalarization ap­proach is evaluated in a parallel fashion. Since the perfor­mance of sequential and parallel implementations are sim­ilar to each other, the execution time of these codes are compared based on different number of solutions. The re­sults show that, for small number of solutions, parallel im­plementation is slower when compared to sequential im­plementation. But as the number of solution increases, the Table 1: Multiobjective benchmark problems Function name Mathematical description Decision variable range ZDT1 f1(x) = x1 f2(x) = g(1 - f1 g ) g = 1 + 9 n-1 n i=2 xi 0 . xi . 1 ZDT2 f1(x) = x1 f2(x) = g(1 - ( f1 g )2) g = 1 + 9 n-1 n i=2 xi 0 . xi . 1 ZDT3 UF1 UF2 UF3 UF4 f1(x) = x1 f2(x) = g(1 - f1 g - x1 g sin(10.x1)) g = 1 + 9 n-1 n i=2 xi f1(x) = x1 + 2 |J1| i.J1 (xi - sin(6.x1 + i. n ))2 f2(x) = 1 - . x1 + 2 |J2| i.J2 (xi - sin(6.x1 + i. n ))2 J1 = {i| i is odd and 2 . i . n}, J2 = {i| i is even and 2 . i . n} f1(x) = x1 + 2 |J1| i.J1 y2 i f2(x) = 1 - . x1 + 2 |J2| i.J1 y2 i yi = xi - (0.3x2 1 cos(24.x1 + 4i. n ) + 0.6x1) cos(6.x1 + i. n ), i . J1 xi - (0.3x2 1 cos(24.x1 + 4i. n ) + 0.6x1) sin(6.x1 + i. n ), i . J2 f1(x) = x1 + 2 |J-1| ((4 i.J1 y2 i ) - (2 i.J1 cos( 20yi.. i )) + 2) f2(x) = 1 - . x1 + 2 |J-2| ((4 i.J2 y2 i ) - (2 i.J2 cos(20yi.. i )) + 2) yi = xi - x 0.5(1+ 3(i-2) n-2 ) 1 f1(x) = x1 + 2 |J1| i.J1 h(yi) f2(x) = 1 - x2 1 + 2 |J2| i.J2 h(yi) yi = xi - sin(6.x1 + i. n ), h(t) = t 1+e2t 0 . xi . 1 0 . xi . 1 -1 . xi-1 . 1 0 . xi . 1 -1 . xi-1 . 1 0 . xi . 
1 0 . xi . 1 -2 . xi-1 . 2 Table 2: Execution time comparison [seconds] of benchmark functions, where improvement, impr, is the scale factor shows how many times the GPU is faster than CPU, so that if impr < 1 means CPU is faster than GPU # of levels for Devices 2 desirability & ZDT1 ZDT2 ZDT3 UF1 UF2 UF3 UF4 Average functions impr CPU 0.133 0.109 0.19 0.11 0.109 0.109 0.094 0.1220 8 × 8 GPU 0.433 0.4504 0.483 0.4917 0.4861 0.4906 0.408 0.4633 impr 0.3072 0.2420 0.3934 0.2237 0.2242 0.2222 0.2304 0.2633 10 × 10 15 × 15 CPU GPU impr CPU GPU impr 0.221 0.439 0.5034 0.446 0.4424 1.0081 0.153 0.451 0.3392 0.333 0.4576 0.7277 0.291 0.4848 0.6002 0.576 0.4904 1.1746 0.222 0.4934 0.4499 0.42 0.499 0.8417 0.199 0.49 0.4061 0.418 0.4944 0.8455 0.197 0.4914 0.4009 0.413 0.4967 0.8315 0.168 0.405 0.4148 0.372 0.409 0.9095 0.2073 0.4649 0.4450 0.4254 0.4699 0.9055 20 × 20 25 × 25 CPU GPU impr CPU GPU impr 0.8 0.4281 1.8687 1.21 0.4393 2.7544 0.564 0.442 1.2760 0.893 0.4573 1.9528 0.997 0.4781 2.0853 1.521 0.491 3.0978 0.717 0.5 1.4340 1.12 0.5 2.2400 0.706 0.4977 1.4185 1.444 0.4954 2.9148 0.728 0.5 1.4560 1.114 0.499 2.2325 0.811 0.4146 1.9561 0.987 0.408 2.4191 0.7604 0.4658 1.6421 1.1841 0.4700 2.5159 30 × 30 40 × 40 CPU GPU impr CPU GPU impr 1.753 0.4424 3.9625 3.162 0.4451 7.1040 1.266 0.4566 2.7727 2.186 0.453 4.8256 2.279 0.4871 4.6787 4.094 0.4893 8.3671 1.582 0.501 3.1577 2.794 0.4999 5.5891 1.589 0.4973 3.1953 2.854 0.4983 5.7275 1.59 0.4979 3.1934 2.757 0.4991 5.5239 1.428 0.4132 3.4560 2.508 0.4151 6.0419 1.6410 0.4708 3.4880 2.9079 0.4714 6.1684 50 × 50 60 × 60 CPU GPU impr CPU GPU impr 4.879 0.4488 10.8712 6.946 0.4709 14.7505 3.431 0.4639 7.3960 4.798 0.4864 9.8643 6.138 0.4967 12.3576 9.492 0.518 18.3243 4.412 0.5119 8.6189 6.236 0.5287 11.7950 4.382 0.5 8.7640 6.411 0.518 12.3764 4.298 0.501 8.5788 6.391 0.519 12.3141 3.889 0.4321 9.0002 6.233 0.4587 13.5884 4.4899 0.4792 9.3695 6.6439 0.5000 13.2876 70 × 70 80 × 80 CPU GPU impr CPU GPU impr 9.52 0.4995 19.0591 12.488 0.6179 20.2104 6.764 0.5144 13.1493 8.87 0.6321 14.0326 11.959 0.5417 22.0768 15.892 0.6488 24.4945 8.566 0.5489 15.6058 11.11 0.6388 17.3920 8.548 0.539 15.8590 11.366 0.635 17.8992 8.562 0.5435 15.7534 11.538 0.6362 18.1358 7.592 0.4923 15.4215 13.307 0.607 21.9226 8.7873 0.5256 16.7036 12.0816 0.6308 19.1553 90 × 90 100 × 100 CPU GPU impr CPU GPU impr 15.776 0.8299 19.0095 19.2579 1.1157 17.2608 11.246 0.854 13.1686 13.863 1.149 12.0653 20.027 0.8749 22.8906 24.504 1.1812 20.7450 14.039 0.8424 16.6655 17.252 1.12 15.4036 14.138 0.84 16.8310 19.219 1.222 15.7275 14.053 0.8432 16.6663 17.74 1.125 15.7689 14.583 0.8335 17.4961 15.49 1.132 13.6837 14.8374 0.8454 17.5325 18.1894 1.1493 15.8078 Figure 6: Comparison of the sequential Java and the parallel CUDA implementations. GPU is almost 20-times faster than sequential implementa­tion. Acknowledgement This study was made possible by grants from the Turk­ish Ministry of Science, Industry and Technology (In­dustrial Thesis – San-Tez Programme; with Grant Nr. 01568.STZ.2012-2) and the Scienti.c and Technological Research Council of Turkey -TÜBITAK (with Grant Nr. 112E168). The authors would like to express their grati­tude to these institutions for their support. References [1] R. Marler, S. Arora (2009) Transformation methods for multiobjective optimization, Engineering Opti­mization, vol. 37, no. 1, pp. 551–569. [2] N. Srinivas, K. Deb (1995) Multi-Objective function optimization using non-dominated sorting genetic al­gorithms, Evolutionary Computation, vol. 2, no. 3, pp. 
221-–248. [3] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Com­putation, vol. 6, no. 2, pp. 182-–197. [4] J. D. Schaffer (1985) Multiple objective optimization with vector evaluated genetic algorithms, Proceed­ings of the International Conference on Genetic Al­gorithm and their Applications, pp. 93–100. [5] J. Branke, K. Deb (2008) Integrating user prefer­ences into evolutionary multiobjective optimization, Knowledge Incorporation in Evolutionary Comput­ing, Springer, pp. 461–478. [6] O. T. Altinoz, A. E. Yilmaz, G. Ciuprina (2013) A Multiobjective Optimization Approach via Systemat­ical Modi.cation of the Desirability Function Shapes, Proceedings of the 8th International Symposium on Advanced Topics in Electrical Engineering. [7] G. Derringer, R. Suich (1980) Simultaneous op­timization of several response variables,Journal of Quality Technology, vol. 12, no. 1, pp. 214–219. [8] NVIDIA Corporation (2012) CUDA dynamic paral­lelism programming, NVIDIA. [9] E. Ziztler, K. Deb, L. Thiele (2000) Comparison of multiobjective evolutionary algorithms: Empirical re­sults, Evolutionary Computation Journal, vol. 8, no. 2, pp. 125–148. Using a Genetic Algorithm to Produce Slogans Polona TomašicˇXLAB d. o. o., Pot za Brdom 100, SI-1000 Ljubljana, Slovenia and Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: polona.tomasic@xlab.si Gregor Papa Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia and Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: gregor.papa@ijs.si Martin ŽnidaršicˇDepartment of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia and Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: martin.znidarsic@ijs.si Keywords: genetic algorithm, slogan generation, computational creativity, linguistic resources Received: December 1, 2014 Creative tasks, such as creation of slogans for companies, products or similar entities, can be viewed from the combinatorial perspective – as a search through the space of possible combinations. To solve such a combinatorial optimization problem, we can use evolutionary algorithms. In this paper, we present our solution for generation of slogans based on a genetic algorithm and linguistic resources. We also compare it to the unguided slogan generator. Povzetek: Na kreativne naloge, kot je snovanje sloganov za podjetja in produkte, lahko gledamo s kombi­natoriˇcnega vidika – kot na iskanje v prostoru možnih kombinacij. Za reševanje tovrstnih kombinatoriˇcnih optimizacijskih problemov lahko uporabljamo evolucijske algoritme. V tem ˇ clanku predstavljamo rešitev za generiranje sloganov na podlagi genetskega algoritma in jezikovnih virov. Predstavljeno rešitev primer­ jamo tudi z generatorjem sloganov brez vodenja. 1 Introduction Automated generation of slogans is a problem from the .eld of Computational Creativity [5]. There are very few studies dedicated to slogan generation. In fact, the only one we came across is the BRAINSUP framework [19], which is based on beam search through a carefully de.ned space of possible slogans. This space gets reduced by applying user speci.ed constraints on keywords, domain, emotions, and other properties of slogans. High quality slogans are often a result of group brain­storming. 
Several individuals present their ideas and the proposed slogans are then mixed into new slogans, and some new ideas emerge. This brainstorming process is sim­ilar to the evolution, from which we got the idea of using evolutionary algorithms for slogan generation. The initial slogans from brainstorming represent an initial population, mixing the best proposed slogans represents recombina­tion, and new included ideas represent mutations. Evolu­tionary algorithms have already been applied to different natural language processing problems [2]. In this paper, we present our slogan generation proce­dure which is not in.uenced by the user in any way, apart from being provided with a short textual description of the target entity. The method is based on a genetic algo­rithm (GA) [3]. Genetic algorithms are the most traditional evolutionary algorithms and they ensure a good coverage of the search space. They have been successfully used for generating recipes [17], poetry [13] and trivial dialog phrases [16]. However, genetic algorithms have not been previously used for slogan generation. Our method follows the BRAINSUP framework in the initial population genera­tion phase, and it uses a collection of heuristic slogan func­tions in the evaluation phase. We tested our slogan generator and compared it to the random slogan generator. The statistical results are in fa­vor of our method. However, even though the generated slogans can present a good starting point for brainstorm­ing, their quality is not yet at the desired level. The rest of the paper is organized as follows. In Section 2 we present the linguistic and semantic resources used in our solution. Section 3 provides a detailed description of the entire slogan generation process. It includes descrip­tion of the evaluation functions and it clari.es the differ­ence between the slogan generator and the unguided slogan generator. The performed experiments and the discussion of the results are presented in Section 4. The conclusions are drawn in Section 5. 2 Resources Linguistic and semantic resources are a prerequisite for any kind of text generation. We use them at several steps of our method – for generation of initial population, mutation, and evaluation. Some are available as extended libraries for programming languages, others are available for download from the Internet, and some databases were created by our­selves. The origin of the data and the process is brie.y described in the following paragraphs. 1. Database of famous slogans: it serves as a basis for the initial population generation and for comparison with generated slogans. It contains 5,249 famous slo­gans obtained from the Internet. 2. Database of frequent grammatical relations be­tween words in sentences: for its acquisition we used the Stanford Dependencies Parser [14]. Stanford de­pendencies are triplets containing two words and the name of the relation between them. The parser also provides part-of-speech (POS) tags and phrase struc­ture trees. To get representatives of frequent gram­matical relations between words, we parsed 52,829 random Wikipedia pages, sentence by sentence, and obtained 4,861,717 different dependencies. 3. Database of slogan skeletons: slogan skeletons were obtained by parsing famous slogans with the Stanford Dependencies Parser. A slogan skeleton contains in­formation about each position in the sentence – its POS tag and all its dependence relations with other words in the sentence. It does not contain any content words, only stop words. 
An example of a skeleton is shown in Figure 1. Figure 1: Example of a skeleton. 3 Slogan Generation An input of our slogan generator is a short textual descrip­tion about the target entity. It is the only required input from a user. It is used to obtain the name of the target en­tity and a set of keywords. An output is a list of generated slogans. The whole procedure is shown in Algorithm 1. 3.1 Extraction of the Keywords and the Main Entity The most frequent non-negative words from the input text are selected as keywords. Negative words are detected us­ing the Nodebox English Linguistics library [18]. The main entity is usually the name of the company and is obtained by selecting the most frequent entity in the whole text using the nltk library [4]. 3.2 Generation of the Initial Population of Slogans The procedure of generating the initial population of slo­gans is based on the BRAINSUP framework [19], with some modi.cations. It follows the steps in Algorithm 2. Skele­tons are obtained from the database of slogan skeletons. Fillers are the words from the database of all grammatical relations between words in sentences that satisfy all pre­de.ned dependencies and POS tags. If there are any key­words in a set of all possible .ller words, the algorithm assigns them higher priority for the selection phase. The main difference between our algorithm and the BRAIN­SUP method is in the selection of .ller words. We don’t consider any user speci.ed constraints, while the BRAIN­SUP framework uses beam search in the space of all possi­ble lexicalizations of a skeleton to promote the words with the highest likelihood of satisfying the user speci.cations. Thus using our method we can produce many different slo­gans from the same slogan skeleton, whereas BRAINSUP produces only one for given user speci.cations. 3.3 Evaluation of Slogans An aggregated evaluation function is used to evaluate the slogans. It is composed of 9 different sub-functions, each assessing a particular feature of a slogan, with scores in the interval [0,1]. Parameter of the aggregation function is a list of 9 weights that sum to 1. They de.ne the proportions of sub-functions in the overall score. In this subsection, we give a short description for every one of them. 3.3.1 Bigram Function In order to work with 2-grams, we obtained the dataset of 1,000,000 most frequent 2-grams and 5,000 most fre­quent words in Corpus of Contemporary American English (COCA) [6]. We assume that slogans containing many fre­quent 2-grams, are more likely to be semantically coherent. Algorithm 1: SloganGenerator 1 Input: A textual description of a company or a product T , Size of the population SP, Maximum number of iterations MaxIterations, Crossover probability pcrossover, Mutation probability pmutation, Set of evaluation weights W . 2 Output: A set of generated slogans S. 1: Keywords, Entity . GetKeywordsAndEntity(T ) 2: P . CreateInitialPopulation(SP, Keywords, Entity) 3: Evaluate(P ) 4: Iteration . 0 5: while Iteration < MaxIterations do 6: P arents . ChooseParentsForReproduction(P ) 7: Children . Crossover(P arents, pcrossover) 8: Children . Mutation(Children, pmutation) 9: NewGeneration . DeleteSimilarSlogans(P, Children) 10: while Size(NewGeneration) 0 do 3: SloganSkeleton . SelectRandomSloganSkeleton() 4: while not AllEmptySlotsFilled(SloganSkeleton) do 5: EmptySlot . SelectEmptySlotInSkeleton(SloganSkeleton) 6: F illers . FindPossibleFillerWords(EmptySlot) 7: F illerW ord . 
SelectRandomFillerWord(F illers) 8: FillEmptySlot(SloganSkeleton, F illerW ord) 9: end while 10: AddFilledSkeleton(S, SloganSkeleton) 11: SP . SP - 1 12: end while 3.3.2 Length Function The length function is very strict, it assigns score 1 to slo­gans with less than eight words, and score 0 to longer ones. The threshold between 0 and 1 was set according to the re­sults of the experiments, which showed that a large major­ity of the generated slogans that contained more than seven words were grammatically incorrect and semantically in­coherent. Also, more than 90% of the famous slogans are less than eight words long. This function acts as an abso­lute constraint and that is why no values between 0 and 1 are allowed. 3.3.3 Diversity Function The diversity function evaluates a slogan by counting the number of repeated words. The highest score goes to a slo­gan with no repeated words. If a slogan contains identical consecutive words, it receives score 0. 3.3.4 Entity Function It returns 1, if slogan contains the main entity, and 0, if it doesn’t. 3.3.5 Keywords Function If one up to half of the words in a slogan belong to the set of keywords, the keywords function returns 1. If a slogan doesn’t contain any keyword, the score is 0. If more than half of the words in the slogan are keywords, the score is 0.75. 3.3.6 Word Frequency Function This function prefers slogans with many frequent words. A word is considered to be frequent, if it is among 5,000 most frequent words in COCA. The frequency score is obtained by dividing the number of frequent words by the number of all words in the slogan. 3.3.7 Polarity and Subjectivity Functions Polarity of a slogan indicates whether slogan contains pos­itive or negative words. For instance, the adjective “happy" is a positive word. In a similar way subjectivity of a slogan indicates whether slogan contains words that express the attitude of the author. For instance, the adjectives “good" and “bad" both represent the opinion of the author and are therefore subjective. The polarity and subjectivity scores are calculated based on the adjectives in the slogan, using the sentiment function from pattern package for Python [7]. 3.3.8 Semantic Relatedness Function This function computes the relatedness between all pairs of content words in a slogan. Stop words are not taken into account. Each pair of words gets a score based on the path distance between corresponding synsets (sets of synonyms) in WordNet [15]. The .nal score is the sum of all pairs’ scores divided by the number of all pairs. 3.4 Production of a New Generation of Slogans A list of all generated slogans is ordered descending with regard to the evaluation score. We use 10% elitism [8]. The other 90% of parent slogans are selected using a roulette wheel [11]. A new generation is built by pairing parents and per­forming the crossover function followed by the mutation function, which occur with probabilities pcrossover and pmutation, respectively. Offspring are then evaluated and compared to the parents, in order to remove very similar ones. If the number of the remaining slogans is smaller than the size of the population, some additional random slogans are generated using the method for creation of ini­tial population. After that, slogans proceed into the next generation. These steps are repeated until the prede.ned number of iterations is achieved. 3.4.1 Crossover We use two types of crossover functions, the big and the small one. 
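Both variants are described in detail in the following paragraphs; as a preview, the minimal sketch below illustrates the position-pairing and swapping logic on whitespace-tokenized slogans. The hard-coded POS tags and example slogans are illustrative stand-ins for tagger output, not the authors' implementation.

```python
import random

def crossover_points(parent_a, parent_b, tags_a, tags_b):
    """All index pairs (p, r) where the two parents share a POS tag."""
    return [(p, r) for p, tag_a in enumerate(tags_a)
                   for r, tag_b in enumerate(tags_b) if tag_a == tag_b]

def big_crossover(parent_a, parent_b, p, r):
    """Swap the tails of both parents from the chosen positions onward."""
    return parent_a[:p] + parent_b[r:], parent_b[:r] + parent_a[p:]

def small_crossover(parent_a, parent_b, p, r):
    """Swap only the two words at the chosen positions."""
    child_a, child_b = parent_a[:], parent_b[:]
    child_a[p], child_b[r] = parent_b[r], parent_a[p]
    return child_a, child_b

if __name__ == "__main__":
    a = "just do it".split()
    tags_a = ["RB", "VB", "PRP"]
    b = "fly the friendly skies".split()
    tags_b = ["VB", "DT", "JJ", "NNS"]
    points = crossover_points(a, b, tags_a, tags_b)
    if points:
        p, r = random.choice(points)
        for child in big_crossover(a, b, p, r) + small_crossover(a, b, p, r):
            print(" ".join(child))
    else:
        print("no compatible POS positions")
```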
Both inspect POS tags of the words in both par­ents, and build a set of possible crossover locations. Each element in the set is a pair of numbers. The .rst one pro­vides a position of crossover in the .rst parent and the sec­ond one in the second parent. The corresponding words must have the same POS tag. Let the chosen random pair from the set be (p, r). Using the big crossover, the part of P. Tomašiˇc et al. the .rst parent, from the p-th position forward, is switched with the part of the second parent, from the r-th position forward. For the small crossover only the p-th word in the .rst parent and the r-th word in the second parent are switched. Examples for the big and the small crossover are illustrated in Figure 2. Figure 2: Examples for the big and the small crossover.1 3.4.2 Mutation Two types of mutation are possible. Possible big muta­tions are: deletion of a random word; addition of an ad­jective in front of a noun word; addition of an adverb in front of a verb word; replacement of a random word with new random word with the same POS tag. Small mutations are replacements of a word with its synonym, antonym, meronym, holonym, hypernym or hyponym. A meronym is a word that denotes a constituent part or a member of something. The opposite of a meronym is a holonym – the name of the whole of which the meronym is a part. A hy­pernym is a general word that names a broad category that includes other words, and a hyponym is a subdivision of more general word. Functions for obtaining such replacements are embedded into the Nodebox English Linguistics library and are based on the WordNet lexical database. 3.4.3 Deletion of Similar Slogans Every generated slogan is compared to all its siblings and to all the evaluated slogans from the previous generation. If a child is identical to any other slogan, it gets removed. If more than half of child’s words are in another slogan, the two slogans are considered similar. Their evaluation scores are being compared and the one with higher score remains in the population while the other one is removed. The child is also removed if it contains only one word or if it is longer than 10 words. Deletion of similar slogans prevents the generated slogans to converge to the initial ones. This has been checked by testing our method without the deletion of similar slogans phase. 1Slogans used in the examples were or still are of.cial slogans of the following companies: General Electric, United Airlines, Nike, and BC Dairy Association. 3.5 Correction of Grammatical Errors Crossover and mutation functions may cause grammatical errors in generated slogans. For instance, incorrect usage of determiners (e.g., “a apple" instead of “an apple"), se­quence of incompatible words (e.g., “a the"), and others. Spelling mistakes were much less frequent. In order to remove both types of errors in the .nal slo­gans, we tested different spell-and grammar checkers. One example of a spell-checker is Hunspell [12]. Its downside is that it works on one word at a time and does not take the word’s context into account. As the majority of er­rors in slogans originated from grammar, we tested sev­eral grammar checkers. They, on the other hand, work on the sentence level rather than on the word level. Most of these grammar checkers are available as online services, and don’t support API calls. One that does is python-ginger [10] – a Python package for .xing grammar mis­takes. It comes with an unof.cial Ginger [9] API. This tool corrects different types of grammatical mistakes. 
It is also used for contextual spelling correction. We used python-ginger only on .nal slogans, the ones that are displayed to the user, because the corrected slogan may not have the same structure anymore. Possible added words, or replac­ing a word with another one with different POS tag would cause errors while executing crossover, mutation and eval­uation functions. 3.6 Unguided Slogan Generator For the purpose of evaluation of our slogan generation method, we also implemented an unguided slogan gener­ator (USG). This generator produces random slogans, such as the ones in the initial population. The only difference between our method and the unguided slogan generation method is in the production of a new generation. USG has no crossover and the mutation steps. Instead it produces a new generation using a method for creation of initial popu­lation. Thus children are independent of the previous gen­eration. The algorithmic steps are shown in Algorithm 3. 4 Experiments We tested the slogan generation method on different input texts and for different values of algorithm parameters. We analyzed the results of every iteration of the genetic algo­rithm to see how the slogans’ scores changed and made further assessment of the generator by comparing its results with the results of the unguided slogan generator. 4.1 Experimental Setting 4.1.1 Input Text In the presented experiments, we use a case of the Croatian provider of marine solutions, Sentinel. Sentinel is a control module that provides more security to boat owners, and is Informatica 39 (2015) 125–133 129 comprised of a network of sensors and a central informa­tion hub. It ensures the vessel is monitored at all times. The input text was obtained from the Sentinel’s web-page [21]. 4.1.2 Algorithm Parameters Different combinations of weights of the evaluation func­tion were tested on a set of manually evaluated slogans. We added one constraint – the weight of the keywords function had to be at least 0.2 in order to include keywords in the slo­gans. Without this constraint the computed weight for the keywords was almost zero. The comparison of the com­puted and the manually assigned scores showed that the highest matching was achieved with the following weights: [bigram: 0.25, length: 0.01, diversity: 0.01, entity: 0.1, keywords: 0.2, frequent words: 0.25, polarity: 0.01, sub­jectivity: 0.02, semantic relatedness: 0.15]. Probabilities for crossover and mutation were set to pcrossover =0.8 and pmutation =0.7. The probability for mutation was set very high, because it affects only one word in a slogan. Consequently the mutated slogan is still very similar to the original one. Thus the high mutation probability does not prevent population from converging to the optimum solution. For the algorithm to decide which type of crossover to perform, we set probabilities for the big, the small and both crossovers to 0.4, 0.2 and 0.4, re­spectively. The mutation type is chosen similarly. Proba­bilities of the big and the small mutation were set to 0.8 and 0.2. These algorithm parameters were set according to the results of testing on a given input text, as their combination empirically leads to convergence. We performed three experiments and for each of them we executed 20 runs of the algorithm using the same input parameter values. The difference between these three tests was in the size of the population (SP) and the number of it­erations (NIt). Those were chosen according to the desired number of all evaluations (. 
6, 800 NoE), and the NoE was set according to the desired execution time for one run of the algorithm – approximately 2 hours. 1. SP: 25, NIt: 360 2. SP: 50, NIt: 180 3. SP: 75, NIt: 120 4.1.3 Comparison with the Unguided Slogan Generator For comparison, we performed three experiments with the unguided slogan generator. For each of them we executed 20 runs of the algorithm using the same input parameter values as in the experiments with slogan generator. The initial populations were also identical. The number of iter­ations were again chosen so as to match the number of all evaluations (. 6, 800 NoE) in the experiments with slogan generator: 1. SP: 25, NIt: 300 2. SP: 50, NIt: 150 3. SP: 75, NIt: 100 Algorithm 3: UnguidedSloganGenerator 1 Input: A textual description of a company or a product T , Size of the population SP, Maximum number of iterations MaxIterations, Set of evaluation weights W . 2 Output: A set of generated slogans S. 1: Keywords, Entity . GetKeywordsAndEntity(T ) 2: P . CreateInitialPopulation(SP, Keywords, Entity) 3: Evaluate(P ) 4: Iteration . 0 5: while Iteration < MaxIterations do 6: Children . CreateInitialPopulation(SP, Keywords, Entity) 7: NewGeneration . DeleteSimilarSlogans(P, Children) 8: while Size(NewGeneration) 0.05) between the samples of ranks analysed, con­.rming the suitability of Mann-Whitney U test to evaluate the hypothesis above mentioned (Section 4). Figures 1 and 2 illustrate as CDFs the 60 average pro.t values obtained with production plans generated under each strategy in problem instance 1 and 2, respectively. Both .gures clearly show that the CDFs of average pro.t obtained under BS, IS and HS dominate the CDF of pro.t obtained under ES (.rst-order stochastic dominance). Fur­thermore, results from Mann-Whitney U test statistically show that average pro.t values obtained under ES are stochastically smaller (p< 0.01) than the ones obtained under BS, IS and HS in both problem instances, as shown in Tables 5 and 6. These results demonstrate that ES is an inadequate noise handling strategy in our setting: this result is likely to be driven by the limited computational budget available, and a stronger performance of ES may potentially be achieved when considering performance upon convergence. Table 1: Parameters used for baseline, implicit averaging, explicit averaging and hybrid strategies BS IS ES HS n 1 1 10 1 P opulationSize 50 100 50 100 Generations 500 250 50 240 Fitness evaluations 25,000 25,000 25,000 . 
25,000 Table 2: Probabilities for Pl and PDFs for .l Instance 1 Instance 2 Instance 1 and 2 .l Work centre (l) Pl Pl PDF Mean (d) 1 0.00 0.00 — — 2 0.10 0.30 Exponential 0.0847 3 0.25 0.45 Exponential 0.0935 4 0.15 0.35 Exponential 0.1338 5 0.00 0.00 — — 6 0.00 0.00 — — 7 0.00 0.00 — — Table 3: Descriptive statistics of average pro.ts and average computation times per strategy in problem instance 1 BS IS ES HS Mean (USD) 703,689 715,376 555,932 707,503 Minimum (USD) 550,417 484,229 460,416 550,014 Maximum (USD) 793,316 795,716 689,685 792,696 Std dev (USD) 55,539 60,416 47,322 60,754 Average computation time (s) 1256 1144 1235 1369 Table 4: Descriptive statistics of average pro.ts and average computation times per strategy in problem instance 2 BS IS ES HS Mean (USD) 707,600 703,420 543,140 723,460 Minimum (USD) 536,840 538,990 437,930 561,550 Maximum (USD) 785,810 783,040 656,800 790,460 Std dev (USD) 60,137 61,744 43,412 51,258 Average computation time (s) 1284 1108 1214 1265 No dominance (neither .rst nor second degree stochas­tic dominance) can be determined among the CDFs of av­erage pro.t obtained under BS, IS and HS in problem in­stance 1, where all three strategies appear equally compet­itive. These results are in accordance with results from Mann-Whitney U test, which indicate stochastic homo­geneity (p> 0.05) among samples obtained under BS, IS and HS in problem instance 1. It is interesting that popula­tion size (i.e. different setups of implicit averaging) has no signi.cant effect in this setting as evidenced under BS and IS. In problem instance 2, results from Mann-Whitney U tests also indicate stochastic homogeneity (p> 0.05) among samples obtained under BS, IS and HS. However, the CDFs of average pro.t obtained under HS dominates the CDFs of average pro.t obtained under IS and under BS, respectively (.rst and second-order stochastic dominance). These results indicate that, even in a setting with a lim­ited evaluation budget, the accurate .tness estimates from explicit averaging can be bene.cial to optimization. It is clear that the difference between IS and HS arises during the .nal stages of the optimization only, while IS contin­ues optimization during additional generations, HS focuses resources on explicit averaging across its last population. When seen in combination with the poor performance of ES, our results suggest that the trade-off between improved exploration (from evaluating more individuals) and accu­rate .tness evaluations (through simulation replicates for the same individual) needs to be carefully balanced in this setting. This .nding appears in line with previous research that has delivered contradictory results regarding the rela­tive performance of explicit and implicit averaging. Table 5: Values for Mann-Whitney U statistic obtained in problem instance 1 BSIS HS ES BS — 1525 1705 107** IS — — 1662 117** HS— — — 113** ES——— — ** p< 0.01 Table 6: Values for Mann-Whitney U statistic obtained in problem instance 2 BSIS HS ES BS — 1719 1549 103** IS — — 1485 96** HS—— — 36** ES——— — ** p< 0.01 J. E. Diaz and J. Handel 6 Conclusion Implicit averaging strategies reduce the impact of noise by having a suf.ciently large population, which ensures that individuals from promising regions of the search space are sampled repeatedly [10, 17]. 
On the other hand, explicit averaging strategies use average .tness values, obtained across a speci.c number of .tness evaluations, to ensure that evaluation of individuals is based on .tness estimates that are more robust to noise [10]. One of the key .ndings of this study was that, in the con­text of our real-world problem, a noise-handling strategy based on explicit averaging did not provide a competitive performance. More generally, this points to the fact that the computational costs incurred by simulation replicates may be problematic in constrained time settings. Furthermore, we found that implicit averaging per­formed robustly for both of the population sizes used. The performance of our hybrid strategy does indicate that some effort towards explicit averaging may become important with increasing variability. Under low levels of .tness variability, the hybrid strategy, implicit averaging and our baseline showed a comparable performance. This situation changed with increasing levels of .tness variability, when HS started to enhance overall performance. Compared to a pure, implicit averaging strategy, the hy­brid strategy misses out on the last few generations of op­timization. Our results show, however, that this disadvan­tage is more than counter-balanced by the bene.ts from an accurate .nal selection step that reduces the likelihood of choosing an inferior individual (in terms of average .tness) as the .nal solution. 7 Limitations and Future Research The relevance of obtaining more reliable .tness estimates increases with the level of variability, since there is a higher risk of choosing an inferior solution. It is, therefore, intu­itive that the .nal selection mechanism implemented in HS would be more bene.cial in such circumstances. But at the same time, the number of .tness evaluations needed to obtain reliable estimates is expected to raise with higher .tness variability, leaving to the evolutionary process a smaller share of the computational budget. Therefore, fur­ther research may focus on investigating the right trade-off between exploration (IS) and accurate .tness evaluation (ES). In this sense, the application of different sampling techniques (e.g., Latin Hypercube) during the .nal selec­tion mechanism might be worthy of future investigation, as it may allow a reduction in the number of .tness evalua­tions required in this last step. Our results underline issues around the computational cost of explicit averaging, but also highlight that sporadic use of this strategy may, nevertheless, be bene.cial. In this study, the use of explicit averaging in the hybrid strategy was limited to the .nal selection step only. Future research Figure 1: CDFs of average pro.t values obtained with production plans generated under the four different strategies in problem instance 1. Figure 2: CDFs of average pro.t values obtained with production plans generated under the four different strategies in problem instance 2. may consider the possibility of using explicit averaging at earlier points during the optimization process. Acknowledgement Juan Esteban Diaz expresses his gratitude to the Secretariat for Higher Education, Science and Technology of Ecuador, who has supported him with a doctoral scholarship. References [1] M. Bhattacharya, R. Islam, A. N. Mahmood (2014) Uncertainty and evolutionary optimization: A novel approach, Proceedings of the 9th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 988–993. [2] J. Branke, C. 
Schmidt (2003) Selection in the pres­ence of noise, Proceedings of the Genetic and Evolu­tionary Computation Conference (GECCO), pp. 766– 777. [3] J. Branke, C. Schmidt, H. Schmec (2001) Ef.cient .t­ness estimation in noisy environments. Proceedings of the Genetic and Evolutionary Computation Con­ference (GECCO), pp. 243–250. [4] K. Deb (2000) An ef.cient constraint handling method for genetic algorithms, Computer methods in applied mechanics and engineering, vol. 186, no. 2, pp. 311–338. [5] K. Deep, K. P. Singh, M. Kansal, C.Mohan (2009) A real coded genetic algorithm for solving integer and mixed integer optimization problems, Applied Math­ematics and Computation, vol. 212, no. 2, pp. 505– 518. [6] J. E. Diaz Leiva, J. Handl (2014) Simulation-based ga optimization for production planning, Proceedings of the Student Workshop on Bioinspired Optimization Methods and their Applications (BIOMA), pp. 27–39. [7] C. Ehrenberg, J. Zimmermann (2012) Simulation-based optimization in make-to-order production: scheduling for a special-purpose glass manufacturer, Proceedings of the Winter Simulation Conference (WSC), pp. 1–12. [8] G. Figueira, B. Almada-Lobo (2014) Hybrid simulation-optimization methods: A taxonomy and discussion, Simulation Modelling Practice and Theory, vol. 46, pp. 118–134. [9] M.-R. Ghasemi, J. Ignatius, A. Emrouznejad (2014) A bi-objective weighted model for improving the dis­crimination power in MCDEA, European Journal of Operational Research, vol. 233. no. 3, pp. 640–650. J. E. Diaz and J. Handel [10] Y. Jin, J. Branke (2005) Evolutionary optimization in uncertain environments -a survey, IEEE Transactions on Evolutionary Computation, vol. 9, no. 3, pp. 303– 317. [11] H. B. Mann, D. R. Whitney (1947) On a test of whether one of two random variables is stochasti­cally larger than the other, The annals of mathemati­cal statistics, vol. 18, no. 1, pp. 50–60. [12] F. Neri, X. del Toro Garcia, G. L. Cascella, N. Salva­tore (2008) Surrogate assisted local search in pmsm drive design, COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 27, no. 3, pp. 573–592. [13] D. W. Nordstokke, B. D. Zumbo (2010) A new non-parametric levene test for equal variances, Psicológ­ica, vol. 31, no. 2, pp. 401–430. [14] C. Qian, Y. Yu, Y. Jin, Z.-H. Zhou (2014) On the ef­fectiveness of sampling for evolutionary optimization in noisy environments, Lecture Notes in Computer Science, vol. 8672, pp. 302–311. [15] G. Rudolph (2001) A partial order approach to noisy .tness functions, Proceedings of the Congress on Evolutionary Computation (CEC), vol. 1, pp. 318– 325. [16] Y. Sano, H. Kita, I. Kamihira, M. Yamaguchi (2000) Online optimization of an engine controller by means of a genetic algorithm using history of search, Pro­ceedings of the 26th IEEE Annual Conference of the Industrial Electronics Society (IECON), vol. 4, pp. 2929–2934. [17] A. Syberfeldt, A. Ng, R. I. John, P. Moore (2010) Evolutionary optimisation of noisy multi-objective problems using con.dence-based dynamic resam­pling, European Journal of Operational Research, vol. 204, no. 3, pp. 533–544. [18] S. Yitzhaki (1982) Stochastic dominance, mean vari­ance, and gini’s mean difference, The American Eco­nomic Review, vol. 72, no. 1, pp. 178–185. 
Data Mining-Assisted Parameter Tuning of a Search Algorithm Jurij Šilc Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: jurij.silc@ijs.si Katerina Taškova Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9 55128 Mainz, Germany E-mail: ktaskova@uni-mainz.de Peter Korošec Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia, and Faculty of Mathematics, Science and Information Technologies, University of Primorska Glagoljaška 8, SI-6000 Koper, Slovenia E-mail: peter.korosec@ijs.si Keywords: data mining, differential ant-stigmergy algorithm, low-discrepancy sequences, meta-heuristic optimization, parameter tuning Received: December 1, 2014 The main purpose of this paper is to show a data mining-based approach to tackle the problem of tuning the performance of a meta-heuristic search algorithm with respect to its parameters. The operational behavior of typical meta-heuristic search algorithms is determined by a set of control parameters, which have to be .ne-tuned in order to obtain a best performance for a given problem. The principle challenge here is how to provide meaningful settings for an algorithm, obtained as result of better insight in its behavior. In this context, we discuss the idea of learning a model of an algorithm behavior by data mining analysis of parameter tuning results. The study was conducted using the Differential Ant-Stigmergy Algorithm as an example meta-heuristic search algorithm. Povzetek: Osnovni namen ˇ clanka je pokazati, kako se lahko z uporabo tehnik podatkovnega rudar­jenja lotevamo problema uglaševanja sposobnosti metahevristˇ cnega iskalnega algoritma z vidika njegovih parametrov. Delovanje znaˇcnega iskalnega algoritma je doloˇ cilnega metahevristiˇceno z naborom njegovih krmilnih parametrov, ki morajo biti za dosego najboljših sposobnosti pri danem problemu dobro uglašeni. Temeljni izziv je kako zagotoviti najboljšo nastavitev algoritma, ki bo rezultat vpogleda v njegovo vedenje. V zvezi s tem razpravljamo o ideji uˇcenja modela za obnašanje algoritma na osnovi analize podatkovnega rudarjenja rezultatov uglaševanja njegovih parametrov. Študija je narejena z uporabo Diferencialnega al­goritma s stigmergijo mravelj, kot primera metahevristiˇ cnega iskalnega algoritma. 1 Introduction The research interest for meta-heuristic search algorithms has been signi.cantly growing in the last 25 years as a result of their ef.ciency and effectiveness to solve large and complex problems across different domains [2]. The state-of-the-art nature-inspired meta-heuristic algorithms for high-dimensional continuous optimization include also algorithms inspired from the collective behavior of social organisms [14]. One such algorithm, which we will address in this paper, is the Differential Ant-Stigmergy Algorithm (DASA) ini­tially proposed by Korošec [6], and further improved in [8]. DASA is inspired by the ef.cient self-organizing behav­ior of ant colonies emerging from a pheromone-mediated communication, known as stigmergy [3]. One of the .rst stigmergy-based algorithms designed for continuous func­tion optimization was Multilevel Ant Stigmargy Algorithm [7]. Naturally, DASA can be classi.ed within the Ant Colony Optimization (ACO) framework. 
However, the use of a continuous pheromone model in the form of Cauchy prob­ability distribution with representation of the search space in the form of the so-called differential graph makes DASA dissimilar from the original ACO paradigm. The rationale behind DASA is in memorizing (via the pheromone model updates) the “move” in the search space that improves the current best solution and using it in further search. As it is the case with most of the meta-heuristic algorithms, the operational behavior of DASA is determined by a set of control parameters. In practice, these parameters have to be .ne-tuned in order to obtain best performance of the al­gorithm. It can be quite inconvenient for the users as: – they usually do not have insight into the behavior of the algorithm, and – even if a default setting exists, it may not be adequate for a speci.c instance or type of a problem. Moreover, parameter tuning is computationally expensive task. The principle challenge here is how to provide meaning­ful (default) settings for DASA, obtained as result of better insight into the algorithm’s behavior. Furthermore, can we .nd optimal regions in DASA parameter space by analyz­ing the patterns in the algorithm’s behavior with respect to the problem characteristics? Related to this, we discuss the preliminary .ndings based on data mining analysis of pa­rameter tuning results. More precisely, the parameter tun­ing task is approached by two-step procedure that combines a kind of experimental design with data mining analysis. We use Sobol’ sequences [10] for even sampling of the algorithm parameter space to generate a large and diverse set of parameter settings. These are used as input to DASA to be tuned on a given function optimization problem. The performance of DASA on the given function optimization problem, in terms of function error, is captured at different time points for all sampled parameter settings. The data collected in the .rst step, DASA performance with corre­sponding parameter settings, is subject for intelligent data analysis, i.e., multi-target regression with Predictive Clus­tering Trees [1]. Parameter sampling combined with regression has been already used by Stoean et al. [11] for tuning meta­heuristics: Latin hypercubes parameter sampling is com­bined with single-target regression with Support Vector Machines. Our approach modi.es the former by replacing the Latin hypercube sampling by Sobol’ sequences, as the former is best suited in the case when a single parameter dominates the algorithm’s performance, while it should be used with care if there are interactions among the sampled parameters [9]. Moreover, we de.ne the regression task as multi-target regression, taking into account more than one target (in this case the function error at few time points) with the goal to .nd parameter settings for the given al­gorithm that will not only solve the problem but will also solve the optimization problem fastest. The reminder of this paper is structured as follows. Sec­tion 2 introduces the differential ant-stigmergy algorithm. Then, Section 3 addresses the parameter tuning task and Section 4 presents the experimental evaluation with the re­sults. After that, Section 5 discusses the idea of post-hoc analysis of parameter tuning by data mining. Finally, Sec­tion 6 summarizes this study and outlines possible direc­tions for further work. 2 The Differential Ant-Stigmergy Algorithm The version of DASA used in our experimental evaluation is described in details by Korošec et al. 
[8] (see Figure 1). DASA introduces the concept of variable offsets (referred to as parameter differences) for solving continuous optimization problems. By utilizing discretized offsets of the real-valued problem parameters, the continuous optimization problem is transformed into a graph-search problem. More precisely, assuming a multidimensional parameter space with x_i being the current solution for the i-th parameter, we define new solutions x'_i as follows:

x'_i = x_i + \omega \delta_i, (1)

where \delta_i is called the parameter difference and is selected from the following set:

\Delta_i = \Delta_i^- \cup \{0\} \cup \Delta_i^+, (2)

where

\Delta_i^- = \{\delta_{i,k}^- \mid \delta_{i,k}^- = -b^{k+L_i-1},\ k = 1, 2, \ldots, d_i\} (3)

and

\Delta_i^+ = \{\delta_{i,k}^+ \mid \delta_{i,k}^+ = b^{k+L_i-1},\ k = 1, 2, \ldots, d_i\}. (4)

Here,

d_i = U_i - L_i + 1, (5)
L_i = \lfloor \log_b(\epsilon_i) \rfloor, (6)
U_i = \lfloor \log_b(\max(x_i) - \min(x_i)) \rfloor, (7)

i = 1, 2, \ldots, D, D is the dimension of the problem, b is the discretization base, \epsilon is the maximal computer arithmetic's precision, and the weight \omega = Random_Integer(1, b - 1) is added to enable a more flexible movement over the search space. In principle, DASA relies on two distinctive characteristics, the differential graph and the continuous pheromone model. Here, we will briefly discuss these two characteristics and outline the main loop of the DASA search process. First, DASA transforms the D-dimensional optimization problem into a graph-search problem. The corresponding differential graph is a directed acyclic graph obtained by fine discretization of the continuous parameters' offsets. The graph has D layers with vertices, where each layer corresponds to a single parameter. Each vertex corresponds to a parameter offset value that defines a change from the current parameter value to the parameter value in the next search iteration. Furthermore, each vertex in a given layer is connected with all vertices in the next layer. The set of possible vertices for each parameter depends on the parameter's range, the discretization base and the maximal computer arithmetic's precision, which defines the minimal possible offset value. Ants use these parameters' offsets to navigate through the search space. At each search iteration, a single ant positioned at a certain layer moves to a specific vertex in the next layer, according to the amount of pheromone deposited in the graph vertices belonging to this layer.

Figure 1: The Differential Ant-Stigmergy Algorithm.

Second, DASA performs a pheromone-mediated search that involves a best-solution-dependent pheromone distribution. The amount of pheromone is distributed over the vertices according to the Cauchy Probability Density Function (CPDF) [9]. DASA maintains a separate CPDF for each parameter. Initially, all CPDFs are identically defined by a location offset set to zero and a scaling factor set to one. As the search process progresses, the shape of the CPDFs changes: CPDFs shrink and stretch as the scaling factor decreases and increases, respectively, while the location offsets move towards the offsets associated with the better solutions. The search strategy is guided by three user-defined real positive factors: the global scale increase factor, s+, the global scale decrease factor, s-, and the pheromone evaporation factor, \rho. In general, these three factors define the balance between exploration and exploitation in the search space. They are used to calculate the values of the scaling factor and consequently influence the dispersion of the pheromone and the moves of the ants.
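As a rough illustration of Eqs. (1)–(7), the sketch below builds the discretized offset set for a single parameter and applies one weighted move. The default values (b = 10, machine precision 10^-15, a parameter bounded to [-5, 5]) and the uniform choice of an offset are our simplifications; in DASA itself the offset is selected according to the per-parameter Cauchy pheromone distribution.

import math
import random

def offset_set(x_min, x_max, b=10, eps=1e-15):
    # Discretized offsets Delta_i for one parameter (Eqs. (2)-(7)):
    # L_i = floor(log_b(eps)), U_i = floor(log_b(range)), d_i = U_i - L_i + 1.
    L = math.floor(math.log(eps, b))
    U = math.floor(math.log(x_max - x_min, b))
    d = U - L + 1
    positive = [b ** (k + L - 1) for k in range(1, d + 1)]   # Delta_i^+
    negative = [-v for v in positive]                        # Delta_i^-
    return negative + [0] + positive                         # Delta_i

def dasa_move(x_i, deltas, b=10):
    # One move for a single parameter, Eq. (1): x'_i = x_i + omega * delta_i.
    omega = random.randint(1, b - 1)       # Random_Integer(1, b - 1)
    delta_i = random.choice(deltas)        # uniform here; pheromone-guided in DASA
    return x_i + omega * delta_i

deltas = offset_set(-5.0, 5.0)
print(len(deltas), dasa_move(0.0, deltas))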
Finally, the main loop of DASA consists of an iterative improvement of a temporary-best solution, performed by searching appropriates paths in the differential graph. The search is carried out by m ants, all of which move simul­taneously from a starting vertex to the ending vertex at the last level, resulting in m constructed paths. Based on the found paths, DASA generates and evaluates m new can­didate solutions. The best among the m evaluated solu­tions is preserved and compared to the temporary-best so­lution. If it is better than the temporary-best solution, the latter is replaced, while the pheromone amount is redis­tributed along the path corresponding to the path of the pre­served solution and the scale factor is accordingly modi.ed to improve the convergence. If there is no improvement over the temporary-best solution, then the pheromone dis­tributions stay centered along the path corresponding to the temporary-best solution, while their shape shrinks in or­der to enhance the exploitation of the search space. If for some .xed number of tries all the ants only .nd paths com­posed of zero-valued offsets, the search process is restarted by randomly selecting a new temporary-best solution and re-initializing the pheromone distributions. 3 Parameter Tuning To obtain the best possible performance on a given prob­lem, one should consider a task speci.c tuning of the pa­rameter setting for the optimization algorithm used. Deter­mining the optimal parameters is an optimization task in it­self, which is extremely computationally expensive. There are two common approaches for choosing parameters val­ues: parameter tuning and parameter control. The .rst ap­proach selects the parameter settings before running the op­timization algorithm (and they remain .xed while perform­ing the optimization). The second approach optimizes the algorithm’s parameters along with the problem’s parame­ters. Here, we will focus on the .rst approach, parameter J. Šilc et al. tuning. A detailed discussion and survey of parameter tuning methods is given by Eiben and Smit [4]. According to this survey, one way to approach parameter tuning is by sampling methods. Sampling methods reduce the search effort by decreasing the number of investigated parameter settings as compared to the full factorial design: the ba­sic full factorial design investigates 2k parameter settings, subject to k parameters, each of which have 2 possible val­ues; in the more general case, parameters can have arbitrary number of values; moreover, an increase in the number of investigated parameters means an exponential increase in the number of parameter settings to be tested. Two widely used sampling methods are Latin-squares [9] and Taguchi orthogonal arrays [12]. However, these are not the most ro­bust sampling techniques, e.g., Latin-squares or Latin hy­percube sampling is good in the case where one of the pa­rameters dominates the algorithm’s performance, while it should be used with care if there are interactions among the parameters. Ultimately, we would like to .nd a sampling schema that will be able to detect the interactions among the parame­ters, will be independent from user-speci.ed information regarding the particular parameter values to be considered (typical for factorial design), and will deliver small but rep­resentative sample of the parameter search space. 
The .rst two requirements are satis.ed by the pure random sam­pling, but the last is not, as random sampling does not guar­antee that the sampled values are evenly spread across the entire domain. The so-called low-discrepancy sequences were specially designed to ful.ll all three requirements. Therefore, Sobol’ sequences, a representative variation of low-discrepancy sequences introduced by Sobol’ [10], was considered for sampling the parameter space of DASA in this study. Sobol’ sequences, sampled from a D-dimensional unit search space, are quasi-random sequences of D-tuples that are more uniformly distributed than uncorrelated random sequences of D-tuples. These sequences are neither ran­dom nor pseudo-random: they are cleverly generated not to be serially uncorrelated by taking into account which tu­ples in the search space have already been sampled. For a detailed explanation and overview of the schemas for gen­erating Sobol’ sequences, we refer to [9]. The particular implementation of Sobol’ sampling used in our analysis is based on the Gray code order [5]. 4 Experimental Evaluation Since data mining methods can only discover patterns actu­ally present in the data, the dataset subject to analysis must be large enough and informative enough to contain these patterns, i.e., to describe different types of algorithm’s be­havior. For this reason, we decided to use a simple test function, which matched this requirement and was used for building an example case model. Table 1: Parameter settings for DASA* and DASA. Algorithm DASA. DASA* Parameter D = 20 D = 40 D = 20 D = 40 m 10105 7 . 0.2 0.2 0.324 0.388 s+ 0.02 0.02 0.201 0.136 s- 0.01 0.01 0.289 0.344 b 10106 8 Table 2: Median values of the function errors for the Sphere function Algorithm DASA. DASA* FEs D = 20 D = 40 D = 20 D = 40 25 × D 16.7 18.3 2.53 9.31 250 × D 0.0003 0.0021 0 0 2500 × D 0 0 0 0 25000 × D 0 0 0 0 Therefore, the performance of DASA was evaluated on the Sphere function: f(x)= |z|2 + f(xopt), (8) where z = x - xopt and xopt is optimal solution vector, such that f(xopt) is minimal. Function f(x) is de.ned over D-dimensional real-valued search space x and is scalable with the dimension D. It has no speci.c value of its optimal solution (it is randomly shifted in x-space) and has an arti.­cially chosen optimal function value (it is randomly shifted in f-space). In this study, we considered the Sphere func­tion with respect to two dimensions, D = 20 and D = 40. The performance of DASA is dependent on the values of .ve parameters: three real-valued parameters that directly in.uence the search heuristic (s+, s-, and .) and two integer-valued parameters (m and b). Therefore, we con­sidered all of them for tuning DASA performance on the Sphere function for both search space dimensions, D = 20 and D = 40. Using the Gray-code-based Sobol’ genera­tor we generated 5000 parameter settings (5-tuples). Note that the Sobol’ sampling generates numbers on the unit in­terval: in order to obtain the true parameter settings, we mapped these values on the prede.ned search range of pa­rameter values. The latter for each of the .ve tuned param­eters was de.ned as follows: 4 . m . 200, 0 . . . 1, 0 . s+ . 1, 0 . s- . ., and 2 . b . 100. Moreover, the mapped values for the integer-valued parameters m and b were rounded to the closest integer value. Finally, due to implementation reasons, the upper bound of the global scale decrease factor s- was actually limited by the value of the evaporation factor .. 
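The sampling step just described can be reproduced approximately with SciPy's quasi-Monte Carlo module (scipy.stats.qmc, available from SciPy 1.7 onward). The paper uses a Gray-code-ordered Sobol' generator, so the sketch below illustrates the procedure rather than the exact sequence used.

import numpy as np
from scipy.stats import qmc

# Parameter order: m, rho, s_plus, s_minus, b (search ranges as given in the text)
lower = [4, 0.0, 0.0, 0.0, 2]
upper = [200, 1.0, 1.0, 1.0, 100]

sampler = qmc.Sobol(d=5, scramble=False)
unit = sampler.random(5000)                    # 5000 quasi-random points in [0, 1]^5
settings = qmc.scale(unit, lower, upper)       # map onto the predefined parameter ranges

settings[:, 0] = np.rint(settings[:, 0])       # m rounded to the closest integer
settings[:, 4] = np.rint(settings[:, 4])       # b rounded to the closest integer
settings[:, 3] = np.minimum(settings[:, 3], settings[:, 1])   # enforce 0 <= s_minus <= rho

SciPy warns that 5000 points is not a power of two (powers of two preserve the sequence's balance properties); this does not affect the illustration.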
In the next step, the performance of the Sobol’ sampled parameter settings were tested on the Sphere benchmark function. Due to the stochastic nature of DASA, every parameter setting was used in a multiple-run experimental evaluation. Each run included 25000 × D function evalua­tions (FEs). The number of runs was set to 15. The results gathered by the parameter tuning process are most often subjected to ordinal data analysis, which includes ranking of the different sampled parameter sets according to some calculated statistics, e.g., best or mean performance of the algorithm in some prede.ned number of runs [13]. In this case, performance of the algorithm is expressed in terms of the function error, i.e., the difference between the ob­tained and optimal function value. In order to .nd a setting that will be satisfactory in terms of convergence speed, we captured the error values at four different time points, cor­responding to 25 × D, 250 × D, 2500 × D, and 25000 × D FEs. The optimal performing parameter setting was chosen based on the median best performance over all runs aggre­gated over all time points for a given dimension (D = 20 and D = 40). A common approach is to use the mean performance, but we took the median in order to avoid the problems that the mean has when observing large vari­ance in the function values across the runs. More precisely, given a function, an individual rank is assigned to every setting (out of 5000) for the four time points. A single .­nal rank is calculated by ranking the sum of the four in­dividual rankings assigned to the parameter settings. The best-ranked parameter setting for a given dimension de.nes instance of DASA referred to as DASA*. The results of DASA tuning subject to ordinal data anal­ysis are presented in Tables 1 and 2. Table 1 reports the tuned parameter settings for both DASA* instances. In addition, the default parameter setting for DASA from [8] is given as a reference for comparison. The correspond­ing instance is referred to as DASA. . Results in Table 2 represent the median values of the function errors, at the four time points, obtained by DASA* and DASA. instances for both dimensions. Note, that er­ror value below 10-8 was treated as zero. The table clearly shows that DASA* is better than DASA. . Figure 2: Median error distributions for the Sphere func­tion in the case of D = 40. 5 Data Mining Analysis Parameter tuning of an algorithm leads to a better per­formance, however it is a computationally expensive and problem-dependent task. Considering this, the idea is to extend the simple tuning that delivers a single parameter set and analyze the gathered data in an intelligent way. The intelligent analysis can extract patterns (regularities) in the explored parameter space that de.ne a speci.c behavior of DASA. To this end, data mining methods for automated discovery of patterns in data can be used. As data mining methods can only discover patterns that are present in the data, the dataset subject to analysis must be large enough and informative enough to contain these patterns, i.e., to describe different types of algorithm’s behavior. Related to this, we considered a data mining approach on a represen­tative example, i.e., error model of the Sphere function. To begin with, consider the graph in Figure 2 that visu­alizes the Sphere function error distribution. The graph de­picts the distribution of the median error values obtained by 5000 parameter settings at four different points for D = 40. 
As we are more concerned with the practical signi.cance between a large and a small error value than the statisti­cal signi.cant difference between two actual error values, the error values are discretized into nine intervals, each of which is represented with a color chosen according to the error magnitude between black (error below 10-8) and white (error above 102). The graph clearly shows that the sampled settings determine different DASA performance. As evident, there is a big cluster of parameter settings that solve this function to the aimed accuracy (error below 10-8) in the given time budget (106 FEs). Moreover, subset of this cluster solves the function for an order of magnitude less FEs. Our aim, therefore, is to .nd a (common) descrip­tion of this cluster, in terms of DASA parameter relations, J. Šilc et al. that represents a good behavior of DASA (as well as what parameter relations lead to a bad DASA performance). For this purpose, we formulated the problem as a predic­tive modeling task using decision trees to model the func­tion error values in terms of the values of DASA parame­ters. Since the function error variables are continuous, the task at hand is a regression task. Furthermore, as our goal is to model the behavior of DASA at all time points simul­taneously, the problem at hand is then a multi-target re­gression. To this end, we used Predictive Clustering Trees (PCTs) which are implemented in Clus system [1]. In this case, the median error at the four time points de.ne the four target attributes considered for modeling (with the PCT) in dependence from the descriptive attributes, i.e., the func­tion dimension and the .ve DASA parameters. The re­sulting dataset is composed of 10002 rows described with the 5001 parameter settings (including the default setting) of DASA applied to the two different dimensions, and 10 columns corresponding to the 10 attributes. Figure 3 presents a PCT model for the Sphere function error. Each internal node of the tree contains a test on a descriptive attribute, while the leaves store the model pre­dictions for the function error, i.e., a tuple of four values. The predictions are calculated as the mean values of the corresponding error values for the data instances belonging to the particular leaf (represented by a box). In fact, each leaf identi.es one cluster of data instances (the size of the cluster is the value in the small box). The predictive perfor­mance of the model was assessed with 10-cross-validation. Note that this particular model was learned on the com­plete dataset subject to constraints on the maximal tree size of 25 nodes. We did this because the original model con­tained 1643 nodes (of which 822 leaves) and despite its better predictive performance, both training and testing, it was not comprehensible; aiming for an explanatory model, small and comprehensible, we considered the smaller tree obtained with the limitation of the size. The predictive per­formance of both models in terms of Root Relative Mean Squared Error (RRMSE) and Pearson Correlation Coef.­cient (PCC) are given in Table 3. Note that RRMSE rep­resents the relative error with respect to the mean predictor performance, while PCC represents the linear correlation between the data and the model predictions. Good models have RRMSE values closer to 0 and PCC closer to 1. The model in Figure 3 outlines 13 clusters of data in­stances, of which two (depicted with light-gray boxes) represent a good DASA performance. 
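The Clus system used for the PCTs is a separate Java tool. Purely as an illustrative stand-in, and not the authors' setup, scikit-learn's decision trees handle multi-target regression natively, and the 25-node size constraint can be approximated with a 13-leaf limit. The arrays below are random placeholders for the 10002-row tuning dataset described above.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.random((10002, 6))     # descriptive attributes: dimension D and the five DASA parameters
Y = rng.random((10002, 4))     # targets: median error at 25*D, 250*D, 2500*D and 25000*D FEs

# One tree fitted on all four targets simultaneously (multi-target regression),
# limited to 13 leaves (i.e. a binary tree with 25 nodes), echoing the reported model size.
tree = DecisionTreeRegressor(max_leaf_nodes=13, random_state=0)
cv = cross_validate(tree, X, Y, cv=10, return_train_score=True)   # 10-fold cross-validation
tree.fit(X, Y)                                                     # explanatory model on the full data
print(cv["test_score"].mean(), tree.tree_.node_count)

RRMSE and the Pearson correlation coefficient reported in Table 3 are not built-in scikit-learn scorers and would have to be supplied as custom scoring functions; Figure 3 and the interpretation that follows refer to the actual Clus PCT, not to this sketch.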
According to this model, the number of ants, m, is the most important DASA parameter for its performance on the Sphere func­tion. More precisely, if m> 83, independent of the values of the other parameters, DASA solves the 20-dimensional functional problem for the given time budget. Moreover, if m . 83 another DASA parameters become important as well. For example, if 43 0.009 and D = 20 then DASA solves the function with error 3 × 10-6, while the pattern m . 43 and s.0.040 and s- > 0.656 describes a poor DASA performance regard­ Figure 3: Predictive clustering tree representation of the error model for the Sphere function. less of the function dimension. An interesting fact is that, the evaporation factor is not essential for DASA perfor­mance on the Sphere function. Moreover, the model also shows that is more dif.cult to describe the behavior of DASA for the 40-dimensional function problem than the 20-dimensional one. Finally, note that the training performance (learned on the complete dataset) of the model in terms of the error and the correlation coef.cient is best for the .rst target, while it gets worse with respect to the other three targets (see Ta­ble 3). This is especially signi.cant if we take into account the testing performance of the model estimated with 10­cross-validation. However, the training performance is ac­ceptable in our case, as we are interested in understanding the behavior of DASA and not aiming to obtain a model for prediction. 6 Conclusion The principle challenge of meta-heuristic design is provid­ing a default algorithm con.guration, in terms of parameter setting, that will perform reasonably well in general (prob­lem) case. However, while it is a good initial choice, the default algorithm con.guration may result in low quality solutions on a speci.c optimization problem. In practice, the algorithms parameters have to be .ne-tuned in order to obtain best algorithm’s performance for the problem at hand, leading to the computational expensive task of pa­rameter tuning. So, if the tuning task is unavoidable, the question is: can we use the results from the parameter tun­ing to extract some knowledge about the algorithm’s be­havior? Related to this, the study focused on the problem of tun­ing the performance of the Differential Ant-Stigmergy Al­gorithm (DASA) with respect to its parameters. As it is the case with most of the meta-heuristic algorithms, the oper­ational behavior of DASA is determined by a set (.ve) of control parameters. The existing default setting of DASA parameters [8] is obtained by experimentation with both real and benchmark optimization problems, but not as a result of some systematic evaluation. Furthermore, there is no deeper understanding of the impact of a particular parameter or parameters relations on the performance of DASA. In this context, we performed a systematic evalua­tion of DASA performance obtained by solving the Sphere function optimization problem with 5000 Sobol’ sampled DASA parameter settings regarding two dimensions, 20 and 40. Furthermore, we discussed the idea of learning a model of DASA behavior by data mining analysis of the parame­ter tuning results. In this context, we formulated the prob­lem as multi-target regression and applied predictive clus­tering trees for learning a model of DASA behavior with respect to the function error performance. The obtained model revealed that the parameter denoting number of ants is the most important parameter for DASA performance on the 20-dimensional function problem. 
On the other hand, the evaporation factor is not essential for DASA perfor­mance on the Sphere function. Further work will focus on additional experimental eval­uation and data mining analysis of data with respect to more complex functions problems. This idea can be fur­ Table 3: Model performance with respect to RRMSE and PCC Measure RRMSE PCC Training 1643 nodes 25 nodes 1643 nodes 25 nodes 25 × D 0.166 0.408 0.986 0.913 250 × D 0.373 0.679 0.928 0.734 2500 × D 0.487 0.562 0.873 0.827 25000 × D 0.472 0.546 0.882 0.837 Mean 0.396 0.557 0.843 0.689 Testing 1643 nodes 25 nodes 1643 nodes 25 nodes 25 × D 0.219 0.450 0.976 0.893 250 × D 0.638 0.817 0.777 0.584 2500 × D 1.019 1.039 0.304 0.246 25000 × D 1.053 1.055 0.234 0.218 Mean 0.806 0.875 0.426 0.312 ther extended to building models of DASA behavior that will include the optimization problem characteristics (such as multimodality, separability, and ill-conditioning) as de­scriptive attributes as well. The latter can provide insights on how to con.gure DASA performance with respect to the type of the optimization problem. Moreover, these in­sights can serve as a valuable information for improvement of DASA design. References [1] H. Blockeel, J. Struyf (2002) Ef.cient Algorithms for Decision Tree Cross-validation, Journal of Machine Learning Research, vol. 3, pp. 621–650. [2] C. Blum, A. Roli (2003) Metaheuristics in Combina­torial Optimization: Overview and Conceptual Com­parison, ACM Computing Surveys, vol. 35, no. 3, pp. 268–308. [3] E. Bonabeau, M. Dorigo, G. Theraulaz (1999) Swarm Intelligence: From Natural to Arti.cial Systems, Ox­ford University Press. [4] A. E. Eiben, S. K. Smit (2011) Parameter Tuning for Con.guring and Analyzing Evolutionary Algorithms, Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 19–31. [5] S. Joe, F. Y. Kuo (2008) Constructing Sobol Se­quences with Better Two-dimensional Projections, SIAM Journal on Scienti.c Computing, vol. 30, no. 5, pp. 2635–2654. [6] P. Korošec (2006) Stigmergy as an Approach to Metaheuristic Optimization, Ph.D. dissertation, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia. [7] P. Korošec, J. Šilc (2008) Using Stigmergy to Solve Numerical Optimization Problems, Computing and Informatics, vol. 27, no. 3, pp. 341–402. [8] P. Korošec, J. Šilc, B. Filipiˇc (2012) The Differential Ant-stigmergy Algorithm, Information Sciences, vol. 192, no. 1, pp. 82–97. [9] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery (1992) Numerical Recipes, Cambridge Uni­versity Press. [10] I. M. Sobol’ (1967) Distribution of Points in a Cube and Approximate Evaluation of Integrals, USSR Com­pututational Mathematics and Mathematical Physics, vol. 7, no. 4, pp. 86–112. [11] R. Stoean, T. Bartz-Beielstein, M. Preuss, C. Stoean (2009) A Support Vector Machine-Inspired Evolu­tionary Approach for Parameter Setting in Meta­heuristics, CIOP Technical report 01/09, Faculty of Computer Science and Engineering Science, Cologne University of Applied Science, Germany. [12] G. Taguchi, T. Yokoyama (1993) Taguchi Methods: Design of Experiments, ASI Press. [13] E.-G. Talbi (2009) Metaheuristics: From Design to Implementation, John Wiley & Sons. [14] X.-S. Yang (2008) Nature-Inspired Metaheuristic Al­gorithms, Luniver Press. A High Resolution Clique-based Overlapping Community Detection Algorithm for Small-world Networks András Bóta University of Szeged, Institute of Informatics Address, P. O. 
Box 652., 6701 Szeged, Hungary E-mail: bandras@inf.u-szeged.hu Miklós Krész University of Szeged, Juhász Gyula Faculty of Education, Boldogasszony bvd. 6, 6720 Szeged, Hungary E-mail: kresz@jgypk.u-szeged.hu Keywords: network science, community detection, overlapping communities Received: June 24, 2013 In this paper we propose a clique-based high-resolution overlapping community detection algorithm. The hub percolation method is able to .nd a large number of highly overlapping communities. Using different hub-selection strategies and parametrization we are able to .ne tune the resolution of the algorithm. We also propose a weighted hub-selection strategy, allowing the algorithm to handle weighted networks in a natural way, without additional .ltering. We will evaluate our method on various benchmarks, and we will also demonstrate the usefulness of our algorithm on a real-life economic case-study. Povzetek: Predstavljena je nova hevristika za reševanje evklidskega BDMST problema. Primerjalni testi pokažejo prednosti pred obstojeˇ cimi metodami. 1 Introduction One of the landmarks in graph theory was the introduc­tion of small-world networks by Watts and Strogatz [31]. They have observed, that in real-life networks, the typical distance between two randomly chosen nodes grows pro­portionally to the logarithm of the number of nodes in the network. Since then, several other properties of real-life networks was discovered. The degree distribution of these networks follows a power-law [2], and the edge distribution is not only globally, but also locally inhomogeneous. This latter feature is called community structure [11]. The goal of community detection is the discovery of this structure. While the phenomenon of communities is well observed, an exact de.nition is dif.cult to .nd. In recent years, a large number of community detection algorithms have been proposed. Most of these consider communities to be disjoint vertex sets, and adopt the fol­lowing intuition: They are looking for a partitioning of the nodes, which maximizes the number of edges between the nodes inside the sets, and minimizes them between the sets. It is also a goal to .nd meaningful communities, i.e. they discard trivial solutions of the problem (like a single com­munity containing all of the vertices). Newman proposed modularity [27] as an ef.cient way to measure the good­ness of disjoint communities. A comprehensive review of community detection can be found in [10]. The traditional de.nition of community allows disjoint vertex sets only. Based on the observation that in real-life networks, nodes can belong to multiple communities, Palla et al. introduced the concept of overlapping community de­tection and proposed the clique percolation method [29] as a solution. The idea of .nding maximal cliques and joining them according to some criteria is the basis of several over­lapping community detection algorithms [17, 21]. Other approaches are based on block models [12, 7], edge cluster­ing [9, 1], label propagation [13] or optimization according to some .tness function [25, 20]. Measuring the goodness of overlapping community de­tection algorithms is complicated, since there is no agree­ment on the de.nition of an overlapping community. 
The speci.cations of different applications depend mainly on the ratio of overlaps between communities: several ap­proaches require only a loose relaxation of the original ”non-overlapping” de.nition in such way that occurence of nodes belonging to multiple communities is strongly restricted [13]; other concepts prefer highly overlapping community structure [20]. The resolution of the methods are closely tied to the ratio of overlaps. A highly overlap­ping community structure is often associated with a large number of relatively small communities, however the op­posite is not always true. Hierarchical or multiresolutional methods combine these approaches. Corresponding to this, the output of the above mentioned algorithms can be fundamentally different. There are basi­cally two types of evaluation in the literature: one can use some kind of benchmark network like in [18, 22, 26, 32], and compare the results to the already known community structure of the network [14, 28]. Another option is using a real-life application as an example, similar to a case-study. In this paper, the authors propose the hub percolation overlapping community detection method. A node, that is a member of many adjacent cliques is considered more im­portant. We refer to these nodes as hubs. We expand and join cliques if they contain the same hubs. One of the ad­vantages of this method is, that both the hub selection and the joining criteria is adjustable. This allows us to discover different kinds of community structures from large, loosely overlapping groups to ones with a dense, highly overlap­ping structure. We also propose a hub-selection strategy able to handle weighted networks in a natural way with­out the need for .ltering or pruning edges. Finally, we will rely on the framework proposed by Pluhár et al. in [3], and show how several popular algorithms can be represented in it. We will use well-known benchmark networks [29, 26, 11] to demonstrate the difference between hub selection strategies in terms of community sizes, the size of over­laps and the number of singletons: nodes without com­munities. Then we will evaluate the performance of our method in two different ways. We will use the community based graph generator of Lancichinetti and Fortunato [18] to compare the results of our method to the OSLOM al­gorithm of the same authors [20], the COPRA method of Gregory [13], and the clique percolation method of Palla et al. [29]. We will also present a case study: we will exam­ine the communities of an economic network constructed from the Hungarian company register. We will focus our attention on three aspects of the companies: the geograph­ical location of them, the industrial sector they belong to and the age of the companies. 2 General framework The authors of [3] described a general framework for over­lapping community detection. In this section we summa­rize their approach. Here, and throughout the paper, by a graph G we mean an undirected simple graph with vertex (or node) set V (G) and edge set E(G). Edges might have arbitrary weight. According to the framework introduced in [3], most community detection algorithms consist of two phases. Taking an input graph G, the .rst phase constructs a hyper-graph F =(V, H), where V (F)= V (G), and H. 2V . The elements of H are considered the building blocks of communities. The second phase adds a distance function d to set H, creating a metric space M =(H,d). Using function d, a clustering algorithm creates a set of clusters C. 
Finally, the arising clusters are associated to subsets of V such that K_i = \bigcup_{H \in C_i} H, where K_i, the i-th community, corresponds to C_i, the i-th cluster, and K_i is just the union of the vertex sets of those hyperedges that belong to C_i. It is easy to show that this framework applies to most community detection algorithms. In the case of the clique percolation method [29], H contains the k-cliques¹ of the original graph, and function d is:

d(K_i, K_j) = \begin{cases} 1, & \text{if } |K_i \cap K_j| = k - 1,\\ \infty, & \text{otherwise.} \end{cases}

In the same paper [3], the authors have proposed the N++ community detection algorithm with a general distance function, where H is the same as above and d(K_i, K_j) = 1 only if |K_i \cap K_j| \geq c, where c is a parameter of the algorithm. In other cases d(K_i, K_j) = \infty. This method has proven its usefulness in applications [6, 16]. It is also possible to describe non-clique-based methods using this formulation. In the case of COPRA [13], each element of H initially contains only one unique vertex v \in V(G). In the second phase, these are joined according to a belonging coefficient. A threshold is introduced to provide a lower bound for community membership.

3 The hub percolation method

The motivation for creating an advanced community detection algorithm came from our previous work with the general framework of community detection [3]. Our aim was to create a flexible clique-based method taking into consideration our experiences with the clique percolation method [29] and the N++ method [3]. Many of the details of the algorithm described in this section come from experiences gained during test runs on well-known benchmark networks like [32, 29, 26, 22]. The hub percolation method has two simple ideas at its core. A natural property of most approaches for overlapping community detection² is that cliques (fully connected subgraphs) are considered to be the purest communities. Therefore our method uses cliques at the beginning of the building process. An important observation on real-life networks is that inside a community some members are more important than others with respect to the role of the nodes in connecting different communities. We will denote these nodes as hubs. In the building process the cliques of the graph are extended according to a limited percolation rule: two k-cliques are joined if they share k - 1 vertices. As a result of this process, the set of extended cliques consists of the building blocks of community detection. The joining phase of our method merges these extended cliques if they share the same hubs. Considering these ideas, an outline of the hub percolation algorithm is as follows:
1. Find the set C of all maximal cliques of size greater than or equal to 3 on graph G.
2. Select the set of hubs H.
3. Create the set of extended cliques C_e.
4. Compute the set of communities K: take the union of extended cliques if one of them contains all the hubs in the other one.
¹ Fully connected subgraphs containing exactly k nodes.
² In the following chapters, we will refer to overlapping community detection simply as community detection.
Finding the set of all maximal cliques in a graph is a well-studied NP-hard problem of graph theory. Unfortunately, an n-vertex arbitrary graph may contain 3^{n/3} maximal cliques in the worst case [24]. Because of their unique structure, this number is significantly lower in small-world networks, allowing algorithms like those in [30, 4, 8] to list the set of maximal cliques in reasonable time even for large networks.
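Steps 1 and 2 of the outline above can be sketched with NetworkX, whose find_cliques routine enumerates maximal cliques with a Bron–Kerbosch-type algorithm; the helper name and the use of Zachary's karate club network as the example graph are our own choices.

from collections import Counter
import networkx as nx

def cliques_and_hub_values(G):
    # Step 1: all maximal cliques of size >= 3.
    cliques = [c for c in nx.find_cliques(G) if len(c) >= 3]
    # Step 2 input: hub value h_v = number of maximal cliques containing node v.
    hub_values = Counter(v for c in cliques for v in c)
    return cliques, hub_values

G = nx.karate_club_graph()        # Zachary's karate club benchmark network, bundled with NetworkX
cliques, h = cliques_and_hub_values(G)
print(len(cliques), h.most_common(5))

A hub selection strategy (Section 3.1) then compares h_v with the median or the scaled mean of the hub values in the node's 1-neighbourhood.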
In this work we used the modi.ed Bron-Kerbosch algorithm described in [8]. The hub selection strategy is an important part of the algorithm. Hubs represent the locally important nodes in the network. As a consequence, whether a node is a hub should depend on the t-neighborhood3 of the given node, where t is a small number. In our interpretation hubs con­nect communities, therefore the deciding factor in hub se­lection should be the number of cliques the vertex belongs to. Each node v is assigned a hub value hv according to the above rule, then some of them are selected if their value is higher than the average or median hub values in their t-neighborhood. It is also possible to extend the selection strategy to weighted networks. We will discuss hub selec­tion in the next subsection. In our method, cliques of the network are extended with a a one-step percolation rule, then merged if they share the same hubs. Introducing the .ltering parameter k . 2, let us consider all cliques of size equal to k on the subgraph induced by the set of hubs H. We will denote the set of k-cliques on G[H] as CH . Then, we expand the elements of CH according to a one-step percolation rule. Let Ce denote the set of merged cliques ce = cH .c0 .···.c. with cH . CH , c0,...,c. . C and |c0 . cH |. 2,..., |c. . cH |. 2 4 . The last step of our method corresponds to the joining phase of the community detection framework. We merge elementary communities if they contain the same hubs, more precisely, we take the union of two elementary com­munities ce0 and ce1 if ce0 . H . ce1 . H. We iterate this process by adding the new merged clique to Ce and removing the original ones. At the end of the process Ce contains the communities of graph G. Note, that depend­ing on the hub selection strategy Ce may contain duplicate members, the merging process eliminates this problem as well. Each element of Ce is a union of the cliques of G and contains at least k hubs. We will refer to the members of Ce as elementary communities. The hub percolation method Input : Graph G, parameter k Informatica 39 (2015) 177–187 179 1. Find all maximal cliques of graph G using any exact algorithm or heuristic. Let C denote the set of cliques. 2. For all v . V (G), let hv = |Hv|, Hv = {h|v . h, h . C}. 3. Select the set of hubs H according to the hub selection strategy. 4. Let CH denote the set of k-cliques on the subgraph induced by the hubs G[H]. 5. Create the set of extended cliques Ce according to the following rule: for all cH . CH .nd all cliques c . C where |c . cH |. 2. Let c0,...,c. denote the cliques satisfying this criterion. Create the union of cliques ce = cH . c0 .· ··. c., and add ce to Ce. 6. For all ce0 ,ce1 . Ce add the union of them to Ce if ce0 . H . ce1 . H, and remove ce0 and ce1 from Ce. Iterate until there are no more merges. 7. The set Ce contains the communities of graph G. Figure 1: The communities of Zachary’s karate club net­work [32]. Hubs are marked as diamond shapes. Nodes with multiple colors indicate overlapping nodes. The me­dian hub selection strategy was used with k =2. Nodes 9, 3, 33 form an additional community and node 9 belongs to three communities. It is easy to see, that the general framework proposed in applies to the hub percolation method. The edges of the hypergraph correspond to the extended communities in Ce, while the distance function is 3A t-neighbothood of a vertex v is the set of vertices N tv , where u . . .. if Ki . H . Kj . H or N t. 
The community structure of Zachary's karate club network [32] can be seen in Figure 1. This network is a well-known social network that represents friendships between the members of the club. Our method identifies five communities⁵, the most interesting ones being the green and blue ones, as well as the one represented by the red triangle. During Zachary's observation the club split into two parts because some friendships were broken. Most community detection algorithms are able to identify these subgroups even before the actual split. In our case the borders of the green and blue communities represent the border between the two subgroups, and the red group identifies the edge that was broken when the split occurred.

⁵ The median hub selection strategy was used with k = 2, see Subsection 3.1.

3.1 Hub selection strategies

Hub selection is a crucial part of the algorithm. As mentioned before, each node in the graph is assigned a hub value based on the number of cliques it belongs to. Based on this value, the selection strategy chooses the set of hubs H. Hubs represent "locally important" nodes, so the criterion for the hub property of a node, or "hubness", should depend only on the close neighborhood of the node. In our interpretation, this criterion depends on a simple statistical property of the first or second neighborhood of the given node, namely the average or median of the neighboring hub values.

At the beginning of our work on several well-known benchmark networks [26, 32] we quickly found that the 2-neighborhood strategies are often not robust enough to select the appropriate hubs: hubs were relatively rare, which resulted in small overlaps and an unacceptably large number of nodes without community membership. A general experience was that hubs should be "common enough", so that most of the nodes have one or more of them in their direct neighborhood.

Considering this, the 1-neighborhood median selection strategy provided the best community structure on these benchmark networks. This may not be the case, however, with other real-life networks. In order to extend our algorithm to handle different kinds of networks, we suggest another, more general hub selection rule. Still considering only the direct neighborhood of the nodes, we calculate the average hub value and multiply it by a parameter q > 0. If the hub value of the node is higher than this scaled average, we select it as a hub. This approach makes hub selection more flexible, allowing the algorithm to adapt to different requirements. A small value of q selects a larger number of hubs, resulting in larger communities with greater overlaps, while increasing q has the opposite effect. This also allows the algorithm to discover several layers of community structure in the network.

Finally, hub selection can be extended to weighted graphs in a natural way. As before, the hub value of a node is the number of cliques it belongs to. Then the values are multiplied by the strength⁶ of the node. After this, the process is the same as in the previous strategies.

In summary, we propose the following hub selection strategies:

– 1-neighborhood median: A node is selected as a hub if its hub value is greater than the median of the hub values in its one-neighborhood.
– 1-neighborhood mean with multiplier: A node is selected as a hub if its hub value is greater than the mean of the hub values in its one-neighborhood multiplied by a parameter q > 0.

– 1-neighborhood weighted mean with multiplier: The hub values are multiplied by the strength of the nodes. Apart from this, the strategy is the same as above.

As a recommendation, the median strategy should be tried first, and if it does not give satisfactory results the average strategy should be used, starting with q = 1 and decreasing or increasing its value in small steps depending on the requirements; a small illustrative sketch of the mean-based strategies follows below. In practice 0
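To make the two mean-based strategies concrete, here is a minimal sketch, assuming hub values h[v] computed as in the earlier sketch and assuming that "strength" means the sum of a node's incident edge weights (footnote 6 is not reproduced in this excerpt); including the node's own value in its one-neighborhood is likewise our assumption.

import networkx as nx

def node_strength(G, u):
    # Assumed definition of strength: the sum of incident edge weights
    # (weight defaults to 1 on unweighted graphs).
    return sum(d.get("weight", 1.0) for d in G[u].values())

def select_hubs_mean(G, h, q=1.0, weighted=False):
    # 1-neighborhood mean with multiplier q > 0; if weighted is True, hub
    # values are first scaled by node strength (the weighted variant above).
    score = (lambda u: h[u] * node_strength(G, u)) if weighted else (lambda u: h[u])
    hubs = set()
    for v in G:
        neigh = list(G[v]) + [v]    # the node itself is included (our assumption)
        mean = sum(score(u) for u in neigh) / len(neigh)
        if score(v) > q * mean:
            hubs.add(v)
    return hubs

Following the recommendation above, one would call select_hubs_mean(G, h, q=1.0) only after the median strategy has proven unsatisfactory, then decrease q for larger, more overlapping communities or increase it for the opposite effect.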