Special Issue: Bioinspired Optimization

Guest Editors: Jurij Šilc, Aleš Zamuda
Editors' Introduction to the Special Issue on "Bioinspired Optimization"

The possibly changing and uncertain environment attracts and retains the fittest members of biological populations, which accumulate experience and improve by adapting and competing among themselves. The experience they gather is exchanged and propagated from iteration to iteration according to the laws of nature. Relying on elementary activities of individuals, societies of these biological populations exhibit complex emergent behaviors. Assemblies of genes, insects, bird flocks, and many other fascinating natural phenomena have been a rich source of inspiration in the design of computer algorithms for decades. Optimization, specifically, is an area where these techniques are studied and exercised with particular practical success. As a result, bioinspired optimization algorithms (evolutionary algorithms, genetic algorithms, evolution strategies, evolutionary programming, genetic programming, ant colony optimization, particle swarm optimization, artificial immune systems, etc.) were designed to overcome the drawbacks of traditional algorithms in demanding application scenarios, including those where little, if any, information is available to assist problem solving. Emerging challenges continue to inspire new methods and adaptations of existing ones for specific tasks.

This special issue of Informatica – an International Journal of Computing and Informatics includes selected extended versions of student papers presented during:
– the Fifth International Conference on Bioinspired Optimization Methods and their Applications (BIOMA 2012), held in Bohinj, Slovenia, on 24–25 May 2012, and
– the Student Workshop on Bioinspired Optimization Methods and their Applications (BIOMA 2014), held in Ljubljana, Slovenia, on 13 September 2014.

After the selection and approval by the reviewing committee, this special issue presents seven valuable contributions. They were contributed by 21 co-authors coming from five countries (Germany, Romania, Slovenia, Turkey, and the United Kingdom).

The first paper is entitled Differential Evolution Control Parameters Study for Self-Adaptive Triangular Brushstrokes and is contributed by Aleš Zamuda and Uroš Mlakar. This work describes a lossy image representation in which a reference image is approximated by an evolved image composed of a variable number of triangular brushstrokes. Experimental results show the viability of the proposed encoding, and statistical tests confirm the improved performance of the self-adapted control parameters over fixed control parameters.
The second paper, Parallel Implementation of Desirability Function-Based Scalarization Approach for Multiobjective Optimization Problems, contributed by Okkes Tolga Altinoz, Eren Akca, Asim Egemen Yilmaz, Anton Duca, and Gabriela Ciuprina, presents the results obtained for a parallel CUDA implementation of the previously proposed desirability-based scalarization approach for the solution of multi-objective optimization problems. The CUDA implementation achieves roughly a 20-fold speedup over the sequential implementation, provided a suitably large number of solutions is to be evaluated.

The third paper is Using a Genetic Algorithm to Produce Slogans, by Polona Tomašič, Gregor Papa, and Martin Žnidaršič. This paper describes a new solution for the invention of slogans, based on the use of linguistic resources and evolutionary computing. The approach utilizes a tool that checks trial solutions for grammatical mistakes. A real-world case is also studied, in which slogans for the Sentinel company are evolved.

The fourth paper, entitled Comparing Evolutionary Operators and Search Spaces in the Construction of Facial Composites, by Joseph James Mist, Stuart James Gibson, and Christopher John Solomon, addresses three experiments concerning the use of interactive evolutionary algorithms in the creation of facial composites. The approach was validated by roughly 20 participants who used and assessed it, thereby generating face-spaces, applying different search algorithms, and providing a basis for comparing the algorithms.

The fifth paper in this special issue is Heuristics for Optimization of LED Spatial Light Distribution Model, by David Kaljun, Darja Rupnik Poklukar, and Janez Žerovnik. This work presents a genetic algorithm and several versions of local search heuristics for optimization of a model of an LED and secondary lens combination with symmetric spatial light distribution. The authors present a study of combinations of parameters and mechanisms on the lighting task addressed. The resulting hybrid approach outperformed the standard genetic algorithm and, on closer inspection, also outperformed local search.

The sixth paper is entitled Implicit and Explicit Averaging Strategies for Simulation-Based Optimization of a Real-World Production Planning Problem and is contributed by Juan Esteban Diaz and Julia Handl. This paper explores the impact of noise handling strategies on optimization performance in the context of a real-world production planning problem. Since the stochastic nature of the fitness values may affect optimization performance, the authors propose explicit and implicit averaging strategies to address this issue. They show that under increased levels of fitness variability, a hybrid strategy starts to outperform pure implicit and explicit averaging strategies in the evaluation of a real-world production planning problem.

Finally, the seventh paper, Data Mining-Assisted Parameter Tuning of a Search Algorithm, contributed by Jurij Šilc, Katerina Taškova, and Peter Korošec, deals with the problem of tuning the performance of a metaheuristic search algorithm with respect to its parameters. The principal challenge here is how to provide meaningful settings for an algorithm, obtained as a result of better insight into its behavior. They apply their approach to learning a model of the DASA algorithm and draw conclusions on the suggested parameter tuning based on the knowledge obtained, such as the number of ants and the evaporation factor.
We would like to thank the authors of the papers for their individual contributions, and all the dedicated anonymous reviewers for their criticism and the time they invested to help us make the final decisions. Without their valuable and strong support, we could not have made this special issue successful. As Guest Editors, we hope the readers will find the Special Issue interesting and informative, and that the papers will stimulate further progress in the field of "Bioinspired Optimization".

Jurij Šilc
Aleš Zamuda
Guest Editors

Differential Evolution Control Parameters Study for Self-Adaptive Triangular Brushstrokes

Aleš Zamuda and Uroš Mlakar
Faculty of Electrical Engineering and Computer Science, University of Maribor
Smetanova ulica 17, SI-2000 Maribor, Slovenia
E-mail: ales.zamuda@um.si, uros.mlakar@um.si

Keywords: differential evolution, evolutionary computer vision, evolutionary art, image-based modeling, self-adaptation, triangular brushstrokes

Received: December 1, 2014

This paper proposes a lossy image representation where a reference image is approximated by an evolved image, constituted of a variable number of triangular brushstrokes. The parameters of each triangle brush are evolved using differential evolution, which self-adapts the triangles to the reference image, and also self-adapts some of the control parameters of the optimization algorithm, including the number of triangles. Experimental results show the viability of the proposed encoding and optimization results on a few sample reference images. The results of the self-adapting control parameters for crossover and mutation in differential evolution are also compared to results with keeping these parameters constant, like in a basic differential evolution algorithm. Statistical tests are furthermore included to confirm the improved performance with the self-adaptation of the control parameters over the fixed control parameters.

Povzetek: V članku je predlagana izgubna predstavitev slike, kjer je referenčna slika aproksimirana z evoluirano sliko, ki je sestavljena iz spremenljivega števila potez trikotniškega čopiča. Parametre vsake poteze čopiča optimiramo s pomočjo diferencialne evolucije, ki samoprilagaja trikotniške poteze na referenčno sliko in prav tako samoprilagaja nekatere krmilne parametre samega optimizacijskega algoritma, vključno s številom trikotnikov. Rezultati poizkusov kažejo primernost predlagane metode in rezultati optimizacije so prikazani za več izbranih referenčnih slik. Rezultati samoprilagodljivih krmilnih parametrov za diferencialno evolucijo so primerjani tudi z rezultati, kjer so ti parametri nespremenljivi, kot je to primer pri osnovnem algoritmu diferencialne evolucije. Dodatno so podani še statistični testi, ki nadalje potrjujejo izboljšanje kakovosti pristopa ob samoprilagajanju krmilnih parametrov v primerjavi s pristopom z nespremenljivimi krmilnimi parametri.

1 Introduction

In this paper, an evolvable lossy image representation, in which an image is compared to its evolved, generated counterpart image, is proposed. The image is represented using a variable number of triangular brushstrokes [7], each consisting of triangle vertex coordinates and color parameters. These parameters for each triangle brush are evolved using differential evolution [13, 4], which self-adapts the control parameters, including the proposed self-adaptation of the number of triangles to be used.
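Such a chromosome can be pictured as a flat real-valued vector holding a variable number of triangles. The following is a minimal sketch of one possible encoding, assuming an RGBA color per triangle and a self-adapted count of active triangles; the gene layout, names, and sizes are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical flat encoding: each triangle uses 10 genes
# (3 vertices x 2 coordinates + R, G, B, A), all scaled to [0, 1].
GENES_PER_TRIANGLE = 10

def random_genome(max_triangles):
    """Fixed-size vector; only the first n_active triangles are drawn."""
    genes = [random.random() for _ in range(max_triangles * GENES_PER_TRIANGLE)]
    n_active = random.randint(1, max_triangles)   # self-adapted cut-off
    return {"genes": genes, "n_active": n_active}

def decode(genome, width, height):
    """Turn the active part of the genome into drawable triangles."""
    triangles = []
    for t in range(genome["n_active"]):
        g = genome["genes"][t * GENES_PER_TRIANGLE:(t + 1) * GENES_PER_TRIANGLE]
        vertices = [(g[0] * width, g[1] * height),
                    (g[2] * width, g[3] * height),
                    (g[4] * width, g[5] * height)]
        rgba = tuple(g[6:10])
        triangles.append((vertices, rgba))
    return triangles

if __name__ == "__main__":
    genome = random_genome(max_triangles=30)
    print(len(decode(genome, 640, 480)), "active triangles")
```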
Experimental results show the viability of the proposed encoding and evolution convergence for lossy compression of sample images. Since this paper is an extended version of [8], new additional results are included, where experiments with fixed control parameters for differential evolution are added to check and demonstrate the influence of the self-adaptation mechanism on the results. The results show clear superiority of the proposed approach with self-adaptive control parameters over the approach where the control parameters are fixed.

The approach presented is built upon and compared with [7], by addressing and also extending the original challenge. Namely, the challenge introduced in [7] uses triangles in trying to build an approximate model of an image [7]. The triangle is an efficient brush shape for this challenge, since it covers more pixels than a single point, and also allows overlaying and blending of colors over several regional surface pixels, which lines can not. Also, an arbitrary triangle shape is less constrained than any further point-approximated shape, and other shapes can be built by combining several triangles. Instead of the genetic programming in [7], in this paper differential evolution is used with a fixed-size tree-like chromosome vector, which is cut off self-adaptively to form codon and anti-codon parts of the chromosome. Also, our approach uses a modified challenge, where we can reconstruct the model for the reference image solely using the evolved model, without using the reference image, whereas [7] needs the reference image when drawing pixels to the canvas in deciding which pixels match the reference image for accepting them into the evolved canvas. Also, in this paper the triangle brushstroke encoding differs and is especially designed for an efficient DE encoding.

In the following section, related work is presented, then the proposed approach is defined. In Section 4, the experimental results are reported. Section 5 concludes the paper with propositions for future work.

2 Related Work

In this section, related work on evolutionary computer vision, evolutionary art, image representation, and evolutionary optimization using differential evolution is presented. These topics are used in the proposed method, defined in the next section.

2.1 Image-Based Modeling, Evolutionary Computer Vision, and Evolutionary Art

Image-based approaches to modeling include processing of images, e.g., two-dimensional, from which after segmentation certain features are extracted and used to represent a geometrical model [10]. For art drawings modeling, automatic evolutionary rendering has been applied [2, 12]. Heijer and Eiben evolved pop art two-dimensional scalable vector graphics (SVG) images [6] and defined genetic operators on SVG to evolve representational images using SVG, and also to evolve new images, different from source images, leading to new and surprising images for pop art. Bergen and Ross [3] interactively evolved vector graphics images using a genetic algorithm, where solid-coloured opaque or translucent geometric objects or mosaic tile effects with bitmap textures were utilized; they considered the art aspect of the evolved image and multiple possible outcomes due to evolution stochastics, and concluded to investigate vector animation of the vectorized image. In [14] animated artwork is evolved using an evolutionary algorithm. Then, Izadi et al. [7] evolved the triangular brushstrokes challenge using genetic programming for two-dimensional images, using unguided and guided searches on a three or four branch genetic program, where roughly 5% similarity with reference images was obtained on average per pixel. In this paper, we build upon and compare our new approach with [7], by addressing and also extending this challenge. After extending the challenge, we optimize it using DE, which is described in the next section.

2.2 Evolutionary Optimization Using Differential Evolution

Differential evolution (DE) [13] is a floating-point encoding evolutionary algorithm for continuous global optimization. It has been modified and extended several times, with various versions being proposed [5]. DE has also been applied to remote sensing image subpixel mapping [18], image thresholding [11], and for image-based modeling using evolutionary computer vision to reconstruct a spatial procedural tree model from a limited set of two-dimensional images [16, 15]. DE mechanisms were also compared to other algorithms in several studies [17]. Neri and Tirronen in their survey on DE [9] concluded that, compared to the other algorithms, a DE extension called jDE [4] is superior to the compared algorithms in terms of robustness and versatility over a diverse benchmark set used in the survey. Therefore, we choose to apply jDE in this approach.

The original DE has a main evolutionary loop where a population of vectors is computed within each generation. For one generation, counted as g, each vector x_i, for all i in {1, ..., NP} in the current population of size NP, undergoes the DE evolutionary operators, namely mutation, crossover, and selection. Using these operators, a trial vector (offspring) is produced and the vector with the best fitness value is selected for the next generation. For each corresponding population vector, mutation creates a mutant vector v_{i,g+1} ('rand/1' [13]):

v_{i,g+1} = x_{r_1,g} + F \, (x_{r_2,g} - x_{r_3,g}),   (1)

where the indexes r_1, r_2, and r_3 are random and mutually different integers generated from the set {1, ..., NP}, which are also different from i. F is an amplification factor of the difference vector, mostly within the interval [0, 1]. The term x_{r_2,g} - x_{r_3,g} denotes a difference vector, which is named the amplified difference vector after multiplication with F. The mutant vector v_{i,g+1} is then used for recombination, where with the target vector x_{i,g} a trial vector u_{i,j,g+1} is created, e.g., using binary crossover:

u_{i,j,g+1} = \begin{cases} v_{i,j,g+1}, & \text{if } \operatorname{rand}(0,1) \le CR \text{ or } j = j_{\mathrm{rand}}, \\ x_{i,j,g}, & \text{otherwise}, \end{cases}   (2)

where CR denotes the crossover rate, j in {1, ..., D} denotes the j-th search parameter of the D-dimensional search space, rand(0, 1) in [0, 1] is a uniformly distributed random number, and j_rand is a uniformly randomly chosen index of the search parameter, which is always exchanged to prevent cloning of target vectors.

The original DE [13] keeps the control parameters fixed, such as F = 0.5 and CR = 0.9, throughout optimization. However, the jDE algorithm, which is a modification of the original DE, self-adapts the F and CR control parameters; to generate the vectors v_{i,g+1} and u_{i,g+1}, the corresponding values F_i and CR_i, for all i in {1, ..., NP}, are updated prior to their use in the mutation and crossover mechanisms:

F_{i,g+1} = \begin{cases} F_l + \operatorname{rand}_1 \times F_u, & \text{if } \operatorname{rand}_2 < \tau_1, \\ F_{i,g}, & \text{otherwise}, \end{cases}   (3)

CR_{i,g+1} = \begin{cases} \operatorname{rand}_3, & \text{if } \operatorname{rand}_4 < \tau_2, \\ CR_{i,g}, & \text{otherwise}, \end{cases}   (4)

where rand_1, ..., rand_4 in [0, 1] are uniform random floating-point numbers and \tau_1 = \tau_2 = 0.1. Finally, the selection operator evaluates and compares the trial vector to the current vector and propagates the fittest:

x_{i,g+1} = \begin{cases} u_{i,g+1}, & \text{if } f(u_{i,g+1}) \le f(x_{i,g}), \\ x_{i,g}, & \text{otherwise}. \end{cases}
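To make the operators concrete, the sketch below implements 'rand/1/bin' with the parameter self-adaptation of Eqs. (3)-(4) for a generic minimization problem. It is a minimal illustration only: the sphere fitness function, bound handling, and the customary jDE values F_l = 0.1 and F_u = 0.9 are assumptions, not the image-approximation setup used in this paper.

```python
import random

def jde_minimize(f, dim, bounds, np_=50, generations=200,
                 f_l=0.1, f_u=0.9, tau1=0.1, tau2=0.1):
    """Minimal jDE ('rand/1/bin' with self-adaptive F and CR)."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    F = [0.5] * np_          # per-individual amplification factors
    CR = [0.9] * np_         # per-individual crossover rates
    fit = [f(x) for x in pop]

    for _ in range(generations):
        for i in range(np_):
            # Self-adapt control parameters before they are used (Eqs. 3-4).
            Fi = f_l + random.random() * f_u if random.random() < tau1 else F[i]
            CRi = random.random() if random.random() < tau2 else CR[i]

            # Mutation 'rand/1' (Eq. 1) with three mutually different indices.
            r1, r2, r3 = random.sample([k for k in range(np_) if k != i], 3)
            v = [pop[r1][j] + Fi * (pop[r2][j] - pop[r3][j]) for j in range(dim)]

            # Binary crossover (Eq. 2); j_rand guarantees one mutant gene.
            j_rand = random.randrange(dim)
            u = [v[j] if random.random() <= CRi or j == j_rand else pop[i][j]
                 for j in range(dim)]
            u = [min(max(x, lo), hi) for x in u]   # keep within bounds

            # Selection: keep the trial (and its parameters) if not worse.
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i], F[i], CR[i] = u, fu, Fi, CRi

    best = min(range(np_), key=lambda k: fit[k])
    return pop[best], fit[best]

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best, value = jde_minimize(sphere, dim=10, bounds=(-5.0, 5.0))
    print(round(value, 6))
```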
The two-sided desirability function d3(y) is defined piecewise over its tolerance interval, with d3(y) = 0 for y > hmax. The desirability level d(y) = 1 is the state for fully desirable, and d(y) = 0 is for a not-desired case. In this respect, the d1 one-sided desirability function is useful for a minimization problem. The curve parameters are r, t and s. They are used in order to plot an arc instead of a solid line, when desired. The curves plotted in Figure 1 demonstrate the effects of the curve parameters and the graphs of the desirability functions.

2.2 Method of Desirability Function-Based Scalarization

The main idea beneath the desirability functions is as follows:
– The desirability function is a mapping from the domain of real numbers to the range set [0, 1].
– The domain of each desirability function is one of the objective functions; it maps the values of the relevant objective function to the interval [0, 1].
– The overall desirability value is defined as the geometric mean of all desirability functions; this value is to be maximized.

The relevant desirability functions d1(f1) and d2(f2) can be defined as in Figure 2. The desirability functions are not necessarily defined to be linear; certainly, non-linear definitions can also be used. In [6], a multiobjective optimization approach was proposed by altering the shapes of the desirability functions in a systematical manner. Particularly by:
– fixing the parameters f1max_tol and f2max_tol seen in Figure 2 at infinity, and
– varying the parameters f1min_tol and f2min_tol systematically,
it is possible to find the Pareto front regardless of its convexity or concavity. This claim can be illustrated for the bi-objective case as follows: as seen in Figure 3, the parameters f1min_tol and f2min_tol determine the sector which is traced throughout the solution. The obtained solution corresponds to a point which maximizes the geometric mean of the two desirability values. As seen in Figure 4, even in the case of a concave Pareto front, the solution can be found without loss of generality. In other words, unlike the weighted-sum approach, the method proposed in [6] does not suffer from concave Pareto fronts.

Figure 4: The solution via the desirability-function based approach for a concave Pareto front.

In [6], the applicability and the efficiency of the proposed scalarization approach was demonstrated via some multi-objective benchmark functions. Each single-objective problem (i.e., the scalarization scheme) was solved with Particle Swarm Optimization. Despite no explicit demonstration or proof, it was claimed that:
– There were no limitations about the usage of Particle Swarm Optimization; i.e., any other heuristic algorithm could be incorporated and implemented.
– The proposed method can be easily parallelized.
In this study, we demonstrate the validity of these claims by performing a parallel implementation on GPUs via the CUDA framework. The next section is devoted to the implementation details.

3 Parallel Multiobjective Optimization with GPU

This section is dedicated to explaining the steps and the idea of parallelizing the desirability function-based scalarization approach with the aid of the CUDA library.

3.1 Fundamentals of CUDA Parallel Implementation

Researchers familiar with programming languages have long desired a programming language or framework letting them write parallel code easily. For this purpose, in 2007 NVidia [8] introduced a software framework called CUDA.
By means of this, a sequential function code can be converted to a parallel kernel by using the libraries and some pre.x expressions. By this way, the programmers do not need to learn a new programming language. They are able to use their previous know-how related to C/C++, and enhance this knowledge with some basic expressions intro­duced by CUDA. However, without the knowledge about the CUDA software and the parallel architecture hardware, it is not possible to write ef.cient codes. CUDA programming begins with the division of the ar­chitectures. It de.nes the CPU as host and GPU as de­vice. The parallel programming actually is the assignment of duties to parallel structure and collection of the results by CPU. In summary, the codes are written for CPU on C/C++ environment, and these codes include some paral­lel structures. These codes are executed by the host. Host commands device for code executed. When the code is exe­cuted by the device, the host waits until the job is .nished, then a new parallel duty can be assigned, or results from the .nished job can be collected by the host. Thus, the de­vice becomes a parallel computation unit. Hence, parallel computing relies on the data movement between host and device. Eventhough both host and device are very fast com­putation units, the data bus is slower. Therefore, in order to write an ef.cient program, the programmer must keep his/her code for minimum data transfer between the host and the device. The GPU has stream multiprocessors (SMs). Each SM has 8 stream processors (SPs), also known as cores, and each core has a number of threads. In tesla architecture there are 240 SPs, and on each SP has 128 threads, which is the kernel execution unit. The bodies of threads are called groups. The groups are performed collaterally with respect to the core size. If the GPU architecture has two cores, then two blocks of threads are executed simultaneously. If it has four cores, then four blocks are executed collaterally. Host and device communicate via data movement. The host moves data to the memory of the GPU board. This memory is called global memory which is accessed from all threads and the host. The host has also access to con­stant and texture memories. However, it cannot access the shared memory, which is a divided structure assigned for every block. The threads within the block can access their own shared memory. The communication of the shared memory is faster than the global memory. Hence, a par­allel code must contain data transfers to shared memory more often, instead of global memory. In this study, random numbers are needed to execute the algorithm. Hence, instead of the rand() function of the C/C++ environment, CURAND library of the CUDA pack has been employed. In addition, the CUDA Event is pre­ferred for accurate measurement of the execution time. In the next section, the parallel implementation of desirability function-based scalarization was explained in detailed. Informatica 39 (2015) 115–123 119 3.2 Parallel Implementation of Desirability Function-Based Scalarization The main idea of our parallel implementation throughout this study is illustrated in Figure 5. Each scalarization scheme is handled in a separate thread; after the relevant solutions are obtained, they are gathered in a centralized manner to constitute the Pareto front from which the human decision maker picks a solu­tion according to his/her needs. 
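The per-thread scalarization idea can be imitated on a CPU for illustration. The sketch below is a Python analogue of the scheme in Figure 5, assuming the simple bi-objective test problem f1(x) = x^2, f2(x) = (x - 2)^2 used later in Section 3.2. Each worker process maximizes the geometric mean of two one-sided desirabilities for one tolerance setting, with a crude random search standing in for the per-thread PSO of the paper; only the f1 tolerance is swept and a large finite max_tol stands in for the "fixed at infinity" tolerance, so this is a simplified sketch rather than the authors' CUDA implementation.

```python
import math
import random
from multiprocessing import Pool

def f1(x): return x * x
def f2(x): return (x - 2.0) ** 2

def desirability(value, min_tol, max_tol):
    """Linear one-sided desirability: 1 below min_tol, 0 above max_tol."""
    if value <= min_tol:
        return 1.0
    if value >= max_tol:
        return 0.0
    return (max_tol - value) / (max_tol - min_tol)

def solve_one_scalarization(min_tol_f1, max_tol=100.0, evaluations=2000):
    """One scalarization scheme = one worker (stand-in for one GPU thread)."""
    best_x, best_score = None, -1.0
    for _ in range(evaluations):
        x = random.uniform(-2.0, 4.0)
        d1 = desirability(f1(x), min_tol_f1, max_tol)
        d2 = desirability(f2(x), 0.0, max_tol)
        score = math.sqrt(d1 * d2)          # geometric mean of desirabilities
        if score > best_score:
            best_x, best_score = x, score
    return best_x, f1(best_x), f2(best_x)

if __name__ == "__main__":
    sweep = [i * 0.1 for i in range(1, 41)]      # swept f1 tolerance levels
    with Pool() as pool:                         # one scalarization per worker
        front = pool.map(solve_one_scalarization, sweep)
    for x, v1, v2 in front[:5]:
        print(f"x={x:.3f}  f1={v1:.3f}  f2={v2:.3f}")
```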
This approach ensures that the number of solutions found that can be found in parallel is limited by the capability of the GPU card used. As stated before, we implemented the Particle Swarm Optimization Algorithm for veri.cation of the aforemen­tioned claims. The parallel CUDA implementation was compared to the sequential implementation on various GPUs and CPUs. Figure 5: The parallel CUDA implementation of the desirability-function based approach. It was seen that both implementations (sequential and parallel CUDA) were able to .nd the same solutions but in different elapsed times. As seen in Figure 6, if the num­ber of Pareto front solutions increase, the advantage of the parallel CUDA increases dramatically. Figure 6 presents parallel implementation of scalariza­tion approach for the weighted sum method. The simple convex problem is selected and de.ned in (4) and (5) as a test bed for present the performance of the parallelization method for scalarization. 2 f1(x)= x (4) f2(x)=(x - 2)2 (5) According to Figure 6, the performance of high and mid-level GPU cards are approximately 10-times faster than se­quential implementation. The results obtained in Figure 6 yields the following conclusions: – For a small number of Pareto solutions, CPU performs better against GPU – After 64 solutions, parallel implementation presents better results than sequential code – An old-fashion mobile GPU performs almost same as a relatively high level CPU. – As the number of solution increases, the professional high level GPU devices perform more stable than gen­eral purpose GPUs. 4 Implementation, Results, and Discussion The parallel desirability function-based scalarization ap­proach was applied to solve seven benchmark problems. These problems are selected based on the complexity against execution time on computation unit. Since the av­erage number of execution time is considered in the study, problems from simple calculation to problems with more branch and complex functions. In this section the bench­mark problems and the results with respect to execution time is presented. 4.1 Benchmark Problems In this study, ten benchmark problems [9] with different complexity and Pareto shape are selected to present the per­formance of the method. Table 1 gives the mathematical formulations of the problems. The performance compar­ison is performed not only on the accuracy of the results, but more importantly on the execution time. As given in Ta­ble 1 the complexity of the benchmark problems are given from simple to more complex problems. The reason be­hind is that as the complexity of the function is increased, the single processors have to accomplish much more cal­culations, and since the single processors on a GPU has lower capacity than CPU, it will be a good comparison for not only the number of solutions in solution space but also the problem complexity. Table 1 presents as three columns. The .rst column gives the known-names of benchmark problems. The reader can be access amount of information about the function by searching by selecting keyword as function name. The sec­ond column is the mathematical formulation of the func­tion. As the order of row increases the complexity of the O. T. Altinoz et. al. function also increases. The last column is for the de.nes of the range of the decision variables. 
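As a concrete instance of the entries in Table 1, the snippet below evaluates the first benchmark, ZDT1, in plain Python. The standard ZDT1 definition is assumed here (the table's rendering of the formula is partly garbled), and the timing loop only illustrates the kind of per-evaluation cost that the CPU/GPU comparison in the next subsection measures.

```python
import math
import random
import time

def zdt1(x):
    """Standard ZDT1: two objectives over x_i in [0, 1]."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

if __name__ == "__main__":
    solutions = [[random.random() for _ in range(30)] for _ in range(10_000)]
    start = time.perf_counter()
    results = [zdt1(x) for x in solutions]
    print(f"{len(results)} evaluations in {time.perf_counter() - start:.3f} s")
```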
4.2 Implementation Results Table 2 presents the execution time comparison of CPU (Xeon E2620) and GPU (Tesla K20) for various numbers of levels from 8×8 to 100×100, number of single objective evaluations are 64 and 104 respectively. For low complex problems, until 225 numbers of levels (400 levels need for hard problems), the CPU outperforms GPU implementa­tion with respect to execution time. It is reasonable since only small portion of cores on GPU can be used. But lower number of relatively very fast cores are .nished the exe­cutions earlier than GPU. From 400 to 6, 400 levels, GPU computation time of parallel codes exceeds CPU time. At 6, 400 levels, the difference between CPU and GPU is at the peak grade. After that level, the advantage of GPU re­duces. In other words, the GPU implementation acts more sequentially, since there are not any empty resources to ex­ecute parallel implementation. Among all of the problems, UF1 is the hardest for GPU implementation since the com­putation time is the longest for this problem. The main reasons are that: a) checking mechanism for even and odd parts that adds branch to the code, b) square of the trigono­metric function. for GPU implementation branch are the time consuming programming codes such that in an if-else, both parts are evaluated by the architecture, that reduces the resources. The average execution time of CPU is 8.25-times slower than average GPU execution time. The following results are obtained for comparison the execution time: – For a small number of solutions, CPU outperforms GPU – The increase on CPU execution time is proportional to the number of solutions. Hence, the execution time on CPU increases. – The GPU implementations are much bene.cial for overall comparison. – For a very high number of solutions, the improve­ments obtained in GPU slowly decreases since GPU contains limited number of stream (multi)processors. At some point the improvements are not lower than . 10-times on average. 5 Conclusion In this study, desirability function-based scalarization ap­proach is evaluated in a parallel fashion. Since the perfor­mance of sequential and parallel implementations are sim­ilar to each other, the execution time of these codes are compared based on different number of solutions. The re­sults show that, for small number of solutions, parallel im­plementation is slower when compared to sequential im­plementation. But as the number of solution increases, the Table 1: Multiobjective benchmark problems Function name Mathematical description Decision variable range ZDT1 f1(x) = x1 f2(x) = g(1 - f1 g ) g = 1 + 9 n-1 n i=2 xi 0 . xi . 1 ZDT2 f1(x) = x1 f2(x) = g(1 - ( f1 g )2) g = 1 + 9 n-1 n i=2 xi 0 . xi . 1 ZDT3 UF1 UF2 UF3 UF4 f1(x) = x1 f2(x) = g(1 - f1 g - x1 g sin(10.x1)) g = 1 + 9 n-1 n i=2 xi f1(x) = x1 + 2 |J1| i.J1 (xi - sin(6.x1 + i. n ))2 f2(x) = 1 - . x1 + 2 |J2| i.J2 (xi - sin(6.x1 + i. n ))2 J1 = {i| i is odd and 2 . i . n}, J2 = {i| i is even and 2 . i . n} f1(x) = x1 + 2 |J1| i.J1 y2 i f2(x) = 1 - . x1 + 2 |J2| i.J1 y2 i yi = xi - (0.3x2 1 cos(24.x1 + 4i. n ) + 0.6x1) cos(6.x1 + i. n ), i . J1 xi - (0.3x2 1 cos(24.x1 + 4i. n ) + 0.6x1) sin(6.x1 + i. n ), i . J2 f1(x) = x1 + 2 |J-1| ((4 i.J1 y2 i ) - (2 i.J1 cos( 20yi.. i )) + 2) f2(x) = 1 - . x1 + 2 |J-2| ((4 i.J2 y2 i ) - (2 i.J2 cos(20yi.. i )) + 2) yi = xi - x 0.5(1+ 3(i-2) n-2 ) 1 f1(x) = x1 + 2 |J1| i.J1 h(yi) f2(x) = 1 - x2 1 + 2 |J2| i.J2 h(yi) yi = xi - sin(6.x1 + i. n ), h(t) = t 1+e2t 0 . xi . 1 0 . xi . 1 -1 . xi-1 . 1 0 . xi . 1 -1 . xi-1 . 1 0 . xi . 
1 0 . xi . 1 -2 . xi-1 . 2 Table 2: Execution time comparison [seconds] of benchmark functions, where improvement, impr, is the scale factor shows how many times the GPU is faster than CPU, so that if impr < 1 means CPU is faster than GPU # of levels for Devices 2 desirability & ZDT1 ZDT2 ZDT3 UF1 UF2 UF3 UF4 Average functions impr CPU 0.133 0.109 0.19 0.11 0.109 0.109 0.094 0.1220 8 × 8 GPU 0.433 0.4504 0.483 0.4917 0.4861 0.4906 0.408 0.4633 impr 0.3072 0.2420 0.3934 0.2237 0.2242 0.2222 0.2304 0.2633 10 × 10 15 × 15 CPU GPU impr CPU GPU impr 0.221 0.439 0.5034 0.446 0.4424 1.0081 0.153 0.451 0.3392 0.333 0.4576 0.7277 0.291 0.4848 0.6002 0.576 0.4904 1.1746 0.222 0.4934 0.4499 0.42 0.499 0.8417 0.199 0.49 0.4061 0.418 0.4944 0.8455 0.197 0.4914 0.4009 0.413 0.4967 0.8315 0.168 0.405 0.4148 0.372 0.409 0.9095 0.2073 0.4649 0.4450 0.4254 0.4699 0.9055 20 × 20 25 × 25 CPU GPU impr CPU GPU impr 0.8 0.4281 1.8687 1.21 0.4393 2.7544 0.564 0.442 1.2760 0.893 0.4573 1.9528 0.997 0.4781 2.0853 1.521 0.491 3.0978 0.717 0.5 1.4340 1.12 0.5 2.2400 0.706 0.4977 1.4185 1.444 0.4954 2.9148 0.728 0.5 1.4560 1.114 0.499 2.2325 0.811 0.4146 1.9561 0.987 0.408 2.4191 0.7604 0.4658 1.6421 1.1841 0.4700 2.5159 30 × 30 40 × 40 CPU GPU impr CPU GPU impr 1.753 0.4424 3.9625 3.162 0.4451 7.1040 1.266 0.4566 2.7727 2.186 0.453 4.8256 2.279 0.4871 4.6787 4.094 0.4893 8.3671 1.582 0.501 3.1577 2.794 0.4999 5.5891 1.589 0.4973 3.1953 2.854 0.4983 5.7275 1.59 0.4979 3.1934 2.757 0.4991 5.5239 1.428 0.4132 3.4560 2.508 0.4151 6.0419 1.6410 0.4708 3.4880 2.9079 0.4714 6.1684 50 × 50 60 × 60 CPU GPU impr CPU GPU impr 4.879 0.4488 10.8712 6.946 0.4709 14.7505 3.431 0.4639 7.3960 4.798 0.4864 9.8643 6.138 0.4967 12.3576 9.492 0.518 18.3243 4.412 0.5119 8.6189 6.236 0.5287 11.7950 4.382 0.5 8.7640 6.411 0.518 12.3764 4.298 0.501 8.5788 6.391 0.519 12.3141 3.889 0.4321 9.0002 6.233 0.4587 13.5884 4.4899 0.4792 9.3695 6.6439 0.5000 13.2876 70 × 70 80 × 80 CPU GPU impr CPU GPU impr 9.52 0.4995 19.0591 12.488 0.6179 20.2104 6.764 0.5144 13.1493 8.87 0.6321 14.0326 11.959 0.5417 22.0768 15.892 0.6488 24.4945 8.566 0.5489 15.6058 11.11 0.6388 17.3920 8.548 0.539 15.8590 11.366 0.635 17.8992 8.562 0.5435 15.7534 11.538 0.6362 18.1358 7.592 0.4923 15.4215 13.307 0.607 21.9226 8.7873 0.5256 16.7036 12.0816 0.6308 19.1553 90 × 90 100 × 100 CPU GPU impr CPU GPU impr 15.776 0.8299 19.0095 19.2579 1.1157 17.2608 11.246 0.854 13.1686 13.863 1.149 12.0653 20.027 0.8749 22.8906 24.504 1.1812 20.7450 14.039 0.8424 16.6655 17.252 1.12 15.4036 14.138 0.84 16.8310 19.219 1.222 15.7275 14.053 0.8432 16.6663 17.74 1.125 15.7689 14.583 0.8335 17.4961 15.49 1.132 13.6837 14.8374 0.8454 17.5325 18.1894 1.1493 15.8078 Figure 6: Comparison of the sequential Java and the parallel CUDA implementations. GPU is almost 20-times faster than sequential implementa­tion. Acknowledgement This study was made possible by grants from the Turk­ish Ministry of Science, Industry and Technology (In­dustrial Thesis – San-Tez Programme; with Grant Nr. 01568.STZ.2012-2) and the Scienti.c and Technological Research Council of Turkey -TÜBITAK (with Grant Nr. 112E168). The authors would like to express their grati­tude to these institutions for their support. References [1] R. Marler, S. Arora (2009) Transformation methods for multiobjective optimization, Engineering Opti­mization, vol. 37, no. 1, pp. 551–569. [2] N. Srinivas, K. Deb (1995) Multi-Objective function optimization using non-dominated sorting genetic al­gorithms, Evolutionary Computation, vol. 2, no. 3, pp. 
221-–248. [3] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Com­putation, vol. 6, no. 2, pp. 182-–197. [4] J. D. Schaffer (1985) Multiple objective optimization with vector evaluated genetic algorithms, Proceed­ings of the International Conference on Genetic Al­gorithm and their Applications, pp. 93–100. [5] J. Branke, K. Deb (2008) Integrating user prefer­ences into evolutionary multiobjective optimization, Knowledge Incorporation in Evolutionary Comput­ing, Springer, pp. 461–478. [6] O. T. Altinoz, A. E. Yilmaz, G. Ciuprina (2013) A Multiobjective Optimization Approach via Systemat­ical Modi.cation of the Desirability Function Shapes, Proceedings of the 8th International Symposium on Advanced Topics in Electrical Engineering. [7] G. Derringer, R. Suich (1980) Simultaneous op­timization of several response variables,Journal of Quality Technology, vol. 12, no. 1, pp. 214–219. [8] NVIDIA Corporation (2012) CUDA dynamic paral­lelism programming, NVIDIA. [9] E. Ziztler, K. Deb, L. Thiele (2000) Comparison of multiobjective evolutionary algorithms: Empirical re­sults, Evolutionary Computation Journal, vol. 8, no. 2, pp. 125–148. Using a Genetic Algorithm to Produce Slogans Polona TomašicˇXLAB d. o. o., Pot za Brdom 100, SI-1000 Ljubljana, Slovenia and Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: polona.tomasic@xlab.si Gregor Papa Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia and Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: gregor.papa@ijs.si Martin ŽnidaršicˇDepartment of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia and Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: martin.znidarsic@ijs.si Keywords: genetic algorithm, slogan generation, computational creativity, linguistic resources Received: December 1, 2014 Creative tasks, such as creation of slogans for companies, products or similar entities, can be viewed from the combinatorial perspective – as a search through the space of possible combinations. To solve such a combinatorial optimization problem, we can use evolutionary algorithms. In this paper, we present our solution for generation of slogans based on a genetic algorithm and linguistic resources. We also compare it to the unguided slogan generator. Povzetek: Na kreativne naloge, kot je snovanje sloganov za podjetja in produkte, lahko gledamo s kombi­natoriˇcnega vidika – kot na iskanje v prostoru možnih kombinacij. Za reševanje tovrstnih kombinatoriˇcnih optimizacijskih problemov lahko uporabljamo evolucijske algoritme. V tem ˇ clanku predstavljamo rešitev za generiranje sloganov na podlagi genetskega algoritma in jezikovnih virov. Predstavljeno rešitev primer­ jamo tudi z generatorjem sloganov brez vodenja. 1 Introduction Automated generation of slogans is a problem from the .eld of Computational Creativity [5]. There are very few studies dedicated to slogan generation. In fact, the only one we came across is the BRAINSUP framework [19], which is based on beam search through a carefully de.ned space of possible slogans. This space gets reduced by applying user speci.ed constraints on keywords, domain, emotions, and other properties of slogans. High quality slogans are often a result of group brain­storming. 
Several individuals present their ideas and the proposed slogans are then mixed into new slogans, and some new ideas emerge. This brainstorming process is sim­ilar to the evolution, from which we got the idea of using evolutionary algorithms for slogan generation. The initial slogans from brainstorming represent an initial population, mixing the best proposed slogans represents recombina­tion, and new included ideas represent mutations. Evolu­tionary algorithms have already been applied to different natural language processing problems [2]. In this paper, we present our slogan generation proce­dure which is not in.uenced by the user in any way, apart from being provided with a short textual description of the target entity. The method is based on a genetic algo­rithm (GA) [3]. Genetic algorithms are the most traditional evolutionary algorithms and they ensure a good coverage of the search space. They have been successfully used for generating recipes [17], poetry [13] and trivial dialog phrases [16]. However, genetic algorithms have not been previously used for slogan generation. Our method follows the BRAINSUP framework in the initial population genera­tion phase, and it uses a collection of heuristic slogan func­tions in the evaluation phase. We tested our slogan generator and compared it to the random slogan generator. The statistical results are in fa­vor of our method. However, even though the generated slogans can present a good starting point for brainstorm­ing, their quality is not yet at the desired level. The rest of the paper is organized as follows. In Section 2 we present the linguistic and semantic resources used in our solution. Section 3 provides a detailed description of the entire slogan generation process. It includes descrip­tion of the evaluation functions and it clari.es the differ­ence between the slogan generator and the unguided slogan generator. The performed experiments and the discussion of the results are presented in Section 4. The conclusions are drawn in Section 5. 2 Resources Linguistic and semantic resources are a prerequisite for any kind of text generation. We use them at several steps of our method – for generation of initial population, mutation, and evaluation. Some are available as extended libraries for programming languages, others are available for download from the Internet, and some databases were created by our­selves. The origin of the data and the process is brie.y described in the following paragraphs. 1. Database of famous slogans: it serves as a basis for the initial population generation and for comparison with generated slogans. It contains 5,249 famous slo­gans obtained from the Internet. 2. Database of frequent grammatical relations be­tween words in sentences: for its acquisition we used the Stanford Dependencies Parser [14]. Stanford de­pendencies are triplets containing two words and the name of the relation between them. The parser also provides part-of-speech (POS) tags and phrase struc­ture trees. To get representatives of frequent gram­matical relations between words, we parsed 52,829 random Wikipedia pages, sentence by sentence, and obtained 4,861,717 different dependencies. 3. Database of slogan skeletons: slogan skeletons were obtained by parsing famous slogans with the Stanford Dependencies Parser. A slogan skeleton contains in­formation about each position in the sentence – its POS tag and all its dependence relations with other words in the sentence. It does not contain any content words, only stop words. 
An example of a skeleton is shown in Figure 1. Figure 1: Example of a skeleton. 3 Slogan Generation An input of our slogan generator is a short textual descrip­tion about the target entity. It is the only required input from a user. It is used to obtain the name of the target en­tity and a set of keywords. An output is a list of generated slogans. The whole procedure is shown in Algorithm 1. 3.1 Extraction of the Keywords and the Main Entity The most frequent non-negative words from the input text are selected as keywords. Negative words are detected us­ing the Nodebox English Linguistics library [18]. The main entity is usually the name of the company and is obtained by selecting the most frequent entity in the whole text using the nltk library [4]. 3.2 Generation of the Initial Population of Slogans The procedure of generating the initial population of slo­gans is based on the BRAINSUP framework [19], with some modi.cations. It follows the steps in Algorithm 2. Skele­tons are obtained from the database of slogan skeletons. Fillers are the words from the database of all grammatical relations between words in sentences that satisfy all pre­de.ned dependencies and POS tags. If there are any key­words in a set of all possible .ller words, the algorithm assigns them higher priority for the selection phase. The main difference between our algorithm and the BRAIN­SUP method is in the selection of .ller words. We don’t consider any user speci.ed constraints, while the BRAIN­SUP framework uses beam search in the space of all possi­ble lexicalizations of a skeleton to promote the words with the highest likelihood of satisfying the user speci.cations. Thus using our method we can produce many different slo­gans from the same slogan skeleton, whereas BRAINSUP produces only one for given user speci.cations. 3.3 Evaluation of Slogans An aggregated evaluation function is used to evaluate the slogans. It is composed of 9 different sub-functions, each assessing a particular feature of a slogan, with scores in the interval [0,1]. Parameter of the aggregation function is a list of 9 weights that sum to 1. They de.ne the proportions of sub-functions in the overall score. In this subsection, we give a short description for every one of them. 3.3.1 Bigram Function In order to work with 2-grams, we obtained the dataset of 1,000,000 most frequent 2-grams and 5,000 most fre­quent words in Corpus of Contemporary American English (COCA) [6]. We assume that slogans containing many fre­quent 2-grams, are more likely to be semantically coherent. Algorithm 1: SloganGenerator 1 Input: A textual description of a company or a product T , Size of the population SP, Maximum number of iterations MaxIterations, Crossover probability pcrossover, Mutation probability pmutation, Set of evaluation weights W . 2 Output: A set of generated slogans S. 1: Keywords, Entity . GetKeywordsAndEntity(T ) 2: P . CreateInitialPopulation(SP, Keywords, Entity) 3: Evaluate(P ) 4: Iteration . 0 5: while Iteration < MaxIterations do 6: P arents . ChooseParentsForReproduction(P ) 7: Children . Crossover(P arents, pcrossover) 8: Children . Mutation(Children, pmutation) 9: NewGeneration . DeleteSimilarSlogans(P, Children) 10: while Size(NewGeneration) 0 do 3: SloganSkeleton . SelectRandomSloganSkeleton() 4: while not AllEmptySlotsFilled(SloganSkeleton) do 5: EmptySlot . SelectEmptySlotInSkeleton(SloganSkeleton) 6: F illers . FindPossibleFillerWords(EmptySlot) 7: F illerW ord . 
SelectRandomFillerWord(F illers) 8: FillEmptySlot(SloganSkeleton, F illerW ord) 9: end while 10: AddFilledSkeleton(S, SloganSkeleton) 11: SP . SP - 1 12: end while 3.3.2 Length Function The length function is very strict, it assigns score 1 to slo­gans with less than eight words, and score 0 to longer ones. The threshold between 0 and 1 was set according to the re­sults of the experiments, which showed that a large major­ity of the generated slogans that contained more than seven words were grammatically incorrect and semantically in­coherent. Also, more than 90% of the famous slogans are less than eight words long. This function acts as an abso­lute constraint and that is why no values between 0 and 1 are allowed. 3.3.3 Diversity Function The diversity function evaluates a slogan by counting the number of repeated words. The highest score goes to a slo­gan with no repeated words. If a slogan contains identical consecutive words, it receives score 0. 3.3.4 Entity Function It returns 1, if slogan contains the main entity, and 0, if it doesn’t. 3.3.5 Keywords Function If one up to half of the words in a slogan belong to the set of keywords, the keywords function returns 1. If a slogan doesn’t contain any keyword, the score is 0. If more than half of the words in the slogan are keywords, the score is 0.75. 3.3.6 Word Frequency Function This function prefers slogans with many frequent words. A word is considered to be frequent, if it is among 5,000 most frequent words in COCA. The frequency score is obtained by dividing the number of frequent words by the number of all words in the slogan. 3.3.7 Polarity and Subjectivity Functions Polarity of a slogan indicates whether slogan contains pos­itive or negative words. For instance, the adjective “happy" is a positive word. In a similar way subjectivity of a slogan indicates whether slogan contains words that express the attitude of the author. For instance, the adjectives “good" and “bad" both represent the opinion of the author and are therefore subjective. The polarity and subjectivity scores are calculated based on the adjectives in the slogan, using the sentiment function from pattern package for Python [7]. 3.3.8 Semantic Relatedness Function This function computes the relatedness between all pairs of content words in a slogan. Stop words are not taken into account. Each pair of words gets a score based on the path distance between corresponding synsets (sets of synonyms) in WordNet [15]. The .nal score is the sum of all pairs’ scores divided by the number of all pairs. 3.4 Production of a New Generation of Slogans A list of all generated slogans is ordered descending with regard to the evaluation score. We use 10% elitism [8]. The other 90% of parent slogans are selected using a roulette wheel [11]. A new generation is built by pairing parents and per­forming the crossover function followed by the mutation function, which occur with probabilities pcrossover and pmutation, respectively. Offspring are then evaluated and compared to the parents, in order to remove very similar ones. If the number of the remaining slogans is smaller than the size of the population, some additional random slogans are generated using the method for creation of ini­tial population. After that, slogans proceed into the next generation. These steps are repeated until the prede.ned number of iterations is achieved. 3.4.1 Crossover We use two types of crossover functions, the big and the small one. 
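Both variants are described in detail in the following paragraphs; as a preview, the minimal sketch below illustrates the position-pairing and swapping logic on whitespace-tokenized slogans. The hard-coded POS tags and example slogans are illustrative stand-ins for tagger output, not the authors' implementation.

```python
import random

def crossover_points(parent_a, parent_b, tags_a, tags_b):
    """All index pairs (p, r) where the two parents share a POS tag."""
    return [(p, r) for p, tag_a in enumerate(tags_a)
                   for r, tag_b in enumerate(tags_b) if tag_a == tag_b]

def big_crossover(parent_a, parent_b, p, r):
    """Swap the tails of both parents from the chosen positions onward."""
    return parent_a[:p] + parent_b[r:], parent_b[:r] + parent_a[p:]

def small_crossover(parent_a, parent_b, p, r):
    """Swap only the two words at the chosen positions."""
    child_a, child_b = parent_a[:], parent_b[:]
    child_a[p], child_b[r] = parent_b[r], parent_a[p]
    return child_a, child_b

if __name__ == "__main__":
    a = "just do it".split()
    tags_a = ["RB", "VB", "PRP"]
    b = "fly the friendly skies".split()
    tags_b = ["VB", "DT", "JJ", "NNS"]
    points = crossover_points(a, b, tags_a, tags_b)
    if points:
        p, r = random.choice(points)
        for child in big_crossover(a, b, p, r) + small_crossover(a, b, p, r):
            print(" ".join(child))
    else:
        print("no compatible POS positions")
```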
Both inspect POS tags of the words in both par­ents, and build a set of possible crossover locations. Each element in the set is a pair of numbers. The .rst one pro­vides a position of crossover in the .rst parent and the sec­ond one in the second parent. The corresponding words must have the same POS tag. Let the chosen random pair from the set be (p, r). Using the big crossover, the part of P. Tomašiˇc et al. the .rst parent, from the p-th position forward, is switched with the part of the second parent, from the r-th position forward. For the small crossover only the p-th word in the .rst parent and the r-th word in the second parent are switched. Examples for the big and the small crossover are illustrated in Figure 2. Figure 2: Examples for the big and the small crossover.1 3.4.2 Mutation Two types of mutation are possible. Possible big muta­tions are: deletion of a random word; addition of an ad­jective in front of a noun word; addition of an adverb in front of a verb word; replacement of a random word with new random word with the same POS tag. Small mutations are replacements of a word with its synonym, antonym, meronym, holonym, hypernym or hyponym. A meronym is a word that denotes a constituent part or a member of something. The opposite of a meronym is a holonym – the name of the whole of which the meronym is a part. A hy­pernym is a general word that names a broad category that includes other words, and a hyponym is a subdivision of more general word. Functions for obtaining such replacements are embedded into the Nodebox English Linguistics library and are based on the WordNet lexical database. 3.4.3 Deletion of Similar Slogans Every generated slogan is compared to all its siblings and to all the evaluated slogans from the previous generation. If a child is identical to any other slogan, it gets removed. If more than half of child’s words are in another slogan, the two slogans are considered similar. Their evaluation scores are being compared and the one with higher score remains in the population while the other one is removed. The child is also removed if it contains only one word or if it is longer than 10 words. Deletion of similar slogans prevents the generated slogans to converge to the initial ones. This has been checked by testing our method without the deletion of similar slogans phase. 1Slogans used in the examples were or still are of.cial slogans of the following companies: General Electric, United Airlines, Nike, and BC Dairy Association. 3.5 Correction of Grammatical Errors Crossover and mutation functions may cause grammatical errors in generated slogans. For instance, incorrect usage of determiners (e.g., “a apple" instead of “an apple"), se­quence of incompatible words (e.g., “a the"), and others. Spelling mistakes were much less frequent. In order to remove both types of errors in the .nal slo­gans, we tested different spell-and grammar checkers. One example of a spell-checker is Hunspell [12]. Its downside is that it works on one word at a time and does not take the word’s context into account. As the majority of er­rors in slogans originated from grammar, we tested sev­eral grammar checkers. They, on the other hand, work on the sentence level rather than on the word level. Most of these grammar checkers are available as online services, and don’t support API calls. One that does is python-ginger [10] – a Python package for .xing grammar mis­takes. It comes with an unof.cial Ginger [9] API. This tool corrects different types of grammatical mistakes. 
It is also used for contextual spelling correction. We used python-ginger only on .nal slogans, the ones that are displayed to the user, because the corrected slogan may not have the same structure anymore. Possible added words, or replac­ing a word with another one with different POS tag would cause errors while executing crossover, mutation and eval­uation functions. 3.6 Unguided Slogan Generator For the purpose of evaluation of our slogan generation method, we also implemented an unguided slogan gener­ator (USG). This generator produces random slogans, such as the ones in the initial population. The only difference between our method and the unguided slogan generation method is in the production of a new generation. USG has no crossover and the mutation steps. Instead it produces a new generation using a method for creation of initial popu­lation. Thus children are independent of the previous gen­eration. The algorithmic steps are shown in Algorithm 3. 4 Experiments We tested the slogan generation method on different input texts and for different values of algorithm parameters. We analyzed the results of every iteration of the genetic algo­rithm to see how the slogans’ scores changed and made further assessment of the generator by comparing its results with the results of the unguided slogan generator. 4.1 Experimental Setting 4.1.1 Input Text In the presented experiments, we use a case of the Croatian provider of marine solutions, Sentinel. Sentinel is a control module that provides more security to boat owners, and is Informatica 39 (2015) 125–133 129 comprised of a network of sensors and a central informa­tion hub. It ensures the vessel is monitored at all times. The input text was obtained from the Sentinel’s web-page [21]. 4.1.2 Algorithm Parameters Different combinations of weights of the evaluation func­tion were tested on a set of manually evaluated slogans. We added one constraint – the weight of the keywords function had to be at least 0.2 in order to include keywords in the slo­gans. Without this constraint the computed weight for the keywords was almost zero. The comparison of the com­puted and the manually assigned scores showed that the highest matching was achieved with the following weights: [bigram: 0.25, length: 0.01, diversity: 0.01, entity: 0.1, keywords: 0.2, frequent words: 0.25, polarity: 0.01, sub­jectivity: 0.02, semantic relatedness: 0.15]. Probabilities for crossover and mutation were set to pcrossover =0.8 and pmutation =0.7. The probability for mutation was set very high, because it affects only one word in a slogan. Consequently the mutated slogan is still very similar to the original one. Thus the high mutation probability does not prevent population from converging to the optimum solution. For the algorithm to decide which type of crossover to perform, we set probabilities for the big, the small and both crossovers to 0.4, 0.2 and 0.4, re­spectively. The mutation type is chosen similarly. Proba­bilities of the big and the small mutation were set to 0.8 and 0.2. These algorithm parameters were set according to the results of testing on a given input text, as their combination empirically leads to convergence. We performed three experiments and for each of them we executed 20 runs of the algorithm using the same input parameter values. The difference between these three tests was in the size of the population (SP) and the number of it­erations (NIt). Those were chosen according to the desired number of all evaluations (. 
6, 800 NoE), and the NoE was set according to the desired execution time for one run of the algorithm – approximately 2 hours. 1. SP: 25, NIt: 360 2. SP: 50, NIt: 180 3. SP: 75, NIt: 120 4.1.3 Comparison with the Unguided Slogan Generator For comparison, we performed three experiments with the unguided slogan generator. For each of them we executed 20 runs of the algorithm using the same input parameter values as in the experiments with slogan generator. The initial populations were also identical. The number of iter­ations were again chosen so as to match the number of all evaluations (. 6, 800 NoE) in the experiments with slogan generator: 1. SP: 25, NIt: 300 2. SP: 50, NIt: 150 3. SP: 75, NIt: 100 Algorithm 3: UnguidedSloganGenerator 1 Input: A textual description of a company or a product T , Size of the population SP, Maximum number of iterations MaxIterations, Set of evaluation weights W . 2 Output: A set of generated slogans S. 1: Keywords, Entity . GetKeywordsAndEntity(T ) 2: P . CreateInitialPopulation(SP, Keywords, Entity) 3: Evaluate(P ) 4: Iteration . 0 5: while Iteration < MaxIterations do 6: Children . CreateInitialPopulation(SP, Keywords, Entity) 7: NewGeneration . DeleteSimilarSlogans(P, Children) 8: while Size(NewGeneration) 0.05) between the samples of ranks analysed, con­.rming the suitability of Mann-Whitney U test to evaluate the hypothesis above mentioned (Section 4). Figures 1 and 2 illustrate as CDFs the 60 average pro.t values obtained with production plans generated under each strategy in problem instance 1 and 2, respectively. Both .gures clearly show that the CDFs of average pro.t obtained under BS, IS and HS dominate the CDF of pro.t obtained under ES (.rst-order stochastic dominance). Fur­thermore, results from Mann-Whitney U test statistically show that average pro.t values obtained under ES are stochastically smaller (p< 0.01) than the ones obtained under BS, IS and HS in both problem instances, as shown in Tables 5 and 6. These results demonstrate that ES is an inadequate noise handling strategy in our setting: this result is likely to be driven by the limited computational budget available, and a stronger performance of ES may potentially be achieved when considering performance upon convergence. Table 1: Parameters used for baseline, implicit averaging, explicit averaging and hybrid strategies BS IS ES HS n 1 1 10 1 P opulationSize 50 100 50 100 Generations 500 250 50 240 Fitness evaluations 25,000 25,000 25,000 . 
25,000 Table 2: Probabilities for Pl and PDFs for .l Instance 1 Instance 2 Instance 1 and 2 .l Work centre (l) Pl Pl PDF Mean (d) 1 0.00 0.00 — — 2 0.10 0.30 Exponential 0.0847 3 0.25 0.45 Exponential 0.0935 4 0.15 0.35 Exponential 0.1338 5 0.00 0.00 — — 6 0.00 0.00 — — 7 0.00 0.00 — — Table 3: Descriptive statistics of average pro.ts and average computation times per strategy in problem instance 1 BS IS ES HS Mean (USD) 703,689 715,376 555,932 707,503 Minimum (USD) 550,417 484,229 460,416 550,014 Maximum (USD) 793,316 795,716 689,685 792,696 Std dev (USD) 55,539 60,416 47,322 60,754 Average computation time (s) 1256 1144 1235 1369 Table 4: Descriptive statistics of average pro.ts and average computation times per strategy in problem instance 2 BS IS ES HS Mean (USD) 707,600 703,420 543,140 723,460 Minimum (USD) 536,840 538,990 437,930 561,550 Maximum (USD) 785,810 783,040 656,800 790,460 Std dev (USD) 60,137 61,744 43,412 51,258 Average computation time (s) 1284 1108 1214 1265 No dominance (neither .rst nor second degree stochas­tic dominance) can be determined among the CDFs of av­erage pro.t obtained under BS, IS and HS in problem in­stance 1, where all three strategies appear equally compet­itive. These results are in accordance with results from Mann-Whitney U test, which indicate stochastic homo­geneity (p> 0.05) among samples obtained under BS, IS and HS in problem instance 1. It is interesting that popula­tion size (i.e. different setups of implicit averaging) has no signi.cant effect in this setting as evidenced under BS and IS. In problem instance 2, results from Mann-Whitney U tests also indicate stochastic homogeneity (p> 0.05) among samples obtained under BS, IS and HS. However, the CDFs of average pro.t obtained under HS dominates the CDFs of average pro.t obtained under IS and under BS, respectively (.rst and second-order stochastic dominance). These results indicate that, even in a setting with a lim­ited evaluation budget, the accurate .tness estimates from explicit averaging can be bene.cial to optimization. It is clear that the difference between IS and HS arises during the .nal stages of the optimization only, while IS contin­ues optimization during additional generations, HS focuses resources on explicit averaging across its last population. When seen in combination with the poor performance of ES, our results suggest that the trade-off between improved exploration (from evaluating more individuals) and accu­rate .tness evaluations (through simulation replicates for the same individual) needs to be carefully balanced in this setting. This .nding appears in line with previous research that has delivered contradictory results regarding the rela­tive performance of explicit and implicit averaging. Table 5: Values for Mann-Whitney U statistic obtained in problem instance 1 BSIS HS ES BS — 1525 1705 107** IS — — 1662 117** HS— — — 113** ES——— — ** p< 0.01 Table 6: Values for Mann-Whitney U statistic obtained in problem instance 2 BSIS HS ES BS — 1719 1549 103** IS — — 1485 96** HS—— — 36** ES——— — ** p< 0.01 J. E. Diaz and J. Handel 6 Conclusion Implicit averaging strategies reduce the impact of noise by having a suf.ciently large population, which ensures that individuals from promising regions of the search space are sampled repeatedly [10, 17]. 
On the other hand, explicit averaging strategies use average .tness values, obtained across a speci.c number of .tness evaluations, to ensure that evaluation of individuals is based on .tness estimates that are more robust to noise [10]. One of the key .ndings of this study was that, in the con­text of our real-world problem, a noise-handling strategy based on explicit averaging did not provide a competitive performance. More generally, this points to the fact that the computational costs incurred by simulation replicates may be problematic in constrained time settings. Furthermore, we found that implicit averaging per­formed robustly for both of the population sizes used. The performance of our hybrid strategy does indicate that some effort towards explicit averaging may become important with increasing variability. Under low levels of .tness variability, the hybrid strategy, implicit averaging and our baseline showed a comparable performance. This situation changed with increasing levels of .tness variability, when HS started to enhance overall performance. Compared to a pure, implicit averaging strategy, the hy­brid strategy misses out on the last few generations of op­timization. Our results show, however, that this disadvan­tage is more than counter-balanced by the bene.ts from an accurate .nal selection step that reduces the likelihood of choosing an inferior individual (in terms of average .tness) as the .nal solution. 7 Limitations and Future Research The relevance of obtaining more reliable .tness estimates increases with the level of variability, since there is a higher risk of choosing an inferior solution. It is, therefore, intu­itive that the .nal selection mechanism implemented in HS would be more bene.cial in such circumstances. But at the same time, the number of .tness evaluations needed to obtain reliable estimates is expected to raise with higher .tness variability, leaving to the evolutionary process a smaller share of the computational budget. Therefore, fur­ther research may focus on investigating the right trade-off between exploration (IS) and accurate .tness evaluation (ES). In this sense, the application of different sampling techniques (e.g., Latin Hypercube) during the .nal selec­tion mechanism might be worthy of future investigation, as it may allow a reduction in the number of .tness evalua­tions required in this last step. Our results underline issues around the computational cost of explicit averaging, but also highlight that sporadic use of this strategy may, nevertheless, be bene.cial. In this study, the use of explicit averaging in the hybrid strategy was limited to the .nal selection step only. Future research Figure 1: CDFs of average pro.t values obtained with production plans generated under the four different strategies in problem instance 1. Figure 2: CDFs of average pro.t values obtained with production plans generated under the four different strategies in problem instance 2. may consider the possibility of using explicit averaging at earlier points during the optimization process. Acknowledgement Juan Esteban Diaz expresses his gratitude to the Secretariat for Higher Education, Science and Technology of Ecuador, who has supported him with a doctoral scholarship. References [1] M. Bhattacharya, R. Islam, A. N. Mahmood (2014) Uncertainty and evolutionary optimization: A novel approach, Proceedings of the 9th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 988–993. [2] J. Branke, C. 
Schmidt (2003) Selection in the pres­ence of noise, Proceedings of the Genetic and Evolu­tionary Computation Conference (GECCO), pp. 766– 777. [3] J. Branke, C. Schmidt, H. Schmec (2001) Ef.cient .t­ness estimation in noisy environments. Proceedings of the Genetic and Evolutionary Computation Con­ference (GECCO), pp. 243–250. [4] K. Deb (2000) An ef.cient constraint handling method for genetic algorithms, Computer methods in applied mechanics and engineering, vol. 186, no. 2, pp. 311–338. [5] K. Deep, K. P. Singh, M. Kansal, C.Mohan (2009) A real coded genetic algorithm for solving integer and mixed integer optimization problems, Applied Math­ematics and Computation, vol. 212, no. 2, pp. 505– 518. [6] J. E. Diaz Leiva, J. Handl (2014) Simulation-based ga optimization for production planning, Proceedings of the Student Workshop on Bioinspired Optimization Methods and their Applications (BIOMA), pp. 27–39. [7] C. Ehrenberg, J. Zimmermann (2012) Simulation-based optimization in make-to-order production: scheduling for a special-purpose glass manufacturer, Proceedings of the Winter Simulation Conference (WSC), pp. 1–12. [8] G. Figueira, B. Almada-Lobo (2014) Hybrid simulation-optimization methods: A taxonomy and discussion, Simulation Modelling Practice and Theory, vol. 46, pp. 118–134. [9] M.-R. Ghasemi, J. Ignatius, A. Emrouznejad (2014) A bi-objective weighted model for improving the dis­crimination power in MCDEA, European Journal of Operational Research, vol. 233. no. 3, pp. 640–650. J. E. Diaz and J. Handel [10] Y. Jin, J. Branke (2005) Evolutionary optimization in uncertain environments -a survey, IEEE Transactions on Evolutionary Computation, vol. 9, no. 3, pp. 303– 317. [11] H. B. Mann, D. R. Whitney (1947) On a test of whether one of two random variables is stochasti­cally larger than the other, The annals of mathemati­cal statistics, vol. 18, no. 1, pp. 50–60. [12] F. Neri, X. del Toro Garcia, G. L. Cascella, N. Salva­tore (2008) Surrogate assisted local search in pmsm drive design, COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 27, no. 3, pp. 573–592. [13] D. W. Nordstokke, B. D. Zumbo (2010) A new non-parametric levene test for equal variances, Psicológ­ica, vol. 31, no. 2, pp. 401–430. [14] C. Qian, Y. Yu, Y. Jin, Z.-H. Zhou (2014) On the ef­fectiveness of sampling for evolutionary optimization in noisy environments, Lecture Notes in Computer Science, vol. 8672, pp. 302–311. [15] G. Rudolph (2001) A partial order approach to noisy .tness functions, Proceedings of the Congress on Evolutionary Computation (CEC), vol. 1, pp. 318– 325. [16] Y. Sano, H. Kita, I. Kamihira, M. Yamaguchi (2000) Online optimization of an engine controller by means of a genetic algorithm using history of search, Pro­ceedings of the 26th IEEE Annual Conference of the Industrial Electronics Society (IECON), vol. 4, pp. 2929–2934. [17] A. Syberfeldt, A. Ng, R. I. John, P. Moore (2010) Evolutionary optimisation of noisy multi-objective problems using con.dence-based dynamic resam­pling, European Journal of Operational Research, vol. 204, no. 3, pp. 533–544. [18] S. Yitzhaki (1982) Stochastic dominance, mean vari­ance, and gini’s mean difference, The American Eco­nomic Review, vol. 72, no. 1, pp. 178–185. 
Data Mining-Assisted Parameter Tuning of a Search Algorithm Jurij Šilc Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia E-mail: jurij.silc@ijs.si Katerina Taškova Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9 55128 Mainz, Germany E-mail: ktaskova@uni-mainz.de Peter Korošec Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia, and Faculty of Mathematics, Science and Information Technologies, University of Primorska Glagoljaška 8, SI-6000 Koper, Slovenia E-mail: peter.korosec@ijs.si Keywords: data mining, differential ant-stigmergy algorithm, low-discrepancy sequences, meta-heuristic optimization, parameter tuning Received: December 1, 2014 The main purpose of this paper is to show a data mining-based approach to tackle the problem of tuning the performance of a meta-heuristic search algorithm with respect to its parameters. The operational behavior of typical meta-heuristic search algorithms is determined by a set of control parameters, which have to be .ne-tuned in order to obtain a best performance for a given problem. The principle challenge here is how to provide meaningful settings for an algorithm, obtained as result of better insight in its behavior. In this context, we discuss the idea of learning a model of an algorithm behavior by data mining analysis of parameter tuning results. The study was conducted using the Differential Ant-Stigmergy Algorithm as an example meta-heuristic search algorithm. Povzetek: Osnovni namen ˇ clanka je pokazati, kako se lahko z uporabo tehnik podatkovnega rudar­jenja lotevamo problema uglaševanja sposobnosti metahevristˇ cnega iskalnega algoritma z vidika njegovih parametrov. Delovanje znaˇcnega iskalnega algoritma je doloˇ cilnega metahevristiˇceno z naborom njegovih krmilnih parametrov, ki morajo biti za dosego najboljših sposobnosti pri danem problemu dobro uglašeni. Temeljni izziv je kako zagotoviti najboljšo nastavitev algoritma, ki bo rezultat vpogleda v njegovo vedenje. V zvezi s tem razpravljamo o ideji uˇcenja modela za obnašanje algoritma na osnovi analize podatkovnega rudarjenja rezultatov uglaševanja njegovih parametrov. Študija je narejena z uporabo Diferencialnega al­goritma s stigmergijo mravelj, kot primera metahevristiˇ cnega iskalnega algoritma. 1 Introduction The research interest for meta-heuristic search algorithms has been signi.cantly growing in the last 25 years as a result of their ef.ciency and effectiveness to solve large and complex problems across different domains [2]. The state-of-the-art nature-inspired meta-heuristic algorithms for high-dimensional continuous optimization include also algorithms inspired from the collective behavior of social organisms [14]. One such algorithm, which we will address in this paper, is the Differential Ant-Stigmergy Algorithm (DASA) ini­tially proposed by Korošec [6], and further improved in [8]. DASA is inspired by the ef.cient self-organizing behav­ior of ant colonies emerging from a pheromone-mediated communication, known as stigmergy [3]. One of the .rst stigmergy-based algorithms designed for continuous func­tion optimization was Multilevel Ant Stigmargy Algorithm [7]. Naturally, DASA can be classi.ed within the Ant Colony Optimization (ACO) framework. 
However, the use of a continuous pheromone model in the form of Cauchy prob­ability distribution with representation of the search space in the form of the so-called differential graph makes DASA dissimilar from the original ACO paradigm. The rationale behind DASA is in memorizing (via the pheromone model updates) the “move” in the search space that improves the current best solution and using it in further search. As it is the case with most of the meta-heuristic algorithms, the operational behavior of DASA is determined by a set of control parameters. In practice, these parameters have to be .ne-tuned in order to obtain best performance of the al­gorithm. It can be quite inconvenient for the users as: – they usually do not have insight into the behavior of the algorithm, and – even if a default setting exists, it may not be adequate for a speci.c instance or type of a problem. Moreover, parameter tuning is computationally expensive task. The principle challenge here is how to provide meaning­ful (default) settings for DASA, obtained as result of better insight into the algorithm’s behavior. Furthermore, can we .nd optimal regions in DASA parameter space by analyz­ing the patterns in the algorithm’s behavior with respect to the problem characteristics? Related to this, we discuss the preliminary .ndings based on data mining analysis of pa­rameter tuning results. More precisely, the parameter tun­ing task is approached by two-step procedure that combines a kind of experimental design with data mining analysis. We use Sobol’ sequences [10] for even sampling of the algorithm parameter space to generate a large and diverse set of parameter settings. These are used as input to DASA to be tuned on a given function optimization problem. The performance of DASA on the given function optimization problem, in terms of function error, is captured at different time points for all sampled parameter settings. The data collected in the .rst step, DASA performance with corre­sponding parameter settings, is subject for intelligent data analysis, i.e., multi-target regression with Predictive Clus­tering Trees [1]. Parameter sampling combined with regression has been already used by Stoean et al. [11] for tuning meta­heuristics: Latin hypercubes parameter sampling is com­bined with single-target regression with Support Vector Machines. Our approach modi.es the former by replacing the Latin hypercube sampling by Sobol’ sequences, as the former is best suited in the case when a single parameter dominates the algorithm’s performance, while it should be used with care if there are interactions among the sampled parameters [9]. Moreover, we de.ne the regression task as multi-target regression, taking into account more than one target (in this case the function error at few time points) with the goal to .nd parameter settings for the given al­gorithm that will not only solve the problem but will also solve the optimization problem fastest. The reminder of this paper is structured as follows. Sec­tion 2 introduces the differential ant-stigmergy algorithm. Then, Section 3 addresses the parameter tuning task and Section 4 presents the experimental evaluation with the re­sults. After that, Section 5 discusses the idea of post-hoc analysis of parameter tuning by data mining. Finally, Sec­tion 6 summarizes this study and outlines possible direc­tions for further work. 2 The Differential Ant-Stigmergy Algorithm The version of DASA used in our experimental evaluation is described in details by Korošec et al. 
[8] (see Figure 1). DASA introduces the concept of variable offsets (referred to as parameter differences) for solving continuous optimization problems. By utilizing discretized offsets of the real-valued problem parameters, the continuous optimization problem is transformed into a graph-search problem. More precisely, assuming a multidimensional parameter space with x_i being the current solution for the i-th parameter, we define new solutions x'_i as follows:

x'_i = x_i + \omega \delta_i, (1)

where \delta_i is called the parameter difference and is selected from the following set:

\Delta_i = \Delta_i^- \cup \{0\} \cup \Delta_i^+, (2)

where

\Delta_i^- = \{\delta_{i,k}^- \mid \delta_{i,k}^- = -b^{k+L_i-1},\ k = 1, 2, \ldots, d_i\} (3)

and

\Delta_i^+ = \{\delta_{i,k}^+ \mid \delta_{i,k}^+ = b^{k+L_i-1},\ k = 1, 2, \ldots, d_i\}. (4)

Here,

d_i = U_i - L_i + 1, (5)
L_i = \lfloor \log_b(\epsilon_i) \rfloor, (6)
U_i = \lfloor \log_b(\max(x_i) - \min(x_i)) \rfloor, (7)

i = 1, 2, \ldots, D, D is the dimension of the problem, b is the discretization base, \epsilon is the maximal computer arithmetic's precision, and the weight \omega = Random_Integer(1, b - 1) is added to enable a more flexible movement over the search space. In principle, DASA relies on two distinctive characteristics, the differential graph and the continuous pheromone model. Here, we will briefly discuss these two characteristics and outline the main loop of the DASA search process. First, DASA transforms the D-dimensional optimization problem into a graph-search problem. The corresponding differential graph is a directed acyclic graph obtained by fine discretization of the continuous parameters' offsets. The graph has D layers with vertices, where each layer corresponds to a single parameter. Each vertex corresponds to a parameter offset value that defines a change from the current parameter value to the parameter value in the next search iteration. Furthermore, each vertex in a given layer is connected with all vertices in the next layer. The set of possible vertices for each parameter depends on the parameter's range, the discretization base and the maximal computer arithmetic's precision, which defines the minimal possible offset value. Ants use these parameters' offsets to navigate through the search space. At each search iteration, a single ant positioned at a certain layer moves to a specific vertex in the next layer, according to the amount of pheromone deposited in the graph vertices belonging to this layer.

Figure 1: The Differential Ant-Stigmergy Algorithm.

Second, DASA performs a pheromone-mediated search that involves a best-solution-dependent pheromone distribution. The amount of pheromone is distributed over the vertices according to the Cauchy Probability Density Function (CPDF) [9]. DASA maintains a separate CPDF for each parameter. Initially, all CPDFs are identically defined by a location offset set to zero and a scaling factor set to one. As the search process progresses, the shape of the CPDFs changes: CPDFs shrink and stretch as the scaling factor decreases and increases, respectively, while the location offsets move towards the offsets associated with the better solutions. The search strategy is guided by three user-defined real positive factors: the global scale increase factor, s+, the global scale decrease factor, s-, and the pheromone evaporation factor, \rho. In general, these three factors define the balance between exploration and exploitation in the search space. They are used to calculate the values of the scaling factor and consequently influence the dispersion of the pheromone and the moves of the ants.
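As a rough illustration of Eqs. (1)–(7), the sketch below builds the discretized offset set for a single parameter and applies one weighted move. The default values (b = 10, machine precision 10^-15, a parameter bounded to [-5, 5]) and the uniform choice of an offset are our simplifications; in DASA itself the offset is selected according to the per-parameter Cauchy pheromone distribution.

import math
import random

def offset_set(x_min, x_max, b=10, eps=1e-15):
    # Discretized offsets Delta_i for one parameter (Eqs. (2)-(7)):
    # L_i = floor(log_b(eps)), U_i = floor(log_b(range)), d_i = U_i - L_i + 1.
    L = math.floor(math.log(eps, b))
    U = math.floor(math.log(x_max - x_min, b))
    d = U - L + 1
    positive = [b ** (k + L - 1) for k in range(1, d + 1)]   # Delta_i^+
    negative = [-v for v in positive]                        # Delta_i^-
    return negative + [0] + positive                         # Delta_i

def dasa_move(x_i, deltas, b=10):
    # One move for a single parameter, Eq. (1): x'_i = x_i + omega * delta_i.
    omega = random.randint(1, b - 1)       # Random_Integer(1, b - 1)
    delta_i = random.choice(deltas)        # uniform here; pheromone-guided in DASA
    return x_i + omega * delta_i

deltas = offset_set(-5.0, 5.0)
print(len(deltas), dasa_move(0.0, deltas))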
Finally, the main loop of DASA consists of an iterative improvement of a temporary-best solution, performed by searching appropriates paths in the differential graph. The search is carried out by m ants, all of which move simul­taneously from a starting vertex to the ending vertex at the last level, resulting in m constructed paths. Based on the found paths, DASA generates and evaluates m new can­didate solutions. The best among the m evaluated solu­tions is preserved and compared to the temporary-best so­lution. If it is better than the temporary-best solution, the latter is replaced, while the pheromone amount is redis­tributed along the path corresponding to the path of the pre­served solution and the scale factor is accordingly modi.ed to improve the convergence. If there is no improvement over the temporary-best solution, then the pheromone dis­tributions stay centered along the path corresponding to the temporary-best solution, while their shape shrinks in or­der to enhance the exploitation of the search space. If for some .xed number of tries all the ants only .nd paths com­posed of zero-valued offsets, the search process is restarted by randomly selecting a new temporary-best solution and re-initializing the pheromone distributions. 3 Parameter Tuning To obtain the best possible performance on a given prob­lem, one should consider a task speci.c tuning of the pa­rameter setting for the optimization algorithm used. Deter­mining the optimal parameters is an optimization task in it­self, which is extremely computationally expensive. There are two common approaches for choosing parameters val­ues: parameter tuning and parameter control. The .rst ap­proach selects the parameter settings before running the op­timization algorithm (and they remain .xed while perform­ing the optimization). The second approach optimizes the algorithm’s parameters along with the problem’s parame­ters. Here, we will focus on the .rst approach, parameter J. Šilc et al. tuning. A detailed discussion and survey of parameter tuning methods is given by Eiben and Smit [4]. According to this survey, one way to approach parameter tuning is by sampling methods. Sampling methods reduce the search effort by decreasing the number of investigated parameter settings as compared to the full factorial design: the ba­sic full factorial design investigates 2k parameter settings, subject to k parameters, each of which have 2 possible val­ues; in the more general case, parameters can have arbitrary number of values; moreover, an increase in the number of investigated parameters means an exponential increase in the number of parameter settings to be tested. Two widely used sampling methods are Latin-squares [9] and Taguchi orthogonal arrays [12]. However, these are not the most ro­bust sampling techniques, e.g., Latin-squares or Latin hy­percube sampling is good in the case where one of the pa­rameters dominates the algorithm’s performance, while it should be used with care if there are interactions among the parameters. Ultimately, we would like to .nd a sampling schema that will be able to detect the interactions among the parame­ters, will be independent from user-speci.ed information regarding the particular parameter values to be considered (typical for factorial design), and will deliver small but rep­resentative sample of the parameter search space. 
The .rst two requirements are satis.ed by the pure random sam­pling, but the last is not, as random sampling does not guar­antee that the sampled values are evenly spread across the entire domain. The so-called low-discrepancy sequences were specially designed to ful.ll all three requirements. Therefore, Sobol’ sequences, a representative variation of low-discrepancy sequences introduced by Sobol’ [10], was considered for sampling the parameter space of DASA in this study. Sobol’ sequences, sampled from a D-dimensional unit search space, are quasi-random sequences of D-tuples that are more uniformly distributed than uncorrelated random sequences of D-tuples. These sequences are neither ran­dom nor pseudo-random: they are cleverly generated not to be serially uncorrelated by taking into account which tu­ples in the search space have already been sampled. For a detailed explanation and overview of the schemas for gen­erating Sobol’ sequences, we refer to [9]. The particular implementation of Sobol’ sampling used in our analysis is based on the Gray code order [5]. 4 Experimental Evaluation Since data mining methods can only discover patterns actu­ally present in the data, the dataset subject to analysis must be large enough and informative enough to contain these patterns, i.e., to describe different types of algorithm’s be­havior. For this reason, we decided to use a simple test function, which matched this requirement and was used for building an example case model. Table 1: Parameter settings for DASA* and DASA. Algorithm DASA. DASA* Parameter D = 20 D = 40 D = 20 D = 40 m 10105 7 . 0.2 0.2 0.324 0.388 s+ 0.02 0.02 0.201 0.136 s- 0.01 0.01 0.289 0.344 b 10106 8 Table 2: Median values of the function errors for the Sphere function Algorithm DASA. DASA* FEs D = 20 D = 40 D = 20 D = 40 25 × D 16.7 18.3 2.53 9.31 250 × D 0.0003 0.0021 0 0 2500 × D 0 0 0 0 25000 × D 0 0 0 0 Therefore, the performance of DASA was evaluated on the Sphere function: f(x)= |z|2 + f(xopt), (8) where z = x - xopt and xopt is optimal solution vector, such that f(xopt) is minimal. Function f(x) is de.ned over D-dimensional real-valued search space x and is scalable with the dimension D. It has no speci.c value of its optimal solution (it is randomly shifted in x-space) and has an arti.­cially chosen optimal function value (it is randomly shifted in f-space). In this study, we considered the Sphere func­tion with respect to two dimensions, D = 20 and D = 40. The performance of DASA is dependent on the values of .ve parameters: three real-valued parameters that directly in.uence the search heuristic (s+, s-, and .) and two integer-valued parameters (m and b). Therefore, we con­sidered all of them for tuning DASA performance on the Sphere function for both search space dimensions, D = 20 and D = 40. Using the Gray-code-based Sobol’ genera­tor we generated 5000 parameter settings (5-tuples). Note that the Sobol’ sampling generates numbers on the unit in­terval: in order to obtain the true parameter settings, we mapped these values on the prede.ned search range of pa­rameter values. The latter for each of the .ve tuned param­eters was de.ned as follows: 4 . m . 200, 0 . . . 1, 0 . s+ . 1, 0 . s- . ., and 2 . b . 100. Moreover, the mapped values for the integer-valued parameters m and b were rounded to the closest integer value. Finally, due to implementation reasons, the upper bound of the global scale decrease factor s- was actually limited by the value of the evaporation factor .. 
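The sampling step just described can be reproduced approximately with SciPy's quasi-Monte Carlo module (scipy.stats.qmc, available from SciPy 1.7 onward). The paper uses a Gray-code-ordered Sobol' generator, so the sketch below illustrates the procedure rather than the exact sequence used.

import numpy as np
from scipy.stats import qmc

# Parameter order: m, rho, s_plus, s_minus, b (search ranges as given in the text)
lower = [4, 0.0, 0.0, 0.0, 2]
upper = [200, 1.0, 1.0, 1.0, 100]

sampler = qmc.Sobol(d=5, scramble=False)
unit = sampler.random(5000)                    # 5000 quasi-random points in [0, 1]^5
settings = qmc.scale(unit, lower, upper)       # map onto the predefined parameter ranges

settings[:, 0] = np.rint(settings[:, 0])       # m rounded to the closest integer
settings[:, 4] = np.rint(settings[:, 4])       # b rounded to the closest integer
settings[:, 3] = np.minimum(settings[:, 3], settings[:, 1])   # enforce 0 <= s_minus <= rho

SciPy warns that 5000 points is not a power of two (powers of two preserve the sequence's balance properties); this does not affect the illustration.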
In the next step, the performance of the Sobol’ sampled parameter settings were tested on the Sphere benchmark function. Due to the stochastic nature of DASA, every parameter setting was used in a multiple-run experimental evaluation. Each run included 25000 × D function evalua­tions (FEs). The number of runs was set to 15. The results gathered by the parameter tuning process are most often subjected to ordinal data analysis, which includes ranking of the different sampled parameter sets according to some calculated statistics, e.g., best or mean performance of the algorithm in some prede.ned number of runs [13]. In this case, performance of the algorithm is expressed in terms of the function error, i.e., the difference between the ob­tained and optimal function value. In order to .nd a setting that will be satisfactory in terms of convergence speed, we captured the error values at four different time points, cor­responding to 25 × D, 250 × D, 2500 × D, and 25000 × D FEs. The optimal performing parameter setting was chosen based on the median best performance over all runs aggre­gated over all time points for a given dimension (D = 20 and D = 40). A common approach is to use the mean performance, but we took the median in order to avoid the problems that the mean has when observing large vari­ance in the function values across the runs. More precisely, given a function, an individual rank is assigned to every setting (out of 5000) for the four time points. A single .­nal rank is calculated by ranking the sum of the four in­dividual rankings assigned to the parameter settings. The best-ranked parameter setting for a given dimension de.nes instance of DASA referred to as DASA*. The results of DASA tuning subject to ordinal data anal­ysis are presented in Tables 1 and 2. Table 1 reports the tuned parameter settings for both DASA* instances. In addition, the default parameter setting for DASA from [8] is given as a reference for comparison. The correspond­ing instance is referred to as DASA. . Results in Table 2 represent the median values of the function errors, at the four time points, obtained by DASA* and DASA. instances for both dimensions. Note, that er­ror value below 10-8 was treated as zero. The table clearly shows that DASA* is better than DASA. . Figure 2: Median error distributions for the Sphere func­tion in the case of D = 40. 5 Data Mining Analysis Parameter tuning of an algorithm leads to a better per­formance, however it is a computationally expensive and problem-dependent task. Considering this, the idea is to extend the simple tuning that delivers a single parameter set and analyze the gathered data in an intelligent way. The intelligent analysis can extract patterns (regularities) in the explored parameter space that de.ne a speci.c behavior of DASA. To this end, data mining methods for automated discovery of patterns in data can be used. As data mining methods can only discover patterns that are present in the data, the dataset subject to analysis must be large enough and informative enough to contain these patterns, i.e., to describe different types of algorithm’s behavior. Related to this, we considered a data mining approach on a represen­tative example, i.e., error model of the Sphere function. To begin with, consider the graph in Figure 2 that visu­alizes the Sphere function error distribution. The graph de­picts the distribution of the median error values obtained by 5000 parameter settings at four different points for D = 40. 
As we are more concerned with the practical signi.cance between a large and a small error value than the statisti­cal signi.cant difference between two actual error values, the error values are discretized into nine intervals, each of which is represented with a color chosen according to the error magnitude between black (error below 10-8) and white (error above 102). The graph clearly shows that the sampled settings determine different DASA performance. As evident, there is a big cluster of parameter settings that solve this function to the aimed accuracy (error below 10-8) in the given time budget (106 FEs). Moreover, subset of this cluster solves the function for an order of magnitude less FEs. Our aim, therefore, is to .nd a (common) descrip­tion of this cluster, in terms of DASA parameter relations, J. Šilc et al. that represents a good behavior of DASA (as well as what parameter relations lead to a bad DASA performance). For this purpose, we formulated the problem as a predic­tive modeling task using decision trees to model the func­tion error values in terms of the values of DASA parame­ters. Since the function error variables are continuous, the task at hand is a regression task. Furthermore, as our goal is to model the behavior of DASA at all time points simul­taneously, the problem at hand is then a multi-target re­gression. To this end, we used Predictive Clustering Trees (PCTs) which are implemented in Clus system [1]. In this case, the median error at the four time points de.ne the four target attributes considered for modeling (with the PCT) in dependence from the descriptive attributes, i.e., the func­tion dimension and the .ve DASA parameters. The re­sulting dataset is composed of 10002 rows described with the 5001 parameter settings (including the default setting) of DASA applied to the two different dimensions, and 10 columns corresponding to the 10 attributes. Figure 3 presents a PCT model for the Sphere function error. Each internal node of the tree contains a test on a descriptive attribute, while the leaves store the model pre­dictions for the function error, i.e., a tuple of four values. The predictions are calculated as the mean values of the corresponding error values for the data instances belonging to the particular leaf (represented by a box). In fact, each leaf identi.es one cluster of data instances (the size of the cluster is the value in the small box). The predictive perfor­mance of the model was assessed with 10-cross-validation. Note that this particular model was learned on the com­plete dataset subject to constraints on the maximal tree size of 25 nodes. We did this because the original model con­tained 1643 nodes (of which 822 leaves) and despite its better predictive performance, both training and testing, it was not comprehensible; aiming for an explanatory model, small and comprehensible, we considered the smaller tree obtained with the limitation of the size. The predictive per­formance of both models in terms of Root Relative Mean Squared Error (RRMSE) and Pearson Correlation Coef.­cient (PCC) are given in Table 3. Note that RRMSE rep­resents the relative error with respect to the mean predictor performance, while PCC represents the linear correlation between the data and the model predictions. Good models have RRMSE values closer to 0 and PCC closer to 1. The model in Figure 3 outlines 13 clusters of data in­stances, of which two (depicted with light-gray boxes) represent a good DASA performance. 
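The Clus system used for the PCTs is a separate Java tool. Purely as an illustrative stand-in, and not the authors' setup, scikit-learn's decision trees handle multi-target regression natively, and the 25-node size constraint can be approximated with a 13-leaf limit. The arrays below are random placeholders for the 10002-row tuning dataset described above.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.random((10002, 6))     # descriptive attributes: dimension D and the five DASA parameters
Y = rng.random((10002, 4))     # targets: median error at 25*D, 250*D, 2500*D and 25000*D FEs

# One tree fitted on all four targets simultaneously (multi-target regression),
# limited to 13 leaves (i.e. a binary tree with 25 nodes), echoing the reported model size.
tree = DecisionTreeRegressor(max_leaf_nodes=13, random_state=0)
cv = cross_validate(tree, X, Y, cv=10, return_train_score=True)   # 10-fold cross-validation
tree.fit(X, Y)                                                     # explanatory model on the full data
print(cv["test_score"].mean(), tree.tree_.node_count)

RRMSE and the Pearson correlation coefficient reported in Table 3 are not built-in scikit-learn scorers and would have to be supplied as custom scoring functions; Figure 3 and the interpretation that follows refer to the actual Clus PCT, not to this sketch.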
According to this model, the number of ants, m, is the most important DASA parameter for its performance on the Sphere func­tion. More precisely, if m> 83, independent of the values of the other parameters, DASA solves the 20-dimensional functional problem for the given time budget. Moreover, if m . 83 another DASA parameters become important as well. For example, if 43 0.009 and D = 20 then DASA solves the function with error 3 × 10-6, while the pattern m . 43 and s.0.040 and s- > 0.656 describes a poor DASA performance regard­ Figure 3: Predictive clustering tree representation of the error model for the Sphere function. less of the function dimension. An interesting fact is that, the evaporation factor is not essential for DASA perfor­mance on the Sphere function. Moreover, the model also shows that is more dif.cult to describe the behavior of DASA for the 40-dimensional function problem than the 20-dimensional one. Finally, note that the training performance (learned on the complete dataset) of the model in terms of the error and the correlation coef.cient is best for the .rst target, while it gets worse with respect to the other three targets (see Ta­ble 3). This is especially signi.cant if we take into account the testing performance of the model estimated with 10­cross-validation. However, the training performance is ac­ceptable in our case, as we are interested in understanding the behavior of DASA and not aiming to obtain a model for prediction. 6 Conclusion The principle challenge of meta-heuristic design is provid­ing a default algorithm con.guration, in terms of parameter setting, that will perform reasonably well in general (prob­lem) case. However, while it is a good initial choice, the default algorithm con.guration may result in low quality solutions on a speci.c optimization problem. In practice, the algorithms parameters have to be .ne-tuned in order to obtain best algorithm’s performance for the problem at hand, leading to the computational expensive task of pa­rameter tuning. So, if the tuning task is unavoidable, the question is: can we use the results from the parameter tun­ing to extract some knowledge about the algorithm’s be­havior? Related to this, the study focused on the problem of tun­ing the performance of the Differential Ant-Stigmergy Al­gorithm (DASA) with respect to its parameters. As it is the case with most of the meta-heuristic algorithms, the oper­ational behavior of DASA is determined by a set (.ve) of control parameters. The existing default setting of DASA parameters [8] is obtained by experimentation with both real and benchmark optimization problems, but not as a result of some systematic evaluation. Furthermore, there is no deeper understanding of the impact of a particular parameter or parameters relations on the performance of DASA. In this context, we performed a systematic evalua­tion of DASA performance obtained by solving the Sphere function optimization problem with 5000 Sobol’ sampled DASA parameter settings regarding two dimensions, 20 and 40. Furthermore, we discussed the idea of learning a model of DASA behavior by data mining analysis of the parame­ter tuning results. In this context, we formulated the prob­lem as multi-target regression and applied predictive clus­tering trees for learning a model of DASA behavior with respect to the function error performance. The obtained model revealed that the parameter denoting number of ants is the most important parameter for DASA performance on the 20-dimensional function problem. 
On the other hand, the evaporation factor is not essential for DASA perfor­mance on the Sphere function. Further work will focus on additional experimental eval­uation and data mining analysis of data with respect to more complex functions problems. This idea can be fur­ Table 3: Model performance with respect to RRMSE and PCC Measure RRMSE PCC Training 1643 nodes 25 nodes 1643 nodes 25 nodes 25 × D 0.166 0.408 0.986 0.913 250 × D 0.373 0.679 0.928 0.734 2500 × D 0.487 0.562 0.873 0.827 25000 × D 0.472 0.546 0.882 0.837 Mean 0.396 0.557 0.843 0.689 Testing 1643 nodes 25 nodes 1643 nodes 25 nodes 25 × D 0.219 0.450 0.976 0.893 250 × D 0.638 0.817 0.777 0.584 2500 × D 1.019 1.039 0.304 0.246 25000 × D 1.053 1.055 0.234 0.218 Mean 0.806 0.875 0.426 0.312 ther extended to building models of DASA behavior that will include the optimization problem characteristics (such as multimodality, separability, and ill-conditioning) as de­scriptive attributes as well. The latter can provide insights on how to con.gure DASA performance with respect to the type of the optimization problem. Moreover, these in­sights can serve as a valuable information for improvement of DASA design. References [1] H. Blockeel, J. Struyf (2002) Ef.cient Algorithms for Decision Tree Cross-validation, Journal of Machine Learning Research, vol. 3, pp. 621–650. [2] C. Blum, A. Roli (2003) Metaheuristics in Combina­torial Optimization: Overview and Conceptual Com­parison, ACM Computing Surveys, vol. 35, no. 3, pp. 268–308. [3] E. Bonabeau, M. Dorigo, G. Theraulaz (1999) Swarm Intelligence: From Natural to Arti.cial Systems, Ox­ford University Press. [4] A. E. Eiben, S. K. Smit (2011) Parameter Tuning for Con.guring and Analyzing Evolutionary Algorithms, Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 19–31. [5] S. Joe, F. Y. Kuo (2008) Constructing Sobol Se­quences with Better Two-dimensional Projections, SIAM Journal on Scienti.c Computing, vol. 30, no. 5, pp. 2635–2654. [6] P. Korošec (2006) Stigmergy as an Approach to Metaheuristic Optimization, Ph.D. dissertation, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia. [7] P. Korošec, J. Šilc (2008) Using Stigmergy to Solve Numerical Optimization Problems, Computing and Informatics, vol. 27, no. 3, pp. 341–402. [8] P. Korošec, J. Šilc, B. Filipiˇc (2012) The Differential Ant-stigmergy Algorithm, Information Sciences, vol. 192, no. 1, pp. 82–97. [9] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery (1992) Numerical Recipes, Cambridge Uni­versity Press. [10] I. M. Sobol’ (1967) Distribution of Points in a Cube and Approximate Evaluation of Integrals, USSR Com­pututational Mathematics and Mathematical Physics, vol. 7, no. 4, pp. 86–112. [11] R. Stoean, T. Bartz-Beielstein, M. Preuss, C. Stoean (2009) A Support Vector Machine-Inspired Evolu­tionary Approach for Parameter Setting in Meta­heuristics, CIOP Technical report 01/09, Faculty of Computer Science and Engineering Science, Cologne University of Applied Science, Germany. [12] G. Taguchi, T. Yokoyama (1993) Taguchi Methods: Design of Experiments, ASI Press. [13] E.-G. Talbi (2009) Metaheuristics: From Design to Implementation, John Wiley & Sons. [14] X.-S. Yang (2008) Nature-Inspired Metaheuristic Al­gorithms, Luniver Press. A High Resolution Clique-based Overlapping Community Detection Algorithm for Small-world Networks András Bóta University of Szeged, Institute of Informatics Address, P. O. 
Box 652., 6701 Szeged, Hungary E-mail: bandras@inf.u-szeged.hu Miklós Krész University of Szeged, Juhász Gyula Faculty of Education, Boldogasszony bvd. 6, 6720 Szeged, Hungary E-mail: kresz@jgypk.u-szeged.hu Keywords: network science, community detection, overlapping communities Received: June 24, 2013 In this paper we propose a clique-based high-resolution overlapping community detection algorithm. The hub percolation method is able to .nd a large number of highly overlapping communities. Using different hub-selection strategies and parametrization we are able to .ne tune the resolution of the algorithm. We also propose a weighted hub-selection strategy, allowing the algorithm to handle weighted networks in a natural way, without additional .ltering. We will evaluate our method on various benchmarks, and we will also demonstrate the usefulness of our algorithm on a real-life economic case-study. Povzetek: Predstavljena je nova hevristika za reševanje evklidskega BDMST problema. Primerjalni testi pokažejo prednosti pred obstojeˇ cimi metodami. 1 Introduction One of the landmarks in graph theory was the introduc­tion of small-world networks by Watts and Strogatz [31]. They have observed, that in real-life networks, the typical distance between two randomly chosen nodes grows pro­portionally to the logarithm of the number of nodes in the network. Since then, several other properties of real-life networks was discovered. The degree distribution of these networks follows a power-law [2], and the edge distribution is not only globally, but also locally inhomogeneous. This latter feature is called community structure [11]. The goal of community detection is the discovery of this structure. While the phenomenon of communities is well observed, an exact de.nition is dif.cult to .nd. In recent years, a large number of community detection algorithms have been proposed. Most of these consider communities to be disjoint vertex sets, and adopt the fol­lowing intuition: They are looking for a partitioning of the nodes, which maximizes the number of edges between the nodes inside the sets, and minimizes them between the sets. It is also a goal to .nd meaningful communities, i.e. they discard trivial solutions of the problem (like a single com­munity containing all of the vertices). Newman proposed modularity [27] as an ef.cient way to measure the good­ness of disjoint communities. A comprehensive review of community detection can be found in [10]. The traditional de.nition of community allows disjoint vertex sets only. Based on the observation that in real-life networks, nodes can belong to multiple communities, Palla et al. introduced the concept of overlapping community de­tection and proposed the clique percolation method [29] as a solution. The idea of .nding maximal cliques and joining them according to some criteria is the basis of several over­lapping community detection algorithms [17, 21]. Other approaches are based on block models [12, 7], edge cluster­ing [9, 1], label propagation [13] or optimization according to some .tness function [25, 20]. Measuring the goodness of overlapping community de­tection algorithms is complicated, since there is no agree­ment on the de.nition of an overlapping community. 
The speci.cations of different applications depend mainly on the ratio of overlaps between communities: several ap­proaches require only a loose relaxation of the original ”non-overlapping” de.nition in such way that occurence of nodes belonging to multiple communities is strongly restricted [13]; other concepts prefer highly overlapping community structure [20]. The resolution of the methods are closely tied to the ratio of overlaps. A highly overlap­ping community structure is often associated with a large number of relatively small communities, however the op­posite is not always true. Hierarchical or multiresolutional methods combine these approaches. Corresponding to this, the output of the above mentioned algorithms can be fundamentally different. There are basi­cally two types of evaluation in the literature: one can use some kind of benchmark network like in [18, 22, 26, 32], and compare the results to the already known community structure of the network [14, 28]. Another option is using a real-life application as an example, similar to a case-study. In this paper, the authors propose the hub percolation overlapping community detection method. A node, that is a member of many adjacent cliques is considered more im­portant. We refer to these nodes as hubs. We expand and join cliques if they contain the same hubs. One of the ad­vantages of this method is, that both the hub selection and the joining criteria is adjustable. This allows us to discover different kinds of community structures from large, loosely overlapping groups to ones with a dense, highly overlap­ping structure. We also propose a hub-selection strategy able to handle weighted networks in a natural way with­out the need for .ltering or pruning edges. Finally, we will rely on the framework proposed by Pluhár et al. in [3], and show how several popular algorithms can be represented in it. We will use well-known benchmark networks [29, 26, 11] to demonstrate the difference between hub selection strategies in terms of community sizes, the size of over­laps and the number of singletons: nodes without com­munities. Then we will evaluate the performance of our method in two different ways. We will use the community based graph generator of Lancichinetti and Fortunato [18] to compare the results of our method to the OSLOM al­gorithm of the same authors [20], the COPRA method of Gregory [13], and the clique percolation method of Palla et al. [29]. We will also present a case study: we will exam­ine the communities of an economic network constructed from the Hungarian company register. We will focus our attention on three aspects of the companies: the geograph­ical location of them, the industrial sector they belong to and the age of the companies. 2 General framework The authors of [3] described a general framework for over­lapping community detection. In this section we summa­rize their approach. Here, and throughout the paper, by a graph G we mean an undirected simple graph with vertex (or node) set V (G) and edge set E(G). Edges might have arbitrary weight. According to the framework introduced in [3], most community detection algorithms consist of two phases. Taking an input graph G, the .rst phase constructs a hyper-graph F =(V, H), where V (F)= V (G), and H. 2V . The elements of H are considered the building blocks of communities. The second phase adds a distance function d to set H, creating a metric space M =(H,d). Using function d, a clustering algorithm creates a set of clusters C. 
Finally, the arising clusters are associated to subsets of V such that K_i = \bigcup_{H \in C_i} H, where K_i, the i-th community, corresponds to C_i, the i-th cluster, and K_i is just the union of the vertex sets of those hyperedges that belong to C_i. It is easy to show that this framework applies to most community detection algorithms. In the case of the clique percolation method [29], H contains the k-cliques¹ of the original graph, and function d is:

d(K_i, K_j) = \begin{cases} 1, & \text{if } |K_i \cap K_j| = k - 1,\\ \infty, & \text{otherwise.} \end{cases}

In the same paper [3], the authors have proposed the N++ community detection algorithm with a general distance function, where H is the same as above and d(K_i, K_j) = 1 only if |K_i \cap K_j| \geq c, where c is a parameter of the algorithm. In other cases d(K_i, K_j) = \infty. This method has proven its usefulness in applications [6, 16]. It is also possible to describe non-clique-based methods using this formulation. In the case of COPRA [13], each element of H initially contains only one unique vertex v \in V(G). In the second phase, these are joined according to a belonging coefficient. A threshold is introduced to provide a lower bound for community membership.

3 The hub percolation method

The motivation for creating an advanced community detection algorithm came from our previous work with the general framework of community detection [3]. Our aim was to create a flexible clique-based method taking into consideration our experiences with the clique percolation method [29] and the N++ method [3]. Many of the details of the algorithm described in this section come from experiences gained during test runs on well-known benchmark networks like [32, 29, 26, 22]. The hub percolation method has two simple ideas at its core. A natural property of most approaches for overlapping community detection² is that cliques (fully connected subgraphs) are considered to be the purest communities. Therefore our method uses cliques at the beginning of the building process. An important observation on real-life networks is that inside a community some members are more important than others with respect to the role of the nodes in connecting different communities. We will denote these nodes as hubs. In the building process the cliques of the graph are extended according to a limited percolation rule: two k-cliques are joined if they share k - 1 vertices. As a result of this process, the set of extended cliques consists of the building blocks of community detection. The joining phase of our method merges these extended cliques if they share the same hubs. Considering these ideas, an outline of the hub percolation algorithm is as follows:
1. Find the set C of all maximal cliques of size greater than or equal to 3 on graph G.
2. Select the set of hubs H.
3. Create the set of extended cliques C_e.
4. Compute the set of communities K: take the union of extended cliques if one of them contains all the hubs in the other one.
¹ Fully connected subgraphs containing exactly k nodes.
² In the following chapters, we will refer to overlapping community detection simply as community detection.
Finding the set of all maximal cliques in a graph is a well-studied NP-hard problem of graph theory. Unfortunately, an n-vertex arbitrary graph may contain 3^{n/3} maximal cliques in the worst case [24]. Because of their unique structure, this number is significantly lower in small-world networks, allowing algorithms like those in [30, 4, 8] to list the set of maximal cliques in reasonable time even for large networks.
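Steps 1 and 2 of the outline above can be sketched with NetworkX, whose find_cliques routine enumerates maximal cliques with a Bron–Kerbosch-type algorithm; the helper name and the use of Zachary's karate club network as the example graph are our own choices.

from collections import Counter
import networkx as nx

def cliques_and_hub_values(G):
    # Step 1: all maximal cliques of size >= 3.
    cliques = [c for c in nx.find_cliques(G) if len(c) >= 3]
    # Step 2 input: hub value h_v = number of maximal cliques containing node v.
    hub_values = Counter(v for c in cliques for v in c)
    return cliques, hub_values

G = nx.karate_club_graph()        # Zachary's karate club benchmark network, bundled with NetworkX
cliques, h = cliques_and_hub_values(G)
print(len(cliques), h.most_common(5))

A hub selection strategy (Section 3.1) then compares h_v with the median or the scaled mean of the hub values in the node's 1-neighbourhood.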
In this work we used the modi.ed Bron-Kerbosch algorithm described in [8]. The hub selection strategy is an important part of the algorithm. Hubs represent the locally important nodes in the network. As a consequence, whether a node is a hub should depend on the t-neighborhood3 of the given node, where t is a small number. In our interpretation hubs con­nect communities, therefore the deciding factor in hub se­lection should be the number of cliques the vertex belongs to. Each node v is assigned a hub value hv according to the above rule, then some of them are selected if their value is higher than the average or median hub values in their t-neighborhood. It is also possible to extend the selection strategy to weighted networks. We will discuss hub selec­tion in the next subsection. In our method, cliques of the network are extended with a a one-step percolation rule, then merged if they share the same hubs. Introducing the .ltering parameter k . 2, let us consider all cliques of size equal to k on the subgraph induced by the set of hubs H. We will denote the set of k-cliques on G[H] as CH . Then, we expand the elements of CH according to a one-step percolation rule. Let Ce denote the set of merged cliques ce = cH .c0 .···.c. with cH . CH , c0,...,c. . C and |c0 . cH |. 2,..., |c. . cH |. 2 4 . The last step of our method corresponds to the joining phase of the community detection framework. We merge elementary communities if they contain the same hubs, more precisely, we take the union of two elementary com­munities ce0 and ce1 if ce0 . H . ce1 . H. We iterate this process by adding the new merged clique to Ce and removing the original ones. At the end of the process Ce contains the communities of graph G. Note, that depend­ing on the hub selection strategy Ce may contain duplicate members, the merging process eliminates this problem as well. Each element of Ce is a union of the cliques of G and contains at least k hubs. We will refer to the members of Ce as elementary communities. The hub percolation method Input : Graph G, parameter k Informatica 39 (2015) 177–187 179 1. Find all maximal cliques of graph G using any exact algorithm or heuristic. Let C denote the set of cliques. 2. For all v . V (G), let hv = |Hv|, Hv = {h|v . h, h . C}. 3. Select the set of hubs H according to the hub selection strategy. 4. Let CH denote the set of k-cliques on the subgraph induced by the hubs G[H]. 5. Create the set of extended cliques Ce according to the following rule: for all cH . CH .nd all cliques c . C where |c . cH |. 2. Let c0,...,c. denote the cliques satisfying this criterion. Create the union of cliques ce = cH . c0 .· ··. c., and add ce to Ce. 6. For all ce0 ,ce1 . Ce add the union of them to Ce if ce0 . H . ce1 . H, and remove ce0 and ce1 from Ce. Iterate until there are no more merges. 7. The set Ce contains the communities of graph G. Figure 1: The communities of Zachary’s karate club net­work [32]. Hubs are marked as diamond shapes. Nodes with multiple colors indicate overlapping nodes. The me­dian hub selection strategy was used with k =2. Nodes 9, 3, 33 form an additional community and node 9 belongs to three communities. It is easy to see, that the general framework proposed in applies to the hub percolation method. The edges of the hypergraph correspond to the extended communities in Ce, while the distance function is 3A t-neighbothood of a vertex v is the set of vertices N tv , where u . . .. if Ki . H . Kj . H or N t. 
The community structure of Zachary's karate club network [32] can be seen in Figure 1. This network is a well-known social network that represents friendships between the members of the club. Our method identifies five communities⁵, the most interesting ones being the green and blue ones, as well as the one represented by the red triangle. During Zachary's observation the club split into two parts because some friendships were broken. Most community detection algorithms are able to identify these subgroups even before the actual split. In our case the borders of the green and blue communities represent the border between the two subgroups, and the red group identifies the edge that was broken when the split occurred.

⁵ The median hub selection strategy was used with k = 2, see Subsection 3.1.

3.1 Hub selection strategies

Hub selection is a crucial part of the algorithm. As mentioned before, each node in the graph is assigned a hub value based on the number of cliques it belongs to. Based on this value, the selection strategy chooses the set of hubs H. Hubs represent "locally important" nodes, so the criterion for the hub property of a node, or "hubness", should depend only on the close neighborhood of the node. In our interpretation, this criterion depends on a simple statistical property of the first or second neighborhood of the given node, namely the average or median of the neighboring hub values.

At the beginning of our work on several well-known benchmark networks [26, 32] we quickly found that the 2-neighborhood strategies are often not robust enough to select the appropriate hubs: hubs were relatively rare, which resulted in small overlaps and an unacceptably large number of nodes without community membership. A general experience was that hubs should be "common enough", so that most of the nodes have one or more of them in their direct neighborhood.

Considering this, the 1-neighborhood median selection strategy provided the best community structure on these benchmark networks. This may not be the case, however, with other real-life networks. In order to extend our algorithm to handle different kinds of networks, we suggest another, more general hub selection rule. Still considering only the direct neighborhood of the nodes, we calculate the average hub value and multiply it by a parameter q > 0. If the hub value of the node is higher than this scaled average, we select it as a hub. This approach makes hub selection more flexible, allowing the algorithm to adapt to different requirements. A small value of q selects a larger number of hubs, resulting in larger communities with greater overlaps, while increasing q has the opposite effect. This also allows the algorithm to discover several layers of community structure in the network.

Finally, hub selection can be extended to weighted graphs in a natural way. As before, the hub value of a node is the number of cliques it belongs to. Then the values are multiplied by the strength⁶ of the node. After this, the process is the same as in the previous strategies.

In summary, we propose the following hub selection strategies:

– 1-neighborhood median: A node is selected as a hub if its hub value is greater than the median of the hub values in its one-neighborhood.
– 1-neighborhood mean with multiplier: A node is selected as a hub if its hub value is greater than the mean of the hub values in its one-neighborhood multiplied by a parameter q > 0.

– 1-neighborhood weighted mean with multiplier: The hub values are multiplied by the strength of the nodes. Apart from this, the strategy is the same as above.

As a recommendation, the median strategy should be tried first, and if it does not give satisfactory results the average strategy should be used, starting with q = 1 and decreasing or increasing its value in small steps depending on the requirements; a small illustrative sketch of the mean-based strategies follows below. In practice 0
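To make the two mean-based strategies concrete, here is a minimal sketch, assuming hub values h[v] computed as in the earlier sketch and assuming that "strength" means the sum of a node's incident edge weights (footnote 6 is not reproduced in this excerpt); including the node's own value in its one-neighborhood is likewise our assumption.

import networkx as nx

def node_strength(G, u):
    # Assumed definition of strength: the sum of incident edge weights
    # (weight defaults to 1 on unweighted graphs).
    return sum(d.get("weight", 1.0) for d in G[u].values())

def select_hubs_mean(G, h, q=1.0, weighted=False):
    # 1-neighborhood mean with multiplier q > 0; if weighted is True, hub
    # values are first scaled by node strength (the weighted variant above).
    score = (lambda u: h[u] * node_strength(G, u)) if weighted else (lambda u: h[u])
    hubs = set()
    for v in G:
        neigh = list(G[v]) + [v]    # the node itself is included (our assumption)
        mean = sum(score(u) for u in neigh) / len(neigh)
        if score(v) > q * mean:
            hubs.add(v)
    return hubs

Following the recommendation above, one would call select_hubs_mean(G, h, q=1.0) only after the median strategy has proven unsatisfactory, then decrease q for larger, more overlapping communities or increase it for the opposite effect.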