Informatica
An International Journal of Computing and Informatics

Special Issue: Theoretical Computer Science
Guest Editor: Andrej Brodnik

The Slovene Society Informatika, Ljubljana, Slovenia
Dear reader,

In front of you is the final act of the very successful conference Theoretical Computer Science 2004 (TCS), which took place as a sub-conference of the multiconference Information Society 2004 on October 9-15, 2004. The program committee of TCS received 23 contributions from 8 countries. After a thorough reviewing process it selected 12 papers to be presented at the conference. In addition, since the program committee wanted to bring the TCS conference closer to the general public, it decided to invite 10 contributions to a special poster session. Besides these talks, the conference also featured an invited talk entitled Succinct Data Structures, given by Prof. Ian Munro from the University of Waterloo, Canada.

The members of the program committee were:
- Andrej Brodnik, University of Primorska, Chairman
- Stefano Crespi-Reghizzi, Technical University Milano
- Roberto Grossi, University of Pisa
- Marjan Mernik, University of Maribor
- Bojan Mohar, University of Ljubljana
- Peter Paule, Research Institute for Symbolic Computation (RISC), Linz
- Marko Petkovšek, University of Ljubljana
- Tomaž Pisanski, University of Ljubljana
- Borut Robič, University of Ljubljana
- John Shawe-Taylor, University of Southampton
- Boštjan Vilfan, University of Ljubljana
- Gerhard J. Woeginger, University of Twente
- Janez Žerovnik, University of Maribor

The program committee also received an invitation to publish the best papers from the conference as a special part of the Informatica journal. The committee decided to invite the authors of four contributions:
- Miklós Bartha and Miklós Krész, Deterministic soliton graphs,
- Sergio Cabello, Matt DeVos and Bojan Mohar, Expected case for projecting points,
- Hovhannes A. Harutyunyan and Calin D. Morosan, The spectra of Knödel graphs, and
- Bojan Mohar, On the crossing number of almost planar graphs.

The authors were asked to thoroughly review their papers and extend them for journal publication. The rewritten papers were reviewed once more, and now they are in front of you. All four papers deal with graphs and their use in solving various problems.

In the first paper Bartha and Krész study soliton graphs. Soliton graphs are related to deterministic automata, and the authors show how and when they can be reduced to simpler, more regular structures (chestnut graphs, generalized trees, and graphs having a unique perfect matching) without affecting their properties.

Cabello et al. in the second paper consider a set of n points in the plane in which the distance between any pair of points is at least one. They project these points onto a random line, which is split into segments (cells) of length one - such a line is called a graduated line. In the paper they prove an upper bound of O(n^{2/3}) on the expected concentration of the projections on a graduated line. Their result is relevant in computational geometry for sweep-line algorithms in which the sweeping direction is chosen at random.

In the third paper Harutyunyan and Morosan study Knödel graphs. Knödel graphs are applicable in distributed computing, as they can be used for data broadcasting. An important property of Knödel graphs is their spectrum, and the authors show how to compute the spectra of Knödel graphs using results from Fourier analysis, circulant matrices and PD-matrices.
From these results they derive a formula for the number of spanning trees, each of which can be used to broadcast data in the graph.

In the last paper Bojan Mohar answers a question posed by Riskin: whether the crossing number of a graph G_0 + xy equals d, where G_0 is a 3-connected cubic planar graph and x, y ∈ V(G_0) are at dual distance d. The answer is negative, and it remains negative even for 5-connected planar graphs.

Andrej Brodnik

Deterministic Soliton Graphs

Miklós Bartha
Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, Canada
E-mail: bartha@cs.mun.ca

Miklós Krész
Department of Computer Science, Juhász Gyula Teacher Training College, University of Szeged, Hungary
E-mail: kresz@jgytf.u-szeged.hu

Keywords: soliton automata, matching theory

Received: March 23, 2005

Soliton graphs are studied in the context of a reduction procedure that simplifies the structure of graphs without affecting the deterministic property of the corresponding automata. It is shown that an elementary soliton graph defines a deterministic automaton iff it reduces to a graph not containing even-length cycles. Based on this result, a general characterization is given for deterministic soliton graphs using chestnut graphs, generalized trees, and graphs having a unique perfect matching.

Povzetek: The paper deals with graphs without odd cycles.

1 Introduction

One of the most ambitious goals of research^1 in modern bioelectronics is to develop a molecular computer. The introduction of the concept "soliton automaton" in [5] was inspired by this research, with the intention to capture the phenomenon called soliton waves [4] through an appropriate graph model. Soliton graphs and automata have been systematically studied by the authors on the grounds of matching theory in a number of papers. Perhaps the most significant contribution among these is [2], where soliton graphs are decomposed into elementary components, and these components are grouped into pairwise disjoint families based on how they can be reached by alternating paths starting from external vertices. This paper can also serve as a source of further references on soliton automata for the interested reader.

Since soliton automata are proposed as switching devices, deterministic automata are at the center of the investigations. The results reported in this paper aim at providing a complete characterization of deterministic soliton automata. The two major aspects of this characterization are:
1. Describing elementary deterministic soliton graphs.
2. Recognizing that deterministic soliton graphs having an alternating cycle follow a simple hierarchical pattern called a chestnut.

^1 Partially supported by the Natural Science and Engineering Research Council of Canada, Discovery Grant #170493-03.

An important tool in the study of both aspects is a reduction procedure, which might be of interest by itself.
It allows elementary deterministic soliton graphs to be reduced to generalized trees, and it can also be used to reduce chestnut graphs to really straightforward ones, called baby chestnuts.

2 Soliton graphs and automata

By a graph G = (V(G), E(G)) we mean an undirected graph with multiple edges and loops allowed. A vertex v ∈ V(G) is called external if its degree is one, and internal if the degree of v is at least two. An internal vertex is base if it is adjacent to an external one. External edges are those of E(G) that are incident with at least one external vertex, and internal edges are those connecting two internal vertices. Graph G is called open if it has at least one external vertex.

A walk in a graph is an alternating sequence of vertices and edges, which starts and ends with a vertex, and in which each edge is incident with the vertex immediately preceding it and the vertex immediately following it. The length of a walk is the number of occurrences of edges in it. A trail is a walk in which all edges are distinct, and a path is a walk in which all vertices are distinct. A cycle is a trail which can be decomposed into a path and an edge connecting the endpoints of the path. If α = v e_1 ... e_n w is a walk from v to w and β = w f_1 ... f_k z is a walk from w to z, then the concatenation of α and β is the walk α · β = v e_1 ... e_n w f_1 ... f_k z from v to z.

A matching M of graph G is a subset of E(G) such that no vertex of G occurs more than once as an endpoint of some edge in M. It is understood by this definition that loops cannot participate in any matching. The endpoints of the edges contained in M are said to be covered by M. A perfect internal matching is a matching that covers all of the internal vertices. An edge e ∈ E(G) is allowed (mandatory) if e is contained in some (respectively, all) perfect internal matching(s) of G. Forbidden edges are those that are not allowed. We shall also use the term constant edge to identify an edge that is either forbidden or mandatory. An open graph having a perfect internal matching is called a soliton graph. A soliton graph G is elementary if its allowed edges form a connected subgraph covering all the external vertices. Observe that if G is elementary, then it cannot contain a mandatory edge, unless G is a mandatory edge by itself with a number of loops incident with one of its endpoints.

Let G be an elementary soliton graph, and define the relation ∼ on Int(G) as follows: v_1 ∼ v_2 if an extra edge e connecting v_1 with v_2 becomes forbidden in G + e. It is known, cf. [6, 2], that ∼ is an equivalence relation, which determines the so-called canonical partition of (the internal vertices of) G. The reader is referred to [6] for more information on canonical equivalence, and on matching theory in general.

Let G be a graph and M be a matching of G. An edge e ∈ E(G) is said to be M-positive (M-negative) if e ∈ M (respectively, e ∉ M). An M-alternating path (cycle) in G is a path (respectively, even-length cycle) stepping on M-positive and M-negative edges in an alternating fashion. An M-alternating loop is an odd-length cycle having the same alternating pattern of edges, except that exactly one vertex has two negative edges incident with it. Let us agree that, if the matching M is understood or irrelevant in a particular context, then it will not be explicitly indicated in these terms. An external alternating path is one that has an external endpoint. If both endpoints of the path are external, then it is called a crossing.
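The definitions above are easy to make concrete. Here is a minimal brute-force sketch (our illustration, not part of the paper; the three-edge example graph is made up) that enumerates the perfect internal matchings of a small open graph:

```python
from itertools import combinations

def perfect_internal_matchings(edges, internal):
    """Enumerate all perfect internal matchings of an open graph.

    edges: list of 2-tuples (u, v); internal: set of internal vertices.
    A matching is a set of edges with no shared endpoint (loops are
    excluded by definition); it is a perfect internal matching if it
    covers every internal vertex.
    """
    candidates = [e for e in edges if e[0] != e[1]]   # loops never match
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            covered = [v for e in subset for v in e]
            if len(covered) != len(set(covered)):     # shared endpoint
                continue
            if internal <= set(covered):              # covers all internals
                yield set(subset)

# a path x - a - b - y with external endpoints x, y (hypothetical example)
edges = [("x", "a"), ("a", "b"), ("b", "y")]
for M in perfect_internal_matchings(edges, internal={"a", "b"}):
    print(M)   # {('a','b')} and {('x','a'),('b','y')}; every edge is allowed
```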
An alternating path is positive if it is such at its internal endpoints, meaning that the edges incident with those endpoints are positive.

Let G be a soliton graph, fixed for the rest of this section, and let M be a perfect internal matching of G. An M-alternating unit is either a crossing or an alternating cycle with respect to M. Switching on an alternating unit amounts to changing the sign of each edge along the unit. It is easy to see that the operation of switching on an M-alternating unit α creates a new perfect internal matching S(M, α) for G. Moreover, as was proved in [1], every perfect internal matching M of G can be transformed into any other perfect internal matching M' by switching on a collection of pairwise disjoint alternating units. Consequently, an edge e of G is constant iff there is no alternating unit passing through e with respect to any perfect internal matching of G. A collection of pairwise disjoint M-alternating units will be called an M-alternating network, and the network transforming one perfect internal matching M into another M' will be denoted by N(M, M'). Clearly, N(M, M') is unique.

Now we generalize the alternating property to trails and walks. An alternating trail is a trail α stepping on positive and negative edges in such a way that α is either a path, or it returns to itself only in the last step, traversing a negative edge. The trail α is a c-trail (l-trail) if it does return to itself, closing up an even-length (respectively, odd-length) cycle. That is, α = α_1 · α_2, where α_1 is a path and α_2 is a cycle. These two components of α are called the handle and circuit, in notation h(α) and c(α). The joint vertex on h(α) and c(α) is called the center of α. An external alternating trail is one starting out from an external vertex, and a soliton trail is a proper external alternating trail, that is, either a c-trail or an l-trail. See Fig. 1.

Figure 1: Soliton trails: a) c-trail, b) l-trail.

The collection of external alternating walks in G with respect to some perfect internal matching M, and the concept of switching on such walks, are defined recursively as follows.

(i) The walk α = v_0 e v_1, where e = (v_0, v_1) with v_0 being external, is an external M-alternating walk, and switching on α results in the set S(M, α) = M Δ {e}. (The operation Δ is the symmetric difference of sets.)

(ii) If α = v_0 e_1 ... e_n v_n is an external M-alternating walk ending at an internal vertex v_n, and e_{n+1} = (v_n, v_{n+1}) is such that e_{n+1} ∈ S(M, α) iff e_n ∉ S(M, α), then α' = α e_{n+1} v_{n+1} is an external M-alternating walk and S(M, α') = S(M, α) Δ {e_{n+1}}. It is required, however, that e_{n+1} ≠ e_n, unless e_n ∈ S(M, α) is a loop.

It is clear by the above definition that S(M, α) is a perfect internal matching iff the endpoint v_n of α is external, too. In this case we say that α is a soliton walk.

Example. Consider the graph G of Fig. 2, and let M = {e, h_1, h_2}. A possible soliton walk from u to v with respect to M is α = u e w g z_1 h_1 z_2 l_2 z_3 h_2 z_4 l_1 z_1 g w f v. Switching on α then results in S(M, α) = {f, l_1, l_2}.

Figure 2: Example soliton graph G.

Graph G gives rise to a soliton automaton A_G, the states of which are the perfect internal matchings of G. The input alphabet for A_G is the set of all (ordered) pairs of external vertices in G, and the state transition function δ is defined by

δ(M, (v, w)) = {S(M, α) | α is a soliton walk from v to w}.

Graph G is called deterministic if A_G is such in the usual sense, that is, if for every state M and input (v, w), |δ(M, (v, w))| ≤ 1.
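The recursive definition lends itself to direct implementation. The sketch below (ours, under the assumption that the triple (current vertex, running matching, last edge) determines all legal continuations, which is exactly what rules (i) and (ii) use) computes δ(M, (v, w)) by breadth-first search over walk states; the state space is finite even though soliton walks may repeat edges:

```python
from collections import deque

def soliton_delta(edges, external, M, v, w):
    """delta(M, (v, w)) = { S(M, alpha) : alpha a soliton walk from v to w }.

    edges: list of pairs (u, x); parallel edges are distinct entries and a
    loop is a pair (u, u).  M: frozenset of edge indices (a perfect internal
    matching).  external: set of external vertices.  A walk is summarized by
    (current vertex, running matching S(M, alpha), last edge index); rules
    (i) and (ii) depend only on this triple, and there are finitely many
    triples, so a BFS with a visited set enumerates all soliton walks.
    """
    incident = {}
    for i, (a, b) in enumerate(edges):
        incident.setdefault(a, []).append(i)
        if a != b:
            incident.setdefault(b, []).append(i)

    def other_end(i, u):
        a, b = edges[i]
        return b if u == a else a

    results, seen = set(), set()
    queue = deque((other_end(i, v), M ^ {i}, i) for i in incident.get(v, ()))  # rule (i)
    while queue:
        u, S, last = queue.popleft()
        if (u, S, last) in seen:
            continue
        seen.add((u, S, last))
        if u in external:                 # the walk is a soliton walk here
            if u == w:
                results.add(S)
            continue                      # walks extend only at internal vertices
        for i in incident[u]:             # rule (ii): e_{n+1} in S iff e_n not in S
            if (i in S) == (last in S):
                continue
            is_loop = edges[last][0] == edges[last][1]
            if i == last and not (last in S and is_loop):
                continue
            queue.append((other_end(i, u), S ^ {i}, i))
    return results
```

Determinism of A_G can then be checked by verifying that len(soliton_delta(...)) ≤ 1 for every state M and every ordered pair (v, w) of external vertices.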
Example (continued). Observe that the soliton automaton defined by the graph of Fig. 2 is non-deterministic, as α' = u e w f v is also a soliton walk from u to v with respect to state M, and S(M, α) ≠ S(M, α').

Let α be a soliton c-trail with respect to M. It is easy to see that the walk s(α) = h(α) · c(α) · h(α)^R is a soliton walk, and the effect of switching on s(α) is the same as switching on the cycle c(α) alone. (For any walk β, β^R denotes the reverse of β.) If α is a soliton l-trail, then s(α) is defined as the soliton walk s(α) = h(α) · c(α) · c(α) · h(α)^R. Clearly, this walk induces a self-transition of A_G, that is, no state change is observed. In the sequel, all perfect internal matchings of G will simply be called states, for obvious reasons.

Recall from [5] that an edge e of G is impervious if there is no soliton walk passing through e in any state of G. Edge e is viable if it is not impervious. See Fig. 2, edge h, for an example of an impervious edge. We are going to give a simpler characterization of impervious edges in terms of alternating paths, rather than walks. To this end, we need the following lemma.

Lemma 2.1. If α is an external M-alternating walk from v to u, then there exists an M-alternating network F and an external M-alternating trail β from v to u such that
1. F consists of cycles only, and it is disjoint from β;
2. S(M, α) = S(S(M, F), β).

Proof. Easy induction on the length of α, left to the reader. □

An internal vertex v ∈ V(G) is called accessible with respect to state M if there exists a positive external M-alternating path leading to v. It is easy to see, cf. [2], that vertex v is accessible with respect to some state M iff v is accessible with respect to all states of G.

Proposition 2.2. For every edge e ∈ E(G), e is impervious iff both endpoints of e are inaccessible.

Proof. If either endpoint of e is accessible, then e is clearly viable. Assume therefore that both endpoints u_1 and u_2 of e are inaccessible, and let α be an arbitrary external M-alternating walk from v ∈ Ext(G) to either u_1 or u_2, say u_1. By Lemma 2.1, there exists a suitable external M-alternating trail β from v to u_1. Each internal edge lying on β has an accessible endpoint, so that e is not among them. Moreover, the edge of h(β) incident with u_1 must be positive with respect to S(M, α), otherwise h(β) would be a positive external alternating path with respect to state S(M, F). (Recall that h(β) is the handle of β, and take h(β) = β if β is just a path.) But then e must be negative with respect to S(M, α), or else h(β) e u_2 would be a positive external alternating path leading to u_2 (with respect to S(M, F), or even M, since F is disjoint from β). We conclude that the walk α cannot continue on e, because it must take the two positive edges incident with u_1 before and after hitting that vertex. Thus, every time an external alternating walk reaches either endpoint of e, it will miss e as a possible continuation. In other words, e is impervious. □

3 Elementary decomposition of soliton graphs

Again, let us fix a soliton graph G for the entire section. In general, the subgraph of G determined by its allowed edges has several connected components, which are called the elementary components of G. Notice that an elementary component can be as small as a single external vertex of G. Such elementary components are called degenerate, and they are the only exception to the general rule that each elementary component is an elementary graph.
A mandatory elementary component is a single mandatory edge e ∈ E(G), which might have a loop around one or both of its endpoints. The structure of elementary components in a soliton graph G was analysed in [2]. To summarize the main results of this analysis, we first need to review some of the key concepts introduced in that paper.

Elementary components are classified as external or internal, depending on whether or not they contain an external vertex. An elementary component of G is viable if it does not contain impervious allowed edges. A viable internal elementary component C is one-way if all external alternating paths (with respect to any state M) enter C in vertices belonging to the same canonical class of C. This unique class, as well as the vertices belonging to it, are called principal. Furthermore, every external elementary component is considered a priori one-way (with no principal canonical class, of course). A viable elementary component is two-way if it is not one-way. An impervious elementary component is one that is not viable.

Figure 3: Elementary components in a soliton graph.

Example. The graph of Fig. 3 has five elementary components, among which C_1 and D are external, while C_2, C_3 and C_4 are internal. Component C_3 is one-way with the canonical class {u, v} being principal, while C_2 is two-way and C_4 is impervious.

Let C be an elementary component of G, and M be a state. An M-alternating C-ear is a negative M-alternating path or loop having its two endpoints, but no other vertices, in C. The endpoints of the ear will necessarily be in the same canonical class of C. We say that elementary component C' is two-way accessible from component C with respect to any (or all) state(s) M, in notation C ρ C', if C' is covered by an M-alternating C-ear. It is required, though, that if C is one-way and internal, then the endpoints of this ear not be in the principal canonical class of C. As was shown in [2], the two-way accessibility relationship is matching invariant.

A family of elementary components in G is a block of the partition induced by the smallest equivalence relation containing ρ. A family F is called external if it contains an external elementary component; otherwise F is internal. A degenerate family is one that consists of a single degenerate external elementary component. Family F is viable if every elementary component in F is such, and F is impervious if it is not viable. As it turns out easily, the elementary components of an impervious family are all impervious. Soliton graph G is viable if all of its families are such.

Example (continued). Our example graph in Fig. 3 has four families: F_1 = {C_1, C_2}, F_2 = {D}, F_3 = {C_3}, F_4 = {C_4}. Family F_1 is external, F_2 is degenerate, and F_3 is internal. These families are all viable, whereas family F_4 is impervious.

The first group of results obtained in [2] on the structure of elementary components of G can now be stated as follows.

Theorem 3.1. Each viable family of G contains a unique one-way elementary component, called the root of the family. Each vertex in every member of the family, except for the principal vertices of the root, is accessible. The principal vertices themselves are inaccessible, but all other vertices are only accessible through them.

A family F is called near-external if each forbidden viable edge incident with any principal vertex of its root is external.
For two distinct viable families F_1 and F_2, F_2 is said to follow F_1, in notation F_1 → F_2, if there exists an edge in G connecting a non-principal vertex in F_1 with a principal vertex of the root of F_2. The reflexive and transitive closure of → is denoted by →*.

The second group of results in [2] characterizes the edge connections between members inside one viable family, and those between two different families.

Theorem 3.2. The following four statements hold for the families of G.
1. An edge e inside a viable family F is impervious iff both endpoints of e are in the principal canonical class of the root. Every forbidden edge e connecting two different elementary components in F is part of an ear to some member C ∈ F.
2. For every edge e connecting a viable family F_1 to any other family (viable or not) F_2, at least one endpoint of e is principal in F_1 or F_2. If the endpoint of e in F_1 is not principal, then F_2 is viable and it follows F_1.
3. The relation →* is a partial order between viable families, by which the external families are minimal elements.
4. If F and G are families such that F →* G, then each non-principal vertex u of G is accessible from F, meaning that for every state M there exists a positive M-alternating path to u either from a suitable external vertex of F, if F is external, or from an arbitrary principal vertex of F, if F is internal. The path α runs entirely in the families between F and G according to →*.

A chestnut is a graph G = γ ∪ α_1 ∪ ... ∪ α_k, k ≥ 1, where γ is a cycle of even length and each α_i (i ∈ [k]) is a tree, subject to the following conditions:

(i) V(α_i) ∩ V(α_j) = ∅ for 1 ≤ i ≠ j ≤ k;
(ii) V(α_i) ∩ V(γ) consists of a unique vertex, denoted by v_i, for each i ∈ [k];
(iii) v_i and v_j are at even distance on γ for any distinct i, j ∈ [k];
(iv) any vertex w_i ∈ V(α_i) with d(w_i) ≥ 2 is at even distance from v_i in α_i, for each i ∈ [k].

Figure 4: A chestnut.

See Fig. 4 for an example chestnut. Our first observation regarding chestnuts is that they are bipartite graphs. Let us call a vertex of a chestnut G outer if its distance from any of the v_i's is even, and inner if this distance is odd. Then the inner and outer vertices indeed define a bipartition of G. Moreover, the degree of each inner vertex is at most 2. It is easy to come up with a perfect internal matching for G: just mark the cycle γ in an alternating fashion; the continuation is then uniquely determined by the structure of the trees α_i. Thus, G has exactly two states. It is also easy to see that the inner internal vertices are accessible, while the outer ones are inaccessible. Thus, the cycle γ forms an internal elementary component with its outer vertices constituting the principal canonical class of this component. Moreover, γ forms a stand-alone internal family in G. The rest of G's families are all single mandatory edges along the trees α_i, or they are degenerate ones consisting of a single inner external vertex.

Expected Case for Projecting Points

Sergio Cabello, Matt DeVos and Bojan Mohar

For arbitrary point sets P there are configurations with Conc(P) > n/2 for any line L, and so EConc(P) = Ω(n). However, the problem becomes non-trivial if we put restrictions on the density of the point set. Our objective^{1,2} is to bound the value EConc(P) for any 1-separated point set. Kučera et al. [4] have shown that Conc(P) = O(√(n log n)) for any 1-separated point set P. More interestingly, they use Besicovitch sets [1] to construct 1-separated point sets P having Conc(P) = Ω(√(n log n)), which implies EConc(P) = Ω(√(n log n)).
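The quantities involved are easy to explore numerically. The sketch below (ours; the graduation is anchored at the origin, an arbitrary choice that changes Conc(P) by at most a constant factor) estimates EConc(P) for a given point set by sampling directions uniformly from [0, π):

```python
import numpy as np

def conc(points, alpha):
    """Conc of P on the graduated line L(alpha): the maximum number of
    projections that fall into a single unit cell."""
    u = np.array([np.cos(alpha), np.sin(alpha)])
    cells = np.floor(points @ u).astype(int)       # unit cells of the line
    return np.bincount(cells - cells.min()).max()

def expected_conc(points, samples=1000, seed=0):
    """Monte Carlo estimate of EConc(P) over a uniformly random direction."""
    rng = np.random.default_rng(seed)
    return float(np.mean([conc(points, a) for a in rng.uniform(0.0, np.pi, samples)]))

# a 1-separated example: a 20 x 20 piece of the integer grid (n = 400)
grid = np.array([(x, y) for x in range(20) for y in range(20)], dtype=float)
print(expected_conc(grid))
```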
We will show that for any 1-separated point set P we have EConc(P) = O(n^{2/3}). It therefore remains open to find tight bounds for EConc(P).

The rationale behind considering projections onto random lines is the efficiency of randomized algorithms whose running time depends on the expected concentration. As an example, consider a set of disjoint unit disks and any sweep-line algorithm [2, Chapter 2] whose running time depends on the maximum number of disks that are intersected by the sweep line. Choosing the direction in which the line sweeps affects the running time, but computing the best direction, or an approximation of it, is expensive: Kučera et al. [4] claim that it can be done in polynomial time, and Díaz et al. [3] give a constant-factor approximation algorithm with O(nt log nt) running time, where t is the diameter of P. By choosing a random projection we avoid having to compute a good direction for projecting, and we get a randomized algorithm. The results in this paper are helpful for analyzing the expected running time of such randomized algorithms.

The rest of the paper is organized as follows. In Section 2 we introduce some relevant random variables and give some basic facts. In Sections 3 and 4 we bound EConc(P) using the first and second moments, respectively.

^1 Supported by the European Community Sixth Framework Programme under a Marie Curie Intra-European Fellowship.
^2 Supported in part by the Ministry of Education, Science and Sport of Slovenia, Research Program P1-0507-0101.

2 Preliminaries

Let P = {p_1, ..., p_n} be a 1-separated point set, and let d_{i,j} = d(p_i, p_j). We use the notation [n] = {1, ..., n}. Without loss of generality, we can restrict ourselves to graduated lines passing through the origin. Let L(α) be the line passing through the origin that has angle α with the x-axis, and let p^*(α) be the orthogonal projection of a point p onto L(α). Consider the following random variables for the angle α:

X_{i,j}(α) = 1 if d(p_i^*(α), p_j^*(α)) ≤ 1, and X_{i,j}(α) = 0 otherwise;

X_i(α) = Σ_{j=1}^{n} X_{i,j}(α);

X_max(α) = max{X_1(α), ..., X_n(α)};

X(α) = Σ_{i=1}^{n} X_i(α) = Σ_{i=1}^{n} Σ_{j=1}^{n} X_{i,j}(α),

where α is chosen uniformly at random from the interval [0, π). In words: X_{i,j} is the indicator variable for the event that p_i^*(α) and p_j^*(α) are at distance at most one in the projection; X_i is the number of points (including p_i itself) whose projection is at distance at most one from p_i^*(α); X_max is the maximum among X_1, ..., X_n; and X counts twice the number of pairs of points at distance at most one in the projection.

It is clear that P[X_{i,i} = 1] = 1 for any i ∈ [n]. Otherwise we have the following result.

Lemma 1. If i ≠ j, then P[X_{i,j} = 1] = (2/π) arcsin(1/d_{i,j}).

Proof. Assume without loss of generality that p_i is placed at the origin and p_j is vertically above it, on the y-axis. See Figure 1. We may also assume that the line L(α) passes through p_i. Because d_{i,j} ≥ 1, there are values α such that X_{i,j}(α) = 1. The angles that make X_{i,j}(α) = 1 are indicated in the figure. In particular, if β is the angle indicated in the figure, and we choose α uniformly at random, then P[X_{i,j} = 1] = 2β/π. The angle β is such that sin β = 1/d_{i,j}, and so β = arcsin(1/d_{i,j}). We conclude that P[X_{i,j} = 1] = (2/π) arcsin(1/d_{i,j}). □

The first observation, which is already used for the approximation algorithms described by Díaz et al. [3], is that, asymptotically, we do not need to care about the graduation, but only about the orientation of the line. In particular, the random variables X_i contain all the information that we need asymptotically.

Lemma 2. We have EConc(P)/2 ≤ E[X_max(α)] ≤ 2 EConc(P).
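Lemma 1 can be sanity-checked by simulation. The following fragment (ours) places p_i at the origin and p_j at distance d on the y-axis, exactly as in the proof, and compares the empirical frequency with (2/π) arcsin(1/d):

```python
import numpy as np

def empirical_p(d, samples=200_000, seed=0):
    """Empirical P[X_ij = 1] for two points at distance d >= 1: with p_i at
    the origin and p_j at (0, d), the projected distance onto L(alpha) is
    d * |sin(alpha)|."""
    alpha = np.random.default_rng(seed).uniform(0.0, np.pi, samples)
    return np.mean(d * np.abs(np.sin(alpha)) <= 1.0)

d = 3.0
print(empirical_p(d), 2.0 / np.pi * np.arcsin(1.0 / d))  # the two values agree
```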
3 Using the first moment

Using that the closest pair of P is at least one apart, we get the following result.

Lemma 3. For every i ∈ [n], we have Σ_{j ∈ [n]\{i}} 1/d_{i,j} = O(√n).

Proof. Without loss of generality, assume that i = n. Let n_d be the number of points in P whose distance from p_n is in the interval [d, d+1). We have

Σ_{j ∈ [n−1]} 1/d_{n,j} = Σ_{d=1}^{∞} ( Σ_{j : d_{n,j} ∈ [d,d+1)} 1/d_{n,j} ) ≤ Σ_{d=1}^{∞} n_d/d.

Observe that if we have two sequences (a_i)_{i∈N} and (b_i)_{i∈N} of nonnegative numbers such that Σ_{i=1}^{t} a_i ≤ Σ_{i=1}^{t} b_i for all t, then Σ_i a_i/i ≤ Σ_i b_i/i. In our case Σ_{d=1}^{t} n_d = O(t²), because a disc of radius t+1 contains O(t²) points of a 1-separated set, and also Σ_d n_d ≤ n. The sum Σ_d n_d/d is therefore largest when n_d = Θ(d) for d = O(√n) and n_d = 0 otherwise, which gives Σ_{d=1}^{∞} n_d/d = O(√n). □

Lemma 4. For every i ∈ [n], we have E[X_i] = O(√n).

Proof. Using Lemma 1 and the inequality arcsin x ≤ (π/2) x for x ∈ [0, 1], we get

E[X_i] = Σ_{j ∈ [n]} P[X_{i,j} = 1] = 1 + Σ_{j ∈ [n]\{i}} (2/π) arcsin(1/d_{i,j}) ≤ 1 + Σ_{j ∈ [n]\{i}} 1/d_{i,j},

and using Lemma 3 we conclude that E[X_i] = O(√n). □

Using the first moment method, we can show that for any 1-separated point set P it holds that EConc(P) = O(n^{3/4}). For this, consider a 1-separated point set P and its associated random variable X. We have X = Σ_{i ∈ [n]} X_i, and because of Lemma 4 we conclude E[X] = O(n√n).

We claim that, for any value t > 0, if we have X_max(α) ≥ t, then X(α) ≥ t²/4. Intuitively, if some X_i = t, then there are Ω(t²) pairs of points at distance at most one from each other in the projection, and so contributing to X. The formal proof of the claim is as follows. Let i be an index such that X_i(α) ≥ t. Then, either to the right or to the left of p_i^*(α), the projection of p_i onto L(α), there are at least t/2 points p_j^*(α) at distance at most one from p_i^*(α). Assume that those points are to the left, and let P' ⊆ P be the set of those points. We have |P'| ≥ t/2. For any p_j, p_j' ∈ P' we have X_{j,j'}(α) = 1, and therefore X_j(α) ≥ t/2 for all p_j ∈ P'. We conclude that

X(α) ≥ Σ_{p_j ∈ P'} X_j(α) ≥ Σ_{p_j ∈ P'} t/2 ≥ (t/2) · |P'| ≥ t²/4,

and the claim is proved. We have shown that for any value t > 0 we have [X_max ≥ t] ⊆ [X ≥ t²/4], and using Markov's inequality we conclude

P[X_max ≥ t] ≤ P[X ≥ t²/4] ≤ 4 E[X]/t² = O(n√n)/t².

Let r = ⌈n^{3/4}⌉. Since X_max only takes natural values, we have

E[X_max] = Σ_{t=1}^{∞} P[X_max ≥ t] = Σ_{t=1}^{r} P[X_max ≥ t] + Σ_{t=r+1}^{∞} P[X_max ≥ t] ≤ r + Σ_{t=r+1}^{∞} O(n√n)/t² ≤ n^{3/4} + O(n√n)/r = O(n^{3/4}).

Using Lemma 2 it follows that EConc(P) = O(n^{3/4}). However, observe that this bound will be improved in the next section.

We would like to point out that the random variables X_i do not have a strong concentration around their expectation. Therefore, we cannot use many of the results based on concentration of measure that would reduce the bound on EConc(P). To see this, consider the example in Figure 2. The point p_i is the center of a disc of radius n^{3/4}, and we consider a circular sector with arc-length n^{1/4}; this region is grey in the picture. Imagine that we place a densest 1-separated point set P inside the grey region. Asymptotically, since the region has area Θ(n), such a point set P has Θ(n) points. Consider the lines L(α + π/2) passing through p_i. If α is chosen uniformly at random, the line L(α) intersects the grey region with probability n^{1/4}/(2π n^{3/4}) = Θ(1/√n), and in that case X_i(α + π/2) = Θ(n^{3/4}). We conclude that E[X_i] = Θ(n^{1/4}), but P[X_i = Ω(n^{3/4})] = Ω(1/√n), and so X_i does not concentrate around its expectation.

Figure 2: Example showing that X_i is not concentrated around its expectation.

4 Second moments

Lemma 5. For every i ∈ [n] we have E[X_i²] = O(n).

Proof. Assume without loss of generality that d_{i,j} ≥ d_{i,k} whenever j ≥ k; that is, the points are indexed according to their distance from p_i. As above, we assume that the line L(α) passes through p_i. We have E[X_i²] = E[(Σ_{k ∈ [n]} X_{i,k})²]. By the way we indexed the points, if X_{i,k}(α) = 1, then every point p_j with j ≤ k and X_{i,j}(α) = 1 lies in a rectangle of width 2 and height 2 d_{i,k} around p_i (see Figure 3); since these points are 1-separated, we conclude that Σ_{j ≤ k} X_{i,j}(α) = O(d_{i,k}). For any angle α we therefore have

X_i(α)² = ( Σ_{k ∈ [n]} X_{i,k}(α) )² ≤ 2 Σ_{k ∈ [n]} X_{i,k}(α) ( Σ_{j ≤ k} X_{i,j}(α) ) ≤ Σ_{k ∈ [n]} X_{i,k}(α) · O(d_{i,k}).
Taking expectations and using Lemma 1, we conclude that

E[X_i²] ≤ Σ_{k ∈ [n]} O(d_{i,k}) · P[X_{i,k} = 1] = Σ_{k ∈ [n]} O(d_{i,k}) · O(1/d_{i,k}) = O(n). □

Figure 3: For the proof of Lemma 5.

Consider now the random variable T(α) = Σ_{i ∈ [n]} X_i(α)². Because of Lemma 5 we have E[T] = O(n²). We claim that, for any value t > 0, if we have X_max(α) ≥ t, then T(α) ≥ t³/8. The proof is as follows. Let i be an index such that X_i(α) ≥ t. Then, either to the right or to the left of p_i^*(α), the projection of p_i onto L(α), there are at least t/2 points p_j^*(α) at distance at most one from p_i^*(α). Assume that those points are to the left, and let P' ⊆ P be the set of those points. We have |P'| ≥ t/2. For any p_j, p_j' ∈ P' we have X_{j,j'}(α) = 1. Therefore X_j(α) ≥ t/2 for all p_j ∈ P', and X_j(α)² ≥ t²/4. We conclude that

T(α) ≥ Σ_{p_j ∈ P'} X_j(α)² ≥ Σ_{p_j ∈ P'} t²/4 ≥ (t²/4) · |P'| ≥ t³/8,

and the claim is proved. We have shown that for any value t > 0 we have [X_max ≥ t] ⊆ [T ≥ t³/8], and using Markov's inequality we conclude

P[X_max ≥ t] ≤ P[T ≥ t³/8] ≤ 8 E[T]/t³ = O(n²)/t³.

Let r = ⌈n^{2/3}⌉. Since X_max only takes natural values, we have

E[X_max] = Σ_{t=1}^{∞} P[X_max ≥ t] = Σ_{t=1}^{r} P[X_max ≥ t] + Σ_{t=r+1}^{∞} P[X_max ≥ t] ≤ r + Σ_{t=r+1}^{∞} O(n²)/t³ ≤ r + O(n²) ∫_{r}^{∞} t^{−3} dt ≤ n^{2/3} + O(n²/r²) = O(n^{2/3}).

Using Lemma 2 it follows that EConc(P) = O(n^{2/3}). □

Trying to use the same ideas with higher moments of X_i does not help. Consider for example the 1-separated point set P consisting of n points in a horizontal row of length n, and let p_1 be the leftmost point. We have E[X_1³] = O(n²), and in general E[X_1^p] = O(n^{p−1}) for all naturals p ≥ 2. From this we can only conclude weaker results of the type EConc(P) = O(n^{p/(p+1)}).

Conclusions

We have studied the expected concentration of projecting 1-separated point sets onto random lines, a parameter that is relevant for sweep-line algorithms when the direction for sweeping is chosen at random. We have shown that, if P consists of n points, the expected concentration EConc(P) is O(n^{2/3}), while the best known lower bound is Ω(√(n log n)). It remains to close this gap.

Acknowledgements

The authors are grateful to Jiří Matoušek for the key reference [4]. Sergio is also grateful to Christian Knauer for early discussions.

References

[1] A. S. Besicovitch. The Kakeya problem. The American Mathematical Monthly, 70:697-706, 1963.
[2] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, Germany, 2nd edition, 2000.
[3] J. M. Díaz, F. Hurtado, M. López, and J. A. Sellarès. Optimal point set projections onto regular grids. In T. Ibaraki et al., editor, 14th Inter. Symp. on Algorithms and Computation, volume 2906 of LNCS, pages 270-279. Springer-Verlag, 2003.
[4] L. Kučera, K. Mehlhorn, B. Preis, and E. Schwarzenecker. Exact algorithms for a geometric packing problem. In Proc. 10th Sympos. Theoret. Aspects Comput. Sci., volume 665 of Lecture Notes Comput. Sci., pages 317-322. Springer-Verlag, 1993.

The Spectra of Knödel Graphs

Hovhannes A. Harutyunyan and Calin D. Morosan
Department of Computer Science, Concordia University, 1455 de Maisonneuve Blvd. West, H3G 1M8, Montreal, QC, Canada
E-mail: haruty@cs.concordia.ca, cd_moros@cs.concordia.ca

Keywords: Knödel graphs, spectra of graphs, number of spanning trees

Received: January 28, 2005

Knödel graphs W_{d,n} are regular graphs on n vertices of degree d. They were introduced by W. Knödel and have been proved to be minimum gossip graphs and minimum broadcast graphs for d = ⌊log₂ n⌋. They became even more interesting in the light of recent results regarding the diameter, which is, up to now, the smallest known diameter among all minimum broadcast graphs on 2^d vertices.
Also, the logarithmic-time routing algorithm that we have found, the bipancyclicity property, embedding properties and, not least, the Cayley graph structure make these graphs good candidates for regular network constructions, especially in supercomputing. In this paper we describe an efficient way to compute the spectra of Knödel graphs using results from Fourier analysis, circulant matrices and PD-matrices. Based on this result we give a formula for the number of spanning trees of Knödel graphs.

Povzetek: An analysis of Knödel graphs is presented.

1 Introduction

Knödel graphs W_{d,n} are regular graphs on an even number of vertices n and of degree d. They were introduced by W. Knödel [10] and have been proved to be minimum gossip graphs and minimum broadcast graphs for degree d = ⌊log₂ n⌋. Recently, it has been proved in [7] that the Knödel graph W_{d,2^d} on 2^d vertices and degree d has diameter ⌈d/2⌉ + 1, which is the minimum known diameter among all minimum broadcast graphs on 2^d vertices. We believe that this is also a lower bound on the diameter of all regular graphs on 2^d vertices and degree d. Also, the logarithmic-time routing algorithm that we have found [9], the bipancyclicity property, embedding properties and, not least, the Cayley graph structure [6] make these graphs good candidates for regular network constructions, especially in supercomputing.

The goal of this study is to compute efficiently the spectra of Knödel graphs, first for W_{d,2^d}, and then for arbitrary degree g and number of vertices n. We use results from Fourier analysis, circulant matrices and PD-matrices. Based on this result we give a formula for the number of spanning trees of Knödel graphs.

The paper is organized as follows: section 2 gives some definitions, section 3 extracts the general properties of the spectra, section 4 explains the method of computation, section 5 makes some remarks regarding the obtained spectra, and section 6 establishes the number of spanning trees.

2 Definitions and notations

If we denote by A the adjacency matrix of a simple graph G, the set of eigenvalues of A, together with their multiplicities, is said to be the spectrum of G. If we denote by I the identity matrix, then the characteristic polynomial of G is defined as P(λ) = det|λI − A|. The spectrum of G will be the set of solutions of the equation P(λ) = 0. Knowing the spectrum of a graph has a great impact on other characteristics of the graph. For example, the complexity (number of spanning trees) of a regular graph is

κ(G) = (1/n) ∏_{k=1}^{n−1} (λ_n − λ_k),

where n is the number of eigenvalues and λ_n is the greatest eigenvalue. Up to now, the spectra are known for some particular graphs: path, cycle, complete graph, complete bipartite graph, complete tree, hypercube, k-dimensional lattice, star graph, etc. (see [4] and [8] for further references). In particular, for W_{d,2^d}, the number of distinct eigenvalues is at least ⌈d/2⌉ + 2, since the diameter is ⌈d/2⌉ + 1 [4].

The Knödel graphs W_{g,n} are defined as graphs G(V, E) with |V| = n even, and the set of edges [6]:

E = {(i, j) | i + j ≡ 2^k − 1 (mod n)},   (1)

where k = 1, 2, ..., g, 0 ≤ i, j ≤ n − 1, and 1 ≤ g ≤ ⌊log₂ n⌋.

We denote the adjacency matrix of an undirected graph by A = [a_{ij}], where 1 ≤ i, j ≤ |V| = n, a_{ij} = 1 whenever vertex i is adjacent to vertex j, and 0 otherwise. If M is a matrix, we denote by M^T the transpose of M, by M̄ the complex conjugate of M, by M* the transpose complex conjugate of M, and by M^{−1} the inverse of M. We denote by π a permutation

π = ( 1    2    ...  n    )
    ( π(1) π(2) ...  π(n) )   (2)
and by P(π) = (a_{ij}) the corresponding permutation matrix of π, where a_{i,π(i)} = 1 and a_{i,j} = 0 for j ≠ π(i). If z ∈ C, we denote by z̄ the complex conjugate of z, and by |z| = √(z z̄) the norm of z. We denote by diag(λ_1, λ_2, ..., λ_n) the diagonal matrix with main diagonal (λ_1, λ_2, ..., λ_n). We denote by circ(a_1, a_2, ..., a_n) a circulant matrix with first row (a_1, a_2, ..., a_n); the remaining rows are successive circular shifts of the first row toward the right, so that a_{i,j} = a_{1, j−i+1 mod n}. If the step of the shift is an integer q ≠ 1, we call this matrix a (q)circulant matrix [12]. We denote by Γ the inverse permutation matrix, which is a (−1)circulant: Γ = (−1)circ(1, 0, ..., 0). An important property of Γ is that Γ² = I, where I is the identity matrix. We denote by F the Fourier matrix, defined by its conjugate transpose F* = (1/√n)[w^{(i−1)(j−1)}], 1 ≤ i, j ≤ n, where w stands for the nth root of unity [5]. Two important properties of F are: F^T = F and F F* = I. Other definitions and notations will follow in the places where they are used.

3 General graph theory considerations

We observe that the adjacency matrix of a Knödel graph is a (−1)circulant matrix, also called a retrocirculant [1], in which all rows are circular permutations of the first row toward the left. For example, the adjacency matrix of W_{3,2^3} is:

A_{W_{3,2^3}} =
( 0 1 0 1 0 0 0 1 )
( 1 0 1 0 0 0 1 0 )
( 0 1 0 0 0 1 0 1 )
( 1 0 0 0 1 0 1 0 )
( 0 0 0 1 0 1 0 1 )
( 0 0 1 0 1 0 1 0 )
( 0 1 0 1 0 1 0 0 )
( 1 0 1 0 1 0 0 0 )   (3)

Some general remarks can be made about the spectra of W_{g,n}. All eigenvalues are real, since the adjacency matrix is real and symmetric [3]. The maximum eigenvalue is λ_n = g, since W_{g,n} is regular of degree g [2]. All eigenvalues are symmetric with respect to zero [11], since the Knödel graph is bipartite and its characteristic polynomial has the form:

P(λ) = λ^n + a_2 λ^{n−2} + ... + a_{n−2} λ^2 + a_n.   (4)
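Definition (1) and the (−1)circulant structure are easy to verify mechanically. The sketch below (ours, not the authors' code) rebuilds the adjacency matrix of W_{3,2^3} shown in (3) and checks both the retrocirculant property, in 0-based indexing a_{i,j} = a_{0,(i+j) mod n}, and the symmetry of the spectrum:

```python
import numpy as np

def knodel_adjacency(g, n):
    """Adjacency matrix of W_{g,n} from definition (1):
    vertices 0..n-1, with i ~ j iff i + j = 2^k - 1 (mod n), k = 1..g."""
    A = np.zeros((n, n), dtype=int)
    targets = {(2 ** k - 1) % n for k in range(1, g + 1)}
    for i in range(n):
        for j in range(n):
            if (i + j) % n in targets:
                A[i, j] = 1
    return A

A = knodel_adjacency(3, 8)            # W_{3,2^3}, as in (3)
assert (A == A.T).all()               # undirected, hence real spectrum
assert all(A[i, j] == A[0, (i + j) % 8] for i in range(8) for j in range(8))
eig = np.sort(np.linalg.eigvalsh(A))
print(eig)                            # symmetric around 0, maximum = degree 3
```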
4 Computing the spectrum of W_{d,2^d}

According to [5], a matrix A is (−1)circulant if and only if A = F*(ΓΛ)F, where Λ = diag(γ_1, γ_2, ..., γ_n). This relation can be transformed into F A F* = ΓΛ, which means that A and ΓΛ have the same eigenvalue set [5]. The second term is a PD-matrix, defined as a product of two matrices P and D, where P is a permutation matrix and D is a diagonal matrix. The characteristic polynomial of a PD-matrix can be computed by decomposing the permutation P into prime cycles of total length n [5]. Since the adjacency matrices of Knödel graphs are (−1)circulants, the problem reduces to determining the values γ_1, γ_2, ..., γ_n. Since ΓΛ has the form

ΓΛ =
( γ_1 0   ... 0   )
( 0   0   ... γ_n )
( ............... )
( 0   γ_2 ... 0   )   (5)

we can perform F A F* = [c_{ij}] = ΓΛ and identify the terms c_{1,1} = γ_1, c_{2,n} = γ_n, ..., c_{n,2} = γ_2. In order to perform the triple matrix multiplication F A F*, we note that:

F = F̄* = (1/√n) [w^{−(i−1)(j−1) mod n}].   (6)

Since w^n = 1, we may skip the modulo operations in the exponents. Also, in order to avoid confusion with the summation indices, we emphasize the matrix indices: [a]_{i,j} means that i is the row index and j is the column index, both varying from 1 to n. Then

F A F* = (1/n) [w^{−(i−1)(k−1)}]_{i,k} [a_{k,m}]_{k,m} [w^{(m−1)(j−1)}]_{m,j}
       = (1/n) [ Σ_{m=1}^{n} ( Σ_{k=1}^{n} w^{−(i−1)(k−1)} a_{1,m+k−1} ) w^{(m−1)(j−1)} ]_{i,j}.   (7)

Since in the first row of the adjacency matrix only d entries are nonzero, namely those at positions m ≡ 2^r (mod n), we can change the variable of summation in the inner sum of (7): k → r, where 1 ≤ r ≤ d. Therefore:

F A F* = (1/n) [ Σ_{m=1}^{n} Σ_{r=1}^{d} w^{−(i−1)(2^r−m)} w^{(m−1)(j−1)} ]_{i,j}
       = (1/n) [ w^{−(j−1)} Σ_{m=1}^{n} w^{m(i+j−2)} Σ_{r=1}^{d} w^{−2^r(i−1)} ]_{i,j}.   (8)

Thus, for the general term of F A F* we obtain:

c_{i,j} = (1/n) w^{−(j−1)} Σ_{m=1}^{n} w^{m(i+j−2)} Σ_{r=1}^{d} w^{−2^r(i−1)}.   (9)

The general term of the ΓΛ matrix from (5) can then be expressed as follows:

γ_p = c_{n−p+2,p} = (1/n) w^{−(p−1)} ( Σ_{m=1}^{n} w^{mn} ) Σ_{r=1}^{d} w^{−2^r(n−(p−1))}.   (10)

But Σ_{m=1}^{n} w^{mn} = n and w^{−2^r(n−(p−1))} = w^{2^r(p−1)}. Thus,

γ_p = w^{−(p−1)} Σ_{r=1}^{d} w^{2^r(p−1)}.   (11)

On the other hand, the matrix Γ corresponds to the permutation

π = ( 1 2 3   ... n/2+1 ... n )
    ( 1 n n−1 ... n/2+1 ... 2 )

This permutation can be decomposed into n/2 + 1 prime cycles of total length n [5, 1]: (1)(2, n)(3, n−1) ... (n/2, n/2+2)(n/2+1). Thus, the characteristic polynomial will be:

P(λ) = (λ − γ_1)(λ² − γ_2 γ_n)(λ² − γ_3 γ_{n−1}) ... (λ² − γ_{n/2} γ_{n/2+2})(λ − γ_{n/2+1}),   (12)

and the eigenvalue set will be:

S = {γ_1, γ_{n/2+1}, ±√(γ_2 γ_n), ±√(γ_3 γ_{n−1}), ..., ±√(γ_{n/2} γ_{n/2+2})}.   (13)

For the first eigenvalue we obtain:

γ_1 = Σ_{r=1}^{d} 1 = d.   (14)

Aitken proved in [1] that, for a (−1)circulant, γ_{n/2+1} = a_1 − a_2 + a_3 − ... − a_n, where (a_1, a_2, ..., a_n) is the first row of the adjacency matrix. Since the nonzero entries of the first row sit at the even positions 2^r, we get:

γ_{n/2+1} = Σ_{i=1}^{n} (−1)^{i+1} a_i = Σ_{r=1}^{d} (−1)^{2^r+1} = −d.   (15)

For the rest of the eigenvalues we have to evaluate products of the form γ_t γ_{n−t+2}, 2 ≤ t ≤ n/2. From (11) we have:

γ_t γ_{n−t+2} = ( w^{−(t−1)} Σ_{r=1}^{d} w^{2^r(t−1)} ) ( w^{−(n−t+1)} Σ_{r=1}^{d} w^{2^r(n−t+1)} )
             = ( Σ_{r=1}^{d} w^{2^r(t−1)} ) ( Σ_{r=1}^{d} w^{−2^r(t−1)} )
             = | Σ_{r=1}^{d} w^{2^r(t−1)} |²,   (16)

because w^{−(t−1)} w^{−(n−t+1)} = w^{−n} = 1 and the two sums are complex conjugates of each other. This confirms the well-known fact that all eigenvalues are real. Thus, the spectrum of W_{d,2^d} is the set:

S(W_{d,2^d}) = {±d} ∪ { ± | Σ_{r=1}^{d} w^{2^r(t−1)} | },   (17)

where 2 ≤ t ≤ n/2.

5 Observations

A. Not all eigenvalues are distinct. We can show that at most (n − 4)/2 of them are distinct. If we decompose the norm from (17) into its trigonometric form, we obtain:

| Σ_{r=1}^{d} w^{2^r(t−1)} |² = ( Σ_{r=1}^{d} cos((2π/2^d) 2^r (t−1)) )² + ( Σ_{r=1}^{d} sin((2π/2^d) 2^r (t−1)) )².   (18)

We observe that this norm evaluates to the same value for t = n/4 + 1 − k and t = n/4 + 1 + k: since w^{2^r · n/4} equals −1 for r = 1 and 1 for r ≥ 2, the sums Σ_{r=1}^{d} w^{2^r(n/4−k)} and Σ_{r=1}^{d} w^{2^r(n/4+k)} are complex conjugates of each other and therefore have the same norm. The computations for particular cases lead to the claim that these are the only overlapping eigenvalues.

B. To our knowledge, there is no closed form for the sum from (16). Nevertheless, for the particular value t = 2^d/4 + 1 the sum does evaluate to a closed form: in this case w^{2^r(t−1)} = w^{2^{r+d−2}} equals −1 for r = 1 and 1 for r ≥ 2, so

| Σ_{r=1}^{d} w^{2^r(t−1)} |² = (d − 2)².   (20)

Thus, for W_{d,2^d}, the spectrum from (17) can be written as:

S(W_{d,2^d}) = {±d, ±(d−2)} ∪ { ± | Σ_{r=1}^{d} w^{2^r(t−1)} | },   (21)

where 2 ≤ t ≤ n/4 and the last set has multiplicity two.

C. Note that in the formulas (7)-(16) we did not make any assumption regarding the number of vertices n or the degree d. Therefore, the result from (17) can be extended in a similar manner to Knödel graphs W_{g,n} with arbitrary degree g and number of vertices n:

S(W_{g,n}) = {±g} ∪ { ± | Σ_{r=1}^{g} w^{2^r(t−1)} | },   (22)

where 2 ≤ t ≤ n/2. For example, for W_{2,2^k}, which are the cycles C_{2^k} of length 2^k, applying (22) we obtain the spectrum:

S(C_{2^k}) = {±2} ∪ { ± | w^{2(t−1)} + w^{4(t−1)} | },   (23)

where 2 ≤ t ≤ 2^{k−1}. The norm from (23) can be evaluated to 2 cos(2π(t−1)/2^k) as follows:

| w^{2(t−1)} + w^{4(t−1)} |² = ( cos((2π/2^k) 2(t−1)) + cos((2π/2^k) 4(t−1)) )² + ( sin((2π/2^k) 2(t−1)) + sin((2π/2^k) 4(t−1)) )² = ( 2 cos((2π/2^k)(t−1)) )².   (24)

Thus, we meet the well-known result [2] that the spectrum of a cycle of length n is the set:

S(C_n) = { 2 cos(2πj/n) | 1 ≤ j ≤ n }.   (25)
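Formula (22) can be cross-checked against a direct numerical diagonalization. The following fragment (ours, reusing knodel_adjacency from the earlier sketch) confirms the closed form for W_{3,2^3}:

```python
import numpy as np

def knodel_spectrum_formula(g, n):
    """Spectrum of W_{g,n} from (22): {+g, -g} together with the pairs
    +-|sum_{r=1}^{g} w^{2^r (t-1)}| for 2 <= t <= n/2, w = exp(2*pi*i/n)."""
    w = np.exp(2j * np.pi / n)
    norms = [abs(sum(w ** (2 ** r * (t - 1)) for r in range(1, g + 1)))
             for t in range(2, n // 2 + 1)]
    return np.sort(np.array([g, -g] + norms + [-x for x in norms]))

g, n = 3, 2 ** 3
formula = knodel_spectrum_formula(g, n)
numeric = np.sort(np.linalg.eigvalsh(knodel_adjacency(g, n)))
print(np.allclose(formula, numeric))   # True
```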
6 The number of spanning trees

An immediate consequence of the spectra of the Knödel graphs W_{g,n} is an O(ng²) formula for the number of spanning trees. It is well known that, given a regular graph G on n vertices of degree k, the number of spanning trees can be expressed as:

κ(G) = (1/n) ∏_{t=1}^{p−1} (k − λ_t)^{m_t},   (26)

where the λ_t are the eigenvalues different from k, m_t are their multiplicities, and p is the number of distinct eigenvalues [2].

Thus, for the particular case in which the degree is d and the number of vertices is 2^d, using (21) we obtain:

κ(W_{d,2^d}) = ( d(2d−2)/2^{d−2} ) ∏_{t=2}^{2^{d−2}} ( d² − | Σ_{r=1}^{d} w^{2^r(t−1)} |² )².   (27)

If we further decompose the norm from (27) into its trigonometric form, we obtain:

| Σ_{r=1}^{d} w^{2^r(t−1)} |² = ( Σ_{r=1}^{d} cos((2π/2^d) 2^r (t−1)) )² + ( Σ_{r=1}^{d} sin((2π/2^d) 2^r (t−1)) )² = d + 2 Σ_{i=1}^{d} Σ_{j=i+1}^{d} cos((2π/2^d)(2^i − 2^j)(t−1)).   (28)

Substituting this result in (27) and changing the variable t → t + 1, we obtain for the number of spanning trees of W_{d,2^d}:

κ(W_{d,2^d}) = ( 2d(d−1)/2^{d−2} ) ∏_{t=1}^{2^{d−2}−1} ( d² − d − φ(t) )²,   (29)

where:

φ(t) = 2 Σ_{i=1}^{d} Σ_{j=i+1}^{d} cos((2π/2^d)(2^i − 2^j) t).   (30)

In general, for Knödel graphs W_{g,n} with arbitrary degree g and arbitrary number of vertices n, according to (22) the number of spanning trees can be expressed as follows:

κ(W_{g,n}) = (2g/n) ∏_{t=2}^{n/2} ( g² − | Σ_{r=1}^{g} w^{2^r(t−1)} |² ).   (31)

A straightforward upper bound for the number of spanning trees of Knödel graphs W_{g,n} can be obtained by cancelling the norm from (31), i.e., by setting

Σ_{r=1}^{g} w^{2^r(t−1)} = 0.   (32)

Therefore, for κ(W_{g,n}) we obtain the upper bound:

κ(W_{g,n}) ≤ (2/n) g^{n−1}.   (33)

Since, for Knödel graphs W_{g,n}, the degree g of a vertex is upper bounded by ⌊log₂ n⌋ (see (1)), the bound from (33) can be expressed as follows:

κ(W_{g,n}) ≤ (2/n) ⌊log₂ n⌋^{n−1}.   (34)
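Formula (31) can likewise be validated against the matrix-tree theorem. The sketch below (ours, again reusing knodel_adjacency; the round calls merely clean up floating-point noise) computes the number of spanning trees of W_{3,2^3} both ways:

```python
import numpy as np

def spanning_trees_kirchhoff(A):
    """Matrix-tree theorem: any cofactor of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return round(np.linalg.det(L[1:, 1:]))

def spanning_trees_formula(g, n):
    """Number of spanning trees of W_{g,n} according to (31)."""
    w = np.exp(2j * np.pi / n)
    product = 1.0
    for t in range(2, n // 2 + 1):
        norm2 = abs(sum(w ** (2 ** r * (t - 1)) for r in range(1, g + 1))) ** 2
        product *= g * g - norm2
    return round(2 * g / n * product)

g, n = 3, 2 ** 3
print(spanning_trees_formula(g, n),                      # 384 for W_{3,8}
      spanning_trees_kirchhoff(knodel_adjacency(g, n)))  # the two agree
```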
References

[1] A. C. Aitken. Two notes on matrices. Proc. Glasgow Math. Assoc., pages 109-113, 1961-1962.
[2] N. L. Biggs. Algebraic Graph Theory. Cambridge University Press, 1974.
[3] F. Chatelin and M. Ahués. Eigenvalues of Matrices. Wiley, Chichester, 1993.
[4] D. M. Cvetković, M. Doob, and H. Sachs. Spectra of Graphs. Academic Press, New York, 1980.
[5] P. J. Davis. Circulant Matrices. Wiley, New York, 1979.
[6] G. Fertin and A. Raspaud. A survey on Knödel graphs. Discrete Applied Mathematics, 137(2):173-195, 2004.
[7] G. Fertin, A. Raspaud, O. Sýkora, H. Schröder, and I. Vrťo. Diameter of the Knödel graph. In 26th International Workshop on Graph-Theoretic Concepts in Computer Science (WG 2000), volume 1928 of Lecture Notes in Computer Science, pages 149-160. Springer-Verlag, 2000.
[8] C. Godsil and G. Royle. Algebraic Graph Theory. Springer-Verlag, New York, 2001.
[9] H. A. Harutyunyan and C. D. Morosan. On the minimum path problem in Knödel graphs. In Proceedings of the Second International Network Optimization Conference (INOC 2005), Lisbon, Portugal, pages 43-48, 2005.
[10] W. Knödel. New gossips and telephones. Discrete Mathematics, 13:95, 1975.
[11] H. Sachs. Beziehungen zwischen den in einem Graphen enthaltenen Kreisen und seinem charakteristischen Polynom. Publ. Math. Debrecen, 11:119-137, 1964.
[12] K. Wang. On the generalizations of circulants. Linear Algebra and its Applications, 25:197-218, 1979.

On the Crossing Number of Almost Planar Graphs

Bojan Mohar
Faculty of Mathematics and Physics, Department of Mathematics, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia
E-mail: bojan.mohar@uni-lj.si

Keywords: planar graph, crossing number

Received: May 8, 2005

If G is a plane graph and x, y ∈ V(G), then the dual distance of x and y is equal to the minimum number of crossings of G with a closed curve in the plane joining x and y. Riskin [7] proved that if G_0 is a 3-connected cubic planar graph, and x, y are its vertices at dual distance d, then the crossing number of the graph G_0 + xy is equal to d. Riskin asked if his result holds for arbitrary 3-connected planar graphs. In this paper it is proved that this is not the case (not even for every 5-connected planar graph G_0).

Povzetek: Riskin's claim about planar graphs is analysed.

1 Introduction

Crossing number minimization is one of the fundamental optimization problems in the sense that it is related to various other widely used notions. Besides its mathematical interest, there are numerous applications, most notably those in VLSI design [1, 2, 3] and in combinatorial geometry [9]. We refer to [4, 8] and to [10] for more details about such applications.

A drawing of a graph G is a representation of G in the Euclidean plane R², where vertices are represented as distinct points and edges by simple polygonal arcs joining the points that correspond to their endvertices. A drawing is clean if the interior of every arc representing an edge contains no points representing the vertices of G. If interiors of two arcs intersect, or if an arc contains a vertex of G in its interior, we speak about crossings of the drawing. More precisely, a crossing of a drawing D is a pair ({e, f}, p), where e and f are distinct edges and p ∈ R² is a point that belongs to the interiors of both arcs representing e and f in D. If the drawing is not clean, then the arc of an edge e may contain in its interior a point p ∈ R² that represents a vertex v of G. In such a case, the pair ({v, e}, p) is also referred to as a crossing of D. The number of crossings of D is denoted by cr(D) and is called the crossing number of the drawing D. The crossing number cr(G) of a graph G is the minimum cr(D) taken over all clean drawings D of G. A clean drawing D with cr(D) = 0 is also called an embedding of G. By a plane graph we refer to a planar graph together with an embedding in the Euclidean plane; we shall identify a plane graph with its image in the plane.

A nonplanar graph G is almost planar if it contains an edge e such that G − e is planar. Such an edge e is called a planarizing edge. It is easy to see that almost planar graphs can have arbitrarily large crossing number. In the sequel, we will consider almost planar graphs with a fixed planarizing edge e = xy, and will denote by G_0 = G − e the corresponding planar subgraph.

Let G_0 be a plane graph and let x, y be two of its vertices. A simple (polygonal) arc γ : [0, 1] → R² is an (x, y)-arc if γ(0) = x and γ(1) = y. If γ(t) is not a vertex of G_0 for every t, 0 < t < 1, then the crossings of γ with G_0 are the intersections of γ with the edges of G_0, and the dual distance d*(x, y) is the minimum number of such crossings taken over all (x, y)-arcs. Riskin [7] proved that if G_0 is a 3-connected cubic planar graph, then cr(G_0 + xy) = d*(x, y), and asked whether the same holds for arbitrary 3-connected planar graphs. The theorem below shows that the answer is negative, even for 5-connected planar graphs.
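The dual distance has a convenient combinatorial reading: it is a shortest path in the face-adjacency (dual) graph of the plane graph G_0, between a face incident with x and a face incident with y. The following sketch (ours; it relies on the combinatorial embedding produced by networkx, which is immaterial for 3-connected graphs, since their embedding is essentially unique by Whitney's theorem) computes d*(x, y) this way:

```python
import networkx as nx

def dual_distance(G, x, y):
    """d*(x, y) for a plane graph G: the minimum number of edges of G that a
    clean (x, y)-arc must cross, i.e. the shortest path in the dual graph
    from a face incident with x to a face incident with y."""
    ok, emb = nx.check_planarity(G)
    assert ok, "G must be planar"
    faces, marked = [], set()
    for u, v in emb.edges():                       # half-edges of the embedding
        if (u, v) not in marked:
            faces.append(tuple(emb.traverse_face(u, v, mark_half_edges=marked)))
    D = nx.Graph()
    D.add_nodes_from(range(len(faces)))
    edge_faces = {}
    for idx, face in enumerate(faces):
        for a, b in zip(face, face[1:] + face[:1]):
            edge_faces.setdefault(frozenset((a, b)), []).append(idx)
    for pair in edge_faces.values():               # faces sharing an edge
        if len(pair) == 2:
            D.add_edge(pair[0], pair[1])
    fx = [i for i, f in enumerate(faces) if x in f]
    fy = [i for i, f in enumerate(faces) if y in f]
    dist = dict(nx.all_pairs_shortest_path_length(D))
    return min(dist[i][j] for i in fx for j in fy)

# two interior vertices of a 5x5 grid: every (x, y)-arc crosses >= 2 edges
G = nx.grid_2d_graph(5, 5)
print(dual_distance(G, (1, 1), (3, 3)))            # 2
```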
Theorem 2.1. For every integer k there exists a 5-connected planar graph Q_k with two vertices x, y such that d*(x, y) = k + 2, while cr(Q_k + xy) ≤ 11.

Proof. Let H_k be the planar graph that is obtained from the icosahedron by replacing all of its triangles, except one, with the dissection of the equilateral triangle with side of length k into equilateral triangles with sides of unit length (as shown in Figure 1 for k = 8). This graph is a near-triangulation: all its faces are triangles except one, whose length is 3k. We may assume that this is the outer face in a plane embedding of H_k. Its boundary is composed of three paths A, B, C of length k joining the original vertices a', b', c' of the icosahedron we started with. Now we add three new vertices a, b, c and join a with all vertices on A, b with B, and c with C. This gives rise to a 5-connected near-triangulation G_k whose outer face is the 6-gon a a' b b' c c'.

Figure 1: Part of the triangular lattice with side length 8.

Let us take 5 copies of the graph G_k and let a_i, a'_i, b_i, b'_i, c_i, c'_i be the copies of the corresponding vertices on the outer face of the ith copy of G_k, i = 1, ..., 5. Let Q_k be the planar graph obtained from these copies by cyclically identifying b_i with a_{i+1}, adding the edges b_i c_{i+1} (i = 1, ..., 5, indices modulo 5), and adding two vertices x and y such that x is joined to a_1, ..., a_5 and y is joined to c_1, ..., c_5. See Figure 2. The obtained graph Q_k is planar, and it is not difficult to verify that it is 5-connected. It is easy to see that d*(x, y) = k + 2 in Q_k. By putting the vertex x close to y, so that we can draw the edge xy without introducing crossings with other edges, and then redrawing the edges from x to its neighbors as shown in Figure 2, a drawing of Q_k + xy is obtained whose crossing number is 11. □

Figure 2: The graph Q_k.

The construction of Theorem 2.1 can be generalized such that a similar redrawing as made above for x is necessary also for y (in order to bring these two vertices close together). Such an example is shown in Figure 3, where x and y are the vertices in the centers of the small circular grids in the picture, and where the bold lines represent a "thick" barrier similar to the one used in the graph Q_k in Figure 2.

Figure 3: A planar graph for which two flips are needed.

In Figure 4, an optimum drawing of G_0 + xy is shown, where the edge xy is represented by the broken line. In this drawing, neighborhoods of x and y are redrawn inside the faces denoted by A and B (respectively) in Figure 3.

At first sight the redrawing described in the above example seems like the worst possibility which may happen: to "flip" a part of the graph containing x and to "flip" a part containing y. If this were the only possibility of making the crossing number smaller than the one coming from the planar drawing of G_0, it would most likely give rise to a polynomial-time algorithm for computing the crossing number of graphs that are just one edge away from a 3-connected planar graph.
Unfortunately, some more complicated examples show that there are other ways of shortcutting the dual distance from x to y. (Such an example was produced in a discussion with Thomas Böhme and Neil Robertson, whose help is greatly acknowledged.) Despite such examples, the following question may still have a positive answer:

Problem 2.2. Is there a polynomial time algorithm which would determine the crossing number of G0 + xy if G0 is planar?

Figure 4: An optimum drawing of G0 + xy

Acknowledgement

Supported in part by the Ministry of Higher Education, Science and Technology of Slovenia, Research Program P1-0507-0101 and Research Project L1-5014-0101.

References

[1] S.N. Bhatt, F.T. Leighton, A framework for solving VLSI graph layout problems, J. Comput. System Sci. 28 (1984) 300-343.
[2] F.T. Leighton, Complexity Issues in VLSI, MIT Press, Cambridge, Mass., 1983.
[3] F.T. Leighton, New lower bound techniques for VLSI, Math. Systems Theory 17 (1984) 47-70.
[4] A. Liebers, Planarizing graphs—a survey and annotated bibliography, J. Graph Algorithms Appl. 5 (2001) 74 pp.
[5] B. Mohar, Crossing number of almost planar graphs, preprint, 2005.
[6] B. Mohar and C. Thomassen, Graphs on Surfaces, Johns Hopkins University Press, Baltimore, 2001.
[7] A. Riskin, The crossing number of a cubic plane polyhedral map plus an edge, Studia Sci. Math. Hungar. 31 (1996) 405-413.
[8] F. Shahrokhi, O. Sýkora, L.A. Székely, I. Vrt'o, Crossing numbers: bounds and applications, in: Intuitive Geometry (Budapest, 1995), 179-206, J. Bolyai Math. Soc., Budapest, 1997.
[9] L.A. Székely, A successful concept for measuring non-planarity of graphs: the crossing number, Discrete Math. 276 (2004) 331-352.
[10] I. Vrt'o, Crossing number of graphs: A bibliography. ftp://ftp.ifi.savba.sk/pub/imrich/crobib.pdf

Unsupervised Feature Extraction for Time Series Clustering Using Orthogonal Wavelet Transform

Hui Zhang and Tu Bao Ho
School of Knowledge Science, Japan Advanced Institute of Science and Technology, Asahidai, Nomi, Ishikawa 923-1292, Japan
E-mail: {zhang-h,bao}@jaist.ac.jp

Yang Zhang
Department of Avionics, Chengdu Aircraft Design and Research Institute, No. 89 Wuhouci Street, Chengdu, Sichuan 610041, P.R. China
E-mail: v104@sohu.com

Mao-Song Lin
School of Computer Science, Southwest University of Science and Technology, Mianyang, Sichuan 621002, P.R. China
E-mail: lms@swust.edu.cn

Keywords: time series, data mining, feature extraction, clustering, wavelet

Received: September 4, 2005

Time series clustering has attracted increasing interest in the last decade, particularly for long time series such as those arising in the bioinformatics and financial domains. The widely known curse of dimensionality problem indicates that high dimensionality not only slows the clustering process, but also degrades it. Many feature extraction techniques have been proposed to attack this problem and have shown that the performance and speed of the mining algorithm can be improved at several feature dimensions. However, how to choose the appropriate dimension is a challenging task, especially for clustering problems in the absence of data labels, and it has not been well studied in the literature. In this paper we propose an unsupervised feature extraction algorithm using the orthogonal wavelet transform for automatically choosing the dimensionality of features. The feature extraction algorithm selects the feature dimensionality by leveraging two conflicting requirements, i.e., lower dimensionality and a lower sum of squared errors between the features and the original time series. The proposed feature extraction algorithm is efficient, with time complexity O(mn) when using the Haar wavelet. Encouraging experimental results are obtained on several synthetic and real-world time series datasets.

Povzetek: The article analyzes the importance of attributes in time series clustering.

1 Introduction

Time series data exist widely in various domains, such as finance, gene expression, medicine and science. Recently there has been an increasing interest in mining this sort of data. Clustering is one of the most frequently used data mining techniques; it is an unsupervised learning process for partitioning a dataset into sub-groups so that the instances within a group are similar to each other and are very dissimilar to the instances of other groups. Time series clustering has been successfully applied to various domains such as stock market value analysis and gene function prediction [17, 22]. When handling long time series, the time required to perform the clustering algorithm becomes expensive. Moreover, the curse of dimensionality, which affects any problem in high dimensions, causes highly biased estimates [5]. Clustering algorithms depend on a meaningful distance measure to group data that are close to each other and separate them from others that are far away. But in high dimensional spaces the contrast between the nearest and the farthest neighbor gets increasingly smaller, making it difficult to find meaningful groups [6]. Thus high dimensionality normally decreases the performance of clustering algorithms.

Data dimensionality reduction aims at mapping high-dimensional patterns onto lower-dimensional patterns. Techniques for dimensionality reduction can be classified into two groups: feature extraction and feature selection [34]. Feature selection is a process that selects a subset of the original attributes: the attributes that are important for maintaining the concepts in the original data are selected from the entire attribute set. Feature extraction techniques extract a set of new features from the original attributes through some functional mapping [43]. For time series data, the extracted features can be ordered in importance by using a suitable mapping function. Thus feature extraction is much more popular than feature selection in the time series mining community.

Many feature extraction algorithms have been proposed for time series mining, such as Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). Among the proposed feature extraction techniques, SVD is the most effective algorithm in terms of minimal reconstruction error. The entire time-series dataset is transformed into an orthogonal feature space in which the variables are pairwise orthogonal. The time-series dataset can then be approximated by a low-rank approximation matrix obtained by discarding the variables with lower energy. Korn et al. have successfully applied SVD to time-series indexing [31]. It is well known that SVD is computationally expensive, with time complexity O(mn^2), where m is the number of time series in a dataset and n is the length of each time series in the dataset.

DWT and DFT are powerful signal processing techniques, and both of them have fast computational algorithms. DFT maps the time series data from the time domain to the frequency domain, and there exists a fast algorithm called the Fast Fourier Transform (FFT) that can compute the DFT coefficients in O(mn log n) time. DFT has been widely used in time series indexing [4, 37, 42]. Unlike DFT, which takes the original time series from the time domain and transforms it into the frequency domain, DWT transforms the time series from the time domain into the time-frequency domain. Since the wavelet transform has the property of time-frequency localization, most of the energy of a time series can be represented by only a few wavelet coefficients. Moreover, if we use a special type of wavelet called the Haar wavelet, we can achieve O(mn) time complexity, which is much more efficient than DFT. Chan and Fu used the Haar wavelet for time-series classification, and showed performance improvement over DFT [9]. Popivanov and Miller proposed an algorithm using the Daubechies wavelet for time series classification [36].
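To make the O(n) cost per series concrete, the following is a minimal Python/NumPy sketch (not the authors' code) of the Haar pyramid algorithm: each level averages and differences adjacent pairs of values, so a full decomposition of a length-n series costs about 2(n - 1) operations. The normalization by sqrt(2) keeps the transform orthogonal (energy preserving), which the next section relies on.

    import numpy as np

    def haar_decompose(x):
        """Full Haar pyramid: returns (scale-0 approximation, details).

        x must have length 2**J. details[0] holds the finest-scale
        coefficients (scale J-1), details[-1] the coarsest (scale 0).
        """
        x = np.asarray(x, dtype=float)
        details = []
        while len(x) > 1:
            d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # detail coefficients
            x = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # approximation coefficients
            details.append(d)
        return x, details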
Many other time series dimensionality reduction techniques have also been proposed in recent years, such as Piecewise Linear Representation [28], Piecewise Aggregate Approximation [25, 45], Regression Trees [18], and Symbolic Representation [32]. These feature extraction algorithms keep the features with lower reconstruction error, and the feature dimensionality is decided by a user-given approximation error. All the proposed algorithms work well for time series at some feature dimensions, because the high correlation among time series data makes it possible to remove a huge amount of redundant information. Moreover, since time series data are normally contaminated by noise, one byproduct of dimensionality reduction is noise shrinkage, which can improve the mining quality.

However, how to choose the appropriate dimension of the features is a challenging problem. When using feature extraction for classification with labeled data, this problem can be circumvented by the wrapper approach. The wrapper approach uses the accuracy of the classification algorithm as the evaluation criterion: it searches for features better suited to the classification algorithm, aiming to improve classification accuracy [30]. For clustering algorithms with unlabeled data, determining the feature dimensionality becomes more difficult. To our knowledge, automatically determining the appropriate feature dimensionality has not been well studied in the literature; most of the proposed feature extraction algorithms need the user to decide the dimensionality or to give the approximation error. Zhang et al. [46] proposed an algorithm to automatically extract features from wavelet coefficients using entropy. Nevertheless, the length of the extracted features is the same as the length of the original time series, so it cannot take advantage of dimensionality reduction. Lin et al. [33] proposed an iterative clustering algorithm exploiting the multi-scale property of wavelets. The clustering centers at each approximation level are initialized with the final centers returned from the coarser representation. The algorithm can be stopped at any level, but the stopping level must be decided by the user. Several feature selection techniques for clustering have also been proposed [12, 15, 41]. However, these techniques only order the features in the absence of data labels; the appropriate dimensionality of the features still needs to be given by the user.

In this paper we propose a time-series feature extraction algorithm using orthogonal wavelets that chooses the feature dimensionality for clustering automatically. The problem of determining the feature dimensionality is circumvented by choosing the appropriate scale of the wavelet transform. An ideal feature extraction technique has the ability to efficiently reduce the data into a lower-dimensional model while preserving the properties of the original data. In practice, however, information is lost as the dimensionality is reduced. It is therefore desirable to formulate a method that reduces the dimensionality efficiently while preserving as much information from the original data as possible. The proposed feature extraction algorithm balances lower dimensionality against lower error by selecting the scale within which the detail coefficients have lower energy than those within the nearest lower scale. The proposed feature extraction algorithm is efficient and achieves time complexity O(mn) with the Haar wavelet. The rest of this paper is organized as follows.
Section 2 gives the basis for supporting our feature extraction algorithm. The feature extraction algorithm and its time complexity analysis are introduced in Section 3. Section 4 contains a comprehensive experimental evaluation of the proposed algorithm. We conclude the paper by summarizing the main contributions in Section 5.

2 The basis of the wavelet-based feature extraction algorithm

Section 2.1 briefly introduces the basic concepts of the wavelet transform. The properties of the wavelet transform supporting our feature extraction algorithm are given in Section 2.2. Section 2.3 presents the Haar wavelet transform algorithm used in our experiments.

2.1 Orthogonal Wavelet Transform Background

The wavelet transform is a domain transform technique for hierarchically decomposing sequences. It allows a sequence to be described in terms of an approximation of the original sequence, plus a set of details that range from coarse to fine. The property of wavelets is that the broad trend of the input sequence is preserved in the approximation part, while the localized changes are kept in the detail parts. No information is gained or lost during the decomposition process: the original signal can be fully reconstructed from the approximation part and the detail parts. A detailed description of the wavelet transform can be found in [13, 10].

A wavelet is a smooth and quickly vanishing oscillating function with good localization in both frequency and time. A wavelet family is the set of functions generated by dilations and translations of a unique mother wavelet ψ(t),

    ψ_{j,k}(t) = 2^{j/2} ψ(2^j t - k),  j, k ∈ Z.

A function ψ ∈ L^2(R) is an orthogonal wavelet if this family is an orthogonal basis of L^2(R), that is,

    ⟨ψ_{j,k}, ψ_{l,m}⟩ = δ_{j,l} · δ_{k,m},  j, k, l, m ∈ Z,

where ⟨·,·⟩ is the inner product and δ_{i,j} is the Kronecker delta, defined by δ_{i,j} = 0 for i ≠ j and δ_{i,j} = 1 for i = j. Any function f(t) ∈ L^2(R) can be represented in terms of this orthogonal basis as

    f(t) = Σ_{j,k} c_{j,k} ψ_{j,k}(t),   (1)

and the c_{j,k} = ⟨ψ_{j,k}(t), f(t)⟩ are called the wavelet coefficients of f(t). Parseval's theorem states that the energy is preserved under the orthogonal wavelet transform, that is,

    Σ_{j,k ∈ Z} |⟨f(t), ψ_{j,k}⟩|^2 = ‖f(t)‖^2,  f(t) ∈ L^2(R)   (2)

(Chui 1992, p. 226 [10]). Applied to the difference of two functions, Parseval's theorem also indicates that the Euclidean distance does not change under the orthogonal wavelet transform. This distance-preserving property makes sure that no false dismissal will occur with distance-based learning algorithms [29].

To efficiently calculate the wavelet transform for signal processing, Mallat introduced Multiresolution Analysis (MRA) and designed a family of fast algorithms based on it [35]. The advantage of MRA is that a signal can be viewed as composed of a smooth background and fluctuations or details on top of it. The distinction between the smooth part and the details is determined by the resolution, that is, by the scale below which the details of a signal cannot be discerned. At a given resolution, a signal is approximated by ignoring all fluctuations below that scale. We can progressively increase the resolution; at each stage of the increase in resolution, finer details are added to the coarser description, providing a successively better approximation to the signal.

An MRA of L^2(R) is a chain of subspaces {V_j : j ∈ Z} satisfying the following conditions [35]:

(i) ... ⊂ V_{-2} ⊂ V_{-1} ⊂ V_0 ⊂ V_1 ⊂ ... ⊂ L^2(R);
(ii) ∩_{j ∈ Z} V_j = {0}, and ∪_{j ∈ Z} V_j is dense in L^2(R);
(iii) f(t) ∈ V_j ⟺ f(2t) ∈ V_{j+1}, for all j ∈ Z;
(iv) there exists φ(t), called the scaling function, such that {φ(t - k) : k ∈ Z} is an orthogonal basis of V_0.

Thus φ_{j,k}(t) = 2^{j/2} φ(2^j t - k) is the orthogonal basis of V_j. Consider the space W_{j-1}, the orthogonal complement of V_{j-1} in V_j: V_j = V_{j-1} ⊕ W_{j-1}. The functions ψ_{j,k} form the orthogonal basis of W_j, and the basis {ψ_{j,k} : j ∈ Z, k ∈ Z} spans the space

    V_J = V_j ⊕ W_j ⊕ W_{j+1} ⊕ ... ⊕ W_{J-1},  j < J.   (3)

Notice that because W_{j-1} is orthogonal to V_{j-1}, the wavelet ψ is orthogonal to the scaling function φ. For a given signal f(t) ∈ L^2(R) one can find a scale J such that f_J ∈ V_J approximates f(t) up to a predefined precision. If d_{J-1} ∈ W_{J-1} and f_{J-1} ∈ V_{J-1}, then f_J is decomposed into {f_{J-1}, d_{J-1}}, where f_{J-1} is the approximation part of f_J at scale J - 1 and d_{J-1} is the detail part of f_J at scale J - 1. The wavelet decomposition can be repeated down to scale 0. Thus f_J can be represented as a series {f_0, d_0, d_1, ..., d_{J-1}} at scale 0.

2.2 The Properties of Orthogonal Wavelets for Supporting the Feature Extraction Algorithm

Assume a time series X (X ∈ R^n) is located at scale J. After decomposing X to a specific scale j (j ∈ [0, 1, ..., J-1]), the coefficients H_j(X) corresponding to scale j can be represented by a series {A_j, D_j, ..., D_{J-1}}. The A_j are called approximation coefficients, which are the projection of X onto V_j, and the D_j, ..., D_{J-1} are the wavelet coefficients in W_j, ..., W_{J-1}, representing the detail information of X.

From a signal processing point of view, the approximation coefficients within lower scales correspond to the lower frequency part of the signal. As noise often exists in the high frequency part of the signal, the first few coefficients of H_0(X), corresponding to the low frequency part of the signal, can be viewed as a noise-reduced signal. Thus keeping these coefficients will not lose much information from the original time series X. Hence normally the first k coefficients of H_0(X) are chosen as the features [36, 9]. We instead keep all the approximation coefficients within a specific scale j as the features, i.e., the projection of X onto V_j. Note that these features retain the entire information of X at a particular level of granularity. The task of choosing the first few wavelet coefficients is circumvented by choosing a particular scale: the candidate set of feature dimensions is reduced from [1, 2, ..., n] to [2^0, 2^1, ..., 2^{J-1}].

Definition 2.1. Given a time series X ∈ R^n, the features are the Haar wavelet approximation coefficients A_j decomposed from X within a specific scale j, j ∈ [0, 1, ..., J-1].

The extracted features should be similar to the original data, so a measure for evaluating the similarity/dissimilarity between the features and the data is necessary. We use the widely used sum of squared errors (the square of the Euclidean distance) as the dissimilarity measure between a time series and its approximation.

Definition 2.2. Given a time series X ∈ R^n, let X̃ ∈ R^n denote any approximation of X. The sum of squared errors (SSE) between X and X̃ is defined as

    SSE(X, X̃) = Σ_{i=1}^{n} (x_i - x̃_i)^2.   (4)

Definition 2.3. Given a time series X ∈ R^n, the energy of X is

    E(X) = Σ_{i=1}^{n} x_i^2.   (5)

Definition 2.4. Given a time series X ∈ R^n and its features A_j ∈ R^m, the energy difference (ED) between X and A_j is

    ED(X, A_j) = E(X) - E(A_j) = Σ_{i=1}^{n} x_i^2 - Σ_{i=1}^{m} a_i^2.   (6)

An approximation X̃ of X can be reconstructed by padding zeros to the end of A_j, to make the length of the padded series the same as that of X, and performing the reconstruction algorithm with the padded series. The reconstruction algorithm is the reverse process of decomposition [35].
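A quick numeric check of Parseval's theorem (Eq. 2) and of the energy difference of Definition 2.4, using haar_decompose from the sketch in Section 1; this anticipates the equivalence derived as Eq. (7) below. A sketch under the assumption that n is a power of two; not part of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=128)                      # toy series, n = 2**7, so J = 7

    def energy(v):                                # Definition 2.3
        return float(np.sum(v ** 2))

    a0, details = haar_decompose(x)               # full pyramid down to scale 0
    coeffs = np.concatenate([a0] + details)
    assert np.isclose(energy(x), energy(coeffs))  # Parseval: energy preserved

    # A_j is the approximation after J - j averaging steps; the energy it
    # misses is exactly the energy of the discarded detail coefficients.
    j = 4
    a_j = x.copy()
    for _ in range(7 - j):
        a_j = (a_j[0::2] + a_j[1::2]) / np.sqrt(2.0)
    ed = energy(x) - energy(a_j)                  # Definition 2.4
    assert np.isclose(ed, sum(energy(d) for d in details[:7 - j]))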
An example of A_j, and of the X̃ reconstructed from A_j using the Haar wavelet transform, for a time series located at scale 7 is shown in Figure 1.

From Eq. (2) we know the wavelet transform is energy preserving, so the energy of the approximation coefficients within scale j is equal to that of their reconstructed approximation series, i.e., E(X̃) = E(A_j). As mentioned in Section 2.1, V_j = V_{j-1} ⊕ W_{j-1}, so we have E(A_j) = E(A_{j-1}) + E(D_{j-1}). When decomposing X to a scale j, from Eq. (3) we have E(X) = E(A_j) + Σ_{i=j}^{J-1} E(D_i). Therefore, the energy difference between A_j and X is the sum of the energy of the wavelet coefficients located at scale j and the scales higher than j, i.e., ED(X, A_j) = Σ_{i=j}^{J-1} E(D_i).

Since the length of the features corresponding to scale j is smaller than the length of X, we cannot calculate the SSE between X and A_j by Eq. (4) directly. One choice is to reconstruct a sequence X̃ ∈ R^n from A_j and then calculate the SSE between X and X̃. For instance, Kaewpijit et al. [23] used the correlation function of X and X̃ to measure the similarity between X and A_j. Actually, SSE(X, X̃) is the same as the energy difference between A_j and X under the orthogonal wavelet transform: here H_J(X) is {A_j, D_j, ..., D_{J-1}} and H_J(X̃) is {A_j, 0, ..., 0}. Since the Euclidean distance is also preserved under the orthogonal wavelet transform, we have SSE(X, X̃) = SSE(H_J(X), H_J(X̃)) = Σ_{i=j}^{J-1} E(D_i). Therefore, the SSE between X and X̃ is equal to the energy difference between A_j and X, that is,

    SSE(X, X̃) = ED(X, A_j).   (7)

This property makes it possible to design an efficient algorithm without reconstructing X̃.

Figure 1: An example of approximation coefficients and their reconstructed approximation series

2.3 Haar Wavelet Transform

In our experiments we use the Haar wavelet, which has the fastest transform algorithm and is the most popularly used orthogonal wavelet, proposed by Haar. Note that the properties mentioned in Section 2.2 hold for all orthogonal wavelets, such as the Daubechies wavelet family. The concrete mathematical foundation of the Haar wavelet can be found in [7]. The length of an input time series is restricted to an integer power of 2 in the process of wavelet decomposition; if the length of the input time series does not satisfy this requirement, the series is extended to an integer power of 2 by padding zeros to its end. The Haar wavelet has the mother function

    ψ(t) = 1 for 0 ≤ t < 1/2,  -1 for 1/2 ≤ t < 1,  and 0 otherwise.

3 The Feature Extraction Algorithm

Following the property above, the time series in a dataset are decomposed scale by scale, starting from the highest scale, and the total energy of the detail coefficients within each scale is compared with that within the previous (higher) scale: the decomposition stops at the first scale j for which Σ_{i=1}^{m} E(D_j) > Σ_{i=1}^{m} E(D_{j+1}). The scale j* at which this criterion is met, i.e., the scale within which the detail coefficients have lower energy than those within the nearest lower scale, is defined as the appropriate scale, and the features corresponding to the scale j* are kept as the appropriate features. Note that by this process at least D_{J-1} will be removed, and the length of D_{J-1} is n/2 for the Haar wavelet. Hence the dimensionality of the features will be smaller than or equal to n/2. The proposed feature extraction algorithm is summarized in pseudo-code format in Algorithm 1.
Algorithm 1 The feature extraction algorithm
  Input: a set of time series {X_1, X_2, ..., X_m}
  for i = 1 to m do
    calculate A_{J-1} and D_{J-1} for X_i
  end for
  calculate Σ_{i=1}^{m} E(D_{J-1})
  exitFlag = true
  for j = J-2 downto 0 do
    for i = 1 to m do
      calculate A_j and D_j for X_i
    end for
    calculate Σ_{i=1}^{m} E(D_j)
    if Σ_{i=1}^{m} E(D_j) > Σ_{i=1}^{m} E(D_{j+1}) then
      keep all the A_{j+1} as the appropriate features for each time series
      exitFlag = false
      break
    end if
  end for
  if exitFlag then
    keep all the A_0 as the appropriate features for each time series
  end if

3.2 Time Complexity Analysis

The time complexity of Haar wavelet decomposition for a single time series is 2(n - 1), bounded by O(n) [8]. Thus for a time-series dataset having m time series, the time complexity of decomposition is m · 2(n - 1). Note that the feature extraction algorithm can break the loop before reaching the lowest scale; we analyze the extreme case of the algorithm with the highest time complexity (the appropriate scale j = 0). When j = 0, the algorithm consists of the following sub-algorithms:

- Decompose each time series in the dataset down to the lowest scale, with time complexity m · 2(n - 1);
- Calculate the energy of the wavelet coefficients, with time complexity m · (n - 1);
- Compare the Σ_{i=1}^{m} E(D_j) of the different scales, with time complexity log2(n).

The time complexity of the algorithm is the sum of the time complexities of the above sub-algorithms, bounded by O(mn).
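The following is a compact runnable sketch of Algorithm 1 in Python/NumPy (an illustration under the assumption that the dataset is an m x n array with n a power of two; not the authors' code). It decomposes all series one scale at a time and stops as soon as the total detail energy grows, returning the approximation coefficients of the previous scale as the features, in O(mn) overall.

    import numpy as np

    def extract_features(dataset):
        """Algorithm 1 sketch: choose the appropriate scale for a whole
        dataset and return the Haar approximation coefficients A_{j*}."""
        a = np.asarray(dataset, dtype=float)
        # First step: scale J-1 (at least D_{J-1} is always removed).
        d = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
        a = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
        prev_energy = float(np.sum(d ** 2))
        while a.shape[1] > 1:
            d = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
            a_next = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
            det_energy = float(np.sum(d ** 2))
            if det_energy > prev_energy:   # sum_i E(D_j) > sum_i E(D_{j+1})
                return a                   # keep A_{j+1} as the features
            a, prev_energy = a_next, det_energy
        return a                           # reached scale 0: keep A_0

For example, extract_features(X) on an m x 128 array returns an m x 64 array when the algorithm stops after the first iteration, mirroring the "appropriate scale = 1" cases reported in Section 4.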
4 Experimental evaluation

We use subjective observation and five objective criteria on nine datasets to evaluate the clustering quality of the K-means and hierarchical clustering algorithms [21]. The effectiveness of the feature extraction algorithm is evaluated by comparing the clustering quality of the extracted features with the clustering quality of the original data. We also compare the clustering quality of the extracted appropriate features with that of the features located in the scale prior to the appropriate scale (prior scale) and the scale posterior to the appropriate scale (posterior scale). The efficiency of the proposed feature extraction algorithm is validated by comparing the execution time of the chain process, which performs feature extraction first and then executes clustering with the extracted features, with that of clustering with the original datasets directly.

4.1 Clustering Quality Evaluation Criteria

Evaluating clustering systems is not a trivial task, because clustering is an unsupervised learning process in the absence of information about the actual partitions. We used classified datasets and compared how well the clustered results fit the data labels, which is the most popular clustering evaluation method [20]. Five objective clustering evaluation criteria were used in our experiments: Jaccard, Rand and FM [20]; CSM, used for evaluating time series clustering algorithms [44, 24, 33]; and NMI, used recently for validating clustering results [40, 16].

Consider G = {G_1, G_2, ..., G_M} as the clusters from a supervised dataset, and A = {A_1, A_2, ..., A_M} as those obtained by a clustering algorithm under evaluation. Denote by D a dataset of original time series or features. For all pairs of series (D_i, D_j) in D, we count the following quantities:

- a is the number of pairs that belong to one cluster in G and are clustered together in A;
- b is the number of pairs that belong to one cluster in G but are not clustered together in A;
- c is the number of pairs that are clustered together in A but do not belong to one cluster in G;
- d is the number of pairs that are neither clustered together in A nor belong to the same cluster in G.

The clustering evaluation criteria used are defined as below:

1. Jaccard score (Jaccard): Jaccard = a / (a + b + c)

2. Rand statistic (Rand): Rand = (a + d) / (a + b + c + d)

3. Folkes and Mallow index (FM): FM = a / sqrt((a + b)(a + c))

4. Cluster Similarity Measure (CSM): CSM(G, A) = (1/M) Σ_{i=1}^{M} max_{1 ≤ j ≤ M} Sim(G_i, A_j)

4.2 Datasets

- Cylinder-Bell-Funnel (CBF): The three classes cylinder, bell and funnel are generated as c(t) = (6 + η)·χ_[a,b](t) + ε(t), b(t) = (6 + η)·χ_[a,b](t)·(t - a)/(b - a) + ε(t) and f(t) = (6 + η)·χ_[a,b](t)·(b - t)/(b - a) + ε(t), where χ_[a,b](t) = 1 if a ≤ t ≤ b and 0 otherwise, η and ε(t) are drawn from a standard normal distribution N(0, 1), a is an integer drawn uniformly from the range [16, 32] and b - a is an integer drawn uniformly from the range [32, 96]. The UCR Archive provides the source code for generating the samples. We generated 128 samples for each class, with length 128.
- Control Chart Time Series (CC): This dataset has 100 instances for each of the six different classes of control charts.
- Trace dataset (Trace): The 4-class dataset contains 200 instances, 50 for each class. The dimensionality of the data is 275.
- Gun Point dataset (Gun): The dataset has two classes, each containing 100 instances. The dimensionality of the data is 150.
- Reality dataset (Reality): The dataset consists of data from Space Shuttle telemetry, exchange rates and artificial sequences. The data are normalized so that the minimum value is zero and the maximum is one. Each cluster contains one time series with 1000 data points.
- ECG dataset (ECG): The ECG dataset was obtained from the ECG database at PhysioNet [19]. We used 3 groups of those ECG time series in our experiments: Group 1 includes 22 time series representing the 2 sec ECG recordings of people having malignant ventricular arrhythmia; Group 2 consists of 13 time series that are 2 sec ECG recordings of healthy people, representing the normal sinus rhythm of the heart; Group 3 includes 35 time series representing the 2 sec ECG recordings of people having supraventricular arrhythmia.
- Personal income dataset (Income): The personal income dataset [1] is a collection of time series representing the per capita personal income from 1929 to 1999 in 25 states of the USA.¹ The 25 states were partitioned into two groups based on their growth rate: group 1 includes the east coast states, CA and IL, in which the personal income grows at a high rate; the mid-west states, in which the personal income grows at a low rate, form group 2.
- Temperature dataset (Temp): This dataset was obtained from the National Climatic Data Center [2]. It is a collection of 30 time series of the daily temperature in the year 2000 in various places in Florida, Tennessee and Cuba: 10 places in Tennessee, 5 places in Northern Florida, 9 places in Southern Florida and 6 places in Cuba. The dataset is grouped based on geographical distance and similar temperature trends: Tennessee and Northern Florida form group 1; Cuba and Southern Florida form group 2.
- Population dataset (Popu): The population dataset is a collection of time series representing the population estimates from 1900 to 1999 in 20 states of the USA [3]. The 20 states are partitioned into two groups based on their trends: group 1 consists of CA, CO, FL, GA, MD, NC, SC, TN, TX, VA, and WA, having an exponentially increasing trend, while group 2 consists of IL, MA, MI, NJ, NY, OK, PA, ND, and SD, having a stabilizing trend.

¹The 25 states included were: CT, DC, DE, FL, MA, ME, MD, NC, NJ, NY, PA, RI, VA, VT, WV, CA, IL, ID, IA, IN, KS, ND, NE, OK, SD.

As Gavrilov et al. [17] showed experimentally that normalization is suitable for time series clustering, each time series in the datasets downloaded from the Internet (ECG, Income, Temp, and Popu) was normalized by the formula x_i = (x_i - μ)/σ, i ∈ [1, 2, ..., n].
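Before turning to the results, a small sketch of how the pair-based criteria of Section 4.1 can be computed from two labelings (an illustration, not the authors' evaluation code; the O(n^2) pair enumeration is fine at these dataset sizes):

    import numpy as np
    from itertools import combinations

    def pair_counts(labels_true, labels_pred):
        """Count the quantities a, b, c, d defined in Section 4.1."""
        a = b = c = d = 0
        for i, j in combinations(range(len(labels_true)), 2):
            same_g = labels_true[i] == labels_true[j]   # together in G
            same_a = labels_pred[i] == labels_pred[j]   # together in A
            if same_g and same_a:
                a += 1
            elif same_g:
                b += 1
            elif same_a:
                c += 1
            else:
                d += 1
        return a, b, c, d

    def jaccard(a, b, c, d):
        return a / (a + b + c)

    def rand(a, b, c, d):
        return (a + d) / (a + b + c + d)

    def fm(a, b, c, d):
        return a / np.sqrt((a + b) * (a + c))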
4.3 The Clustering Performance Evaluation

We used the widely used Euclidean distance for the K-means and hierarchical clustering algorithms. As the Reality dataset has only one time series in each cluster, which is not suitable for the K-means algorithm, it was only used for hierarchical clustering. Since the clustering results of K-means depend on the initial clustering centers, which should be randomly initialized in each run, we ran K-means 100 times with randomly initialized centers for each experiment. Section 4.3.1 gives the energy ratio of the wavelet coefficients within the various scales and the calculated appropriate scale for each dataset used. The evaluation of the K-means clustering algorithm with the proposed feature extraction algorithm is given in Section 4.3.2. Section 4.3.3 describes the comparative evaluation of hierarchical clustering with the feature extraction algorithm.

4.3.1 The Energy Ratio and Appropriate Scale

Table 1 provides the energy ratio Σ_{i=1}^{m} E(D_j) / Σ_{i=1}^{m} Σ_{j=0}^{J-1} E(D_j) (in proportion to the total detail energy) of the wavelet coefficients within the various scales for all the datasets used. The calculated appropriate scales for the nine datasets, using Algorithm 1, are shown in Table 2. The algorithm stops after the first iteration (scale = 1) for most of the datasets (Trace, Gun, Reality, ECG, Popu, and Temp), and stops after the second iteration (scale = 2) for the CBF and CC datasets. The algorithm stops after the third iteration (scale = 3) only for the Income dataset. If the sampling frequency of a time series is f, wavelet coefficients within scale j correspond to the information with frequency f/2^j. Table 2 shows that most of the time series datasets used have important frequency components beginning from f/2 or f/4.

4.3.2 K-means Clustering with and without Feature Extraction

The average execution time of the chain process that first executes the feature extraction algorithm and then performs K-means with the extracted features (termed FE + K-means), over 100 runs, and that of performing K-means directly on the original data (termed K-means), over 100 runs, are illustrated in Figure 2. The chain process executes faster than K-means with the original data for all eight datasets used. Table 3 describes the mean of the evaluation criteria values of 100 runs for K-means with the original data. Table 4 gives the mean of the evaluation criteria values of 100 runs for K-means with the extracted features.

Figure 2: The average execution time (s) of the FE + K-means and K-means algorithms for eight datasets

To compare the difference between the mean obtained from 100 runs of K-means with the extracted features and that obtained from 100 runs of K-means with the corresponding original data, a two-sample Z-test or a two-sample t-test can be used. We prefer the two-sample t-test because it is robust with respect to violation of the assumption of equal population variances, provided that the sample sizes are equal [39]. We use the two-sample t-test with the following hypotheses:

    H0: μ1 = μ2
    H1: μ1 ≠ μ2

where μ1 is the mean of the evaluation criteria values corresponding to the original datasets and μ2 is that corresponding to the extracted features. The significance level is set to 0.05. When the null hypothesis (H0) is rejected, we conclude that the data provide strong evidence that μ1 differs from μ2, and which of the two is bigger can easily be determined by comparing the corresponding mean values as shown in Table 3 and Table 4.
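A sketch of how such a comparison could be scripted with SciPy's standard two-sample t-test; the arrays here are placeholder values, not the paper's measurements:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    crit_orig = rng.normal(0.35, 0.02, size=100)   # hypothetical criterion values, original data
    crit_feat = rng.normal(0.36, 0.02, size=100)   # hypothetical criterion values, extracted features

    t, p = stats.ttest_ind(crit_orig, crit_feat)   # equal-variance two-sample t-test
    if p < 0.05:                                   # significance level 0.05: reject H0
        verdict = '>' if crit_feat.mean() > crit_orig.mean() else '<'
    else:
        verdict = '='                              # H0 not rejected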
We list the results of the t-tests in Table 5. If the mean of the values of a criterion corresponding to the extracted features is significantly bigger than that corresponding to the original data, we set the character to '>'; if it is significantly smaller, the character is set to '<'; otherwise we set the character to '='. Table 5 shows that, over the eight datasets, the evaluation criteria values corresponding to the extracted features are bigger than those corresponding to the original data eleven times, smaller five times, and equal twenty-four times. Based on the above analysis, we can conclude that the quality of the K-means algorithm with extracted features is, on average, better than that with the original data for the datasets used. Table 6 gives the mean of the evaluation criteria values of 100 runs of K-means with the features in the prior scale.

Table 1: The energy ratio (%) of the wavelet coefficients within various scales for all the used datasets

    scale  CBF    CC     Trace  Gun    Reality  ECG    Income  Popu   Temp
    1      8.66   6.34   0.54   0.18   0.03     18.22  5.07    0.12   4.51
    2      6.00   5.65   1.31   1.11   0.10     26.70  2.03    0.46   7.72
    3      7.67   40.42  2.60   2.20   0.29     19.66  1.49    7.31   5.60
    4      11.48  19.73  4.45   7.85   3.13     12.15  26.08   8.10   4.57
    5      18.97  9.98   6.75   15.58  3.85     8.97   28.49   13.68  9.92
    6      32.25  17.87  14.66  54.81  8.94     7.11   26.39   21.94  4.29
    7      15.62         39.66  14.43  21.39    3.55   10.46   48.39  16.60
    8                    29.56  4.02   20.01    1.80                  42.62
    9                    0.46          19.41    1.83                  4.16
    10                                 22.84

Table 2: The appropriate scales of all nine datasets

           CBF  CC  Trace  Gun  Reality  ECG  Income  Popu  Temp
    scale  2    2   1      1    1        1    3       1     1

The difference between the mean of the criteria values produced by the K-means algorithm with the extracted features and that of the criteria values generated with the features in the prior scale, validated by t-test, is described in Table 7. The mean criteria values corresponding to the extracted features are twelve times bigger than, nineteen times equal to, and nine times smaller than those corresponding to the features located in the prior scale. The mean of the evaluation criteria values of 100 runs of K-means with the features in the posterior scale is shown in Table 8. Table 9 provides the t-test result for the difference between the clustering criteria values of the extracted features and those produced by the features within the posterior scale. The mean criteria values corresponding to the extracted features are ten times bigger than, twenty-nine times equal to, and only one time smaller than those corresponding to the features located in the posterior scale. Based on the results of the hypothesis testing, we can conclude that the quality of the K-means algorithm with the extracted appropriate features is, on average, better than that with the features in the prior scale and the posterior scale for the datasets used.

4.3.3 Hierarchical Clustering with and without Feature Extraction

We used single linkage for the hierarchical clustering algorithm in our experiments. Figure 3 provides a comparison of the execution time of performing the hierarchical clustering algorithm with the original data (termed HC) and the chain process of feature extraction plus hierarchical clustering (termed HC + FE).
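A sketch of how the HC vs. HC + FE timing comparison could be reproduced, using extract_features from the Section 3 sketch and SciPy's single-linkage clustering (a toy dataset stands in for the real ones):

    import time
    import numpy as np
    from scipy.cluster.hierarchy import linkage

    X = np.random.default_rng(2).normal(size=(100, 128))   # toy dataset, m = 100, n = 128

    t0 = time.perf_counter()
    linkage(X, method='single', metric='euclidean')        # HC on the raw series
    t_hc = time.perf_counter() - t0

    t0 = time.perf_counter()
    feats = extract_features(X)                            # feature extraction first,
    linkage(feats, method='single', metric='euclidean')    # then HC on the features
    t_fe = time.perf_counter() - t0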
For clearly observing the difference between the execution times of HC and HC + FE, the execution times are also given in Table 10. The chain process executes faster than hierarchical clustering with the original data for all nine datasets.

Figure 3: The execution time (s) of the HC + FE and HC algorithms

Evaluating the quality of the hierarchical clustering algorithm can be done in a subjective way and in an objective way. Dendrograms are good for subjectively evaluating hierarchical clustering algorithms with time series data [27]. As only the Reality dataset has one time series in each cluster, which is suitable for visual observation, we used it for subjective evaluation; the other datasets are evaluated by the objective criteria. Hierarchical clustering with the Reality dataset and with its extracted features produced the same clustering solution. Note that this result fits the description of the dataset, as Euclidean distance produces the intuitively correct clustering [26]. The dendrogram of the clustering solution is shown in Figure 4.

Table 3: The mean of the evaluation criteria values obtained from 100 runs of the K-means algorithm with eight datasets

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3490  0.4444  0.3592  0.3289  0.2048  0.6127  0.7142  0.7801
    Rand     0.6438  0.8529  0.7501  0.4975  0.5553  0.7350  0.7611  0.8358
    FM       0.5201  0.6213  0.5306  0.4949  0.3398  0.7611  0.8145  0.8543
    CSM      0.5830  0.6737  0.5536  0.5000  0.4240  0.8288  0.8211  0.8800
    NMI      0.3450  0.7041  0.5189  0.0000  0.0325  0.4258  0.5946  0.6933

Table 4: The mean of the evaluation criteria values obtained from 100 runs of the K-means algorithm with the extracted features

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3439  0.4428  0.3672  0.3289  0.2644  0.6344  0.7719  0.8320
    Rand     0.6447  0.8514  0.7498  0.4975  0.4919  0.7644  0.8079  0.8758
    FM       0.5138  0.6203  0.5400  0.4949  0.4314  0.7770  0.8522  0.8912
    CSM      0.5751  0.6681  0.5537  0.5000  0.4526  0.8579  0.8562  0.9117
    NMI      0.3459  0.6952  0.5187  0.0000  0.0547  0.4966  0.6441  0.7832

Figure 4: The dendrogram of hierarchical clustering with the extracted features and that with the original data for the Reality dataset

As each run of hierarchical clustering on the same dataset always gives the same result, multiple runs are not needed, and the criteria values obtained from the extracted features are compared to those obtained from the original data directly, without hypothesis testing. Table 11 describes the evaluation criteria values produced by hierarchical clustering with the eight original datasets. Table 12 gives the evaluation criteria values obtained from hierarchical clustering with the extracted features. The difference between the items in Table 11 and Table 12 is provided in Table 13. The meaning of the characters in Table 13 is as follows: '>' means a criterion value produced by the extracted features is bigger than that produced by the original data; '<' denotes a criterion value obtained from the extracted features that is smaller than that obtained from the original data; otherwise, we set the character to '='. Hierarchical clustering with the extracted features produces the same result as clustering with the original data on the CBF, Trace, Gun, Popu and Temp datasets. For the other three datasets, the evaluation criteria values produced by hierarchical clustering with the extracted features are ten times bigger than, four times smaller than, and one time equal to those obtained from hierarchical clustering with the original data.
From the experimental results, we can conclude that the quality of hierarchical clustering with the extracted features is, on average, better than that with the original data for the datasets used. The criteria values produced by the hierarchical clustering algorithm with the features in the prior scale are given in Table 14. The criteria values corresponding to the extracted features shown in Table 12 are nine times bigger than, five times smaller than, and twenty-six times equal to the criteria values corresponding to the features in the prior scale. Table 15 shows the criteria values obtained by the hierarchical clustering algorithm with the features in the posterior scale. The criteria values produced by the extracted features given in Table 12 are eleven times bigger than, four times smaller than, and twenty-five times equal to the criteria values corresponding to the features in the posterior scale. From the experimental results, we can conclude that the quality of hierarchical clustering with the extracted features is, on average, better than that of hierarchical clustering with the features located in the prior and posterior scales for the datasets used.

Table 5: The difference between the mean of criteria values produced by the K-means algorithm with extracted features and with original datasets, validated by t-test

             CBF  CC  Trace  Gun  ECG  Income  Popu  Temp
    Jaccard  <    =   =      =    >    >       =     =
    Rand     >    =   =      =    <    >       =     =
    FM       <    =   =      =    >    >       =     =
    CSM      <    =   =      =    >    >       =     =
    NMI      =    <   =      =    >    >       =     >

Table 6: The mean of the evaluation criteria values obtained from 100 runs of the K-means algorithm with features in the prior scale

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3489  0.4531  0.3592  0.3289  0.2048  0.6138  0.7142  0.7801
    Rand     0.6438  0.8557  0.7501  0.4975  0.5553  0.7376  0.7611  0.8358
    FM       0.5200  0.6299  0.5306  0.4949  0.3398  0.7615  0.8145  0.8543
    CSM      0.5829  0.6790  0.5536  0.5000  0.4240  0.8310  0.8211  0.8800
    NMI      0.3439  0.7066  0.5189  0.0000  0.0325  0.4433  0.5946  0.6933

Table 7: The difference between the mean of criteria values produced by the K-means algorithm with extracted features and with features in the prior scale, validated by t-test

             CBF  CC  Trace  Gun  ECG  Income  Popu  Temp
    Jaccard  <    <   =      =    >    >       =     =
    Rand     >    <   =      =    <    >       =     =
    FM       <    <   =      =    >    >       =     =
    CSM      <    <   =      =    >    >       =     =
    NMI      >    <   =      =    >    >       =     >

Table 8: The mean of the evaluation criteria values obtained from 100 runs of the K-means algorithm with features in the posterior scale

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3457  0.4337  0.3632  0.3289  0.2688  0.4112  0.7770  0.8507
    Rand     0.6455  0.8482  0.7501  0.4975  0.4890  0.5298  0.8141  0.8906
    FM       0.5158  0.6114  0.5352  0.4949  0.4388  0.5826  0.8560  0.9031
    CSM      0.5771  0.6609  0.5545  0.5000  0.4663  0.6205  0.8611  0.9216
    NMI      0.3474  0.6868  0.5190  0.0000  0.0611  0.0703  0.6790  0.8037

Table 9: The difference between the mean of criteria values produced by the K-means algorithm with extracted features and with features in the posterior scale, validated by t-test

             CBF  CC  Trace  Gun  ECG  Income  Popu  Temp
    Jaccard  =    >   =      =    =    >       =     =
    Rand     =    >   =      =    =    >       =     =
    FM       =    >   =      =    =    >       =     =
    CSM      =    >   =      =    <    >       =     =
    NMI      =    >   =      =    =    >       =     =

Table 10: The execution time (s) of HC + FE and HC

             CBF      CC       Trace   Gun     ECG     Income  Popu    Temp    Reality
    HC       13.7479  51.3319  2.7102  2.1256  0.3673  0.0169  0.0136  0.0511  0.0334
    HC + FE  12.1269  49.7365  2.6423  2.0322  0.2246  0.0156  0.0133  0.0435  0.0172

Table 11: The evaluation criteria values produced by the hierarchical clustering algorithm with the original eight datasets

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3299  0.5594  0.4801  0.3289  0.3259  0.5548  0.4583  0.4497
    Rand     0.3369  0.8781  0.7488  0.4975  0.3619  0.5800  0.5211  0.4877
    FM       0.5714  0.7378  0.6827  0.4949  0.5535  0.7379  0.6504  0.6472
    CSM      0.4990  0.7540  0.6597  0.5000  0.4906  0.6334  0.6386  0.6510
    NMI      0.0366  0.8306  0.6538  0.0000  0.0517  0.1460  0.1833  0.1148

Table 12: The evaluation criteria values obtained by the hierarchical clustering algorithm with the appropriate features extracted from the eight datasets

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3299  0.5933  0.4801  0.3289  0.3355  0.5068  0.4583  0.4497
    Rand     0.3369  0.8882  0.7488  0.4975  0.3619  0.5200  0.5211  0.4877
    FM       0.5714  0.7682  0.6827  0.4949  0.5696  0.6956  0.6504  0.6472
    CSM      0.4990  0.7758  0.6597  0.5000  0.4918  0.6402  0.6386  0.6510
    NMI      0.0366  0.8525  0.6538  0.0000  0.0847  0.0487  0.1833  0.1148

Table 13: The difference between the criteria values obtained by the hierarchical clustering algorithm with the eight datasets and with the features extracted from the datasets

             CBF  CC  Trace  Gun  ECG  Income  Popu  Temp
    Jaccard  =    >   =      =    >    <       =     =
    Rand     =    >   =      =    =    <       =     =
    FM       =    >   =      =    >    <       =     =
    CSM      =    >   =      =    >    >       =     =
    NMI      =    >   =      =    >    <       =     =

Table 14: The evaluation criteria values obtained by the hierarchical clustering algorithm with features in the prior scale

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3299  0.5594  0.4801  0.3289  0.3259  0.5548  0.4583  0.4497
    Rand     0.3369  0.8781  0.7488  0.4975  0.3619  0.5800  0.5211  0.4877
    FM       0.5714  0.7378  0.6827  0.4949  0.5535  0.7379  0.6504  0.6472
    CSM      0.4990  0.7540  0.6597  0.5000  0.4906  0.6334  0.6386  0.6510
    NMI      0.0366  0.8306  0.6538  0.0000  0.0517  0.1460  0.1833  0.1148

Table 15: The evaluation criteria values obtained by the hierarchical clustering algorithm with features in the posterior scale

             CBF     CC      Trace   Gun     ECG     Income  Popu    Temp
    Jaccard  0.3299  0.4919  0.4801  0.3289  0.3355  0.5090  0.4583  0.4343
    Rand     0.3369  0.8332  0.7488  0.4975  0.3619  0.5467  0.5211  0.5123
    FM       0.5714  0.6973  0.6827  0.4949  0.5696  0.6908  0.6504  0.6207
    CSM      0.4990  0.6640  0.6597  0.5000  0.4918  0.6258  0.6386  0.6340
    NMI      0.0366  0.7676  0.6538  0.0000  0.0847  0.0145  0.1833  0.1921

5 Conclusions

In this paper, unsupervised feature extraction is carried out in order to improve the time series clustering quality and speed up the clustering process. We propose an unsupervised feature extraction algorithm for time series clustering using orthogonal wavelets. The features are defined as the approximation coefficients within a specific scale. We show that the sum of squared errors between the approximation series reconstructed from the features and the time series is equal to the energy of the wavelet coefficients within this scale and lower scales. Based on this property, we resolve the conflict between taking lower dimensionality and a lower sum of squared errors simultaneously by finding the scale within which the energy of the wavelet coefficients is lower than that within the nearest lower scale. An efficient feature extraction algorithm is designed without reconstructing the approximation series. The time complexity of the feature extraction algorithm achieves O(mn) with the Haar wavelet transform. The main benefit of the proposed feature extraction algorithm is that the dimensionality of the features is chosen automatically. We conducted experiments on nine time series datasets using the K-means and hierarchical clustering algorithms. The clustering results were evaluated by subjective observation and five objective criteria. The chain process of performing feature extraction first and then executing the clustering algorithm with the extracted features executes faster than clustering directly with the original data for all the datasets used.
The quality of clustering with the extracted features is, on average, better than that with the original data for the datasets used. The quality of clustering with the extracted appropriate features is also better than that with the features corresponding to the scales prior and posterior to the appropriate scale.

References

[1] http://www.bea.gov/bea/regional/spi.
[2] http://www.ncdc.noaa.gov/rcsg/datasets.html.
[3] http://www.census.gov/population/www/estimates/st_stts.html.
[4] R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms, pages 69-84, 1993.
[5] R. Bellman. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ, 1961.
[6] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbor meaningful? In Proceedings of the 7th International Conference on Database Theory, pages 217-235, 1999.
[7] C. S. Burrus, R. A. Gopinath, and H. Guo. Introduction to Wavelets and Wavelet Transforms, A Primer. Prentice Hall, Englewood Cliffs, NJ, 1997.
[8] K. P. Chan and A. W. Fu. Efficient time series matching by wavelets. In Proceedings of the 15th International Conference on Data Engineering, pages 126-133, 1999.
[9] K. P. Chan, A. W. Fu, and T. Y. Clement. Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Trans. on Knowledge and Data Engineering, 15(3):686-705, 2003.
[10] C. K. Chui. An Introduction to Wavelets. Academic Press, San Diego, 1992.
[11] T. Cover and J. Thomas. Elements of Information Theory. Wiley Series in Communication, New York, 1991.
[12] M. Dash, H. Liu, and J. Yao. Dimensionality reduction of unsupervised data. In Proceedings of the 9th IEEE International Conference on Tools with AI, pages 532-539, 1997.
[13] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, PA, 1992.
[14] D. L. Donoho. De-noising by soft-thresholding. IEEE Trans. on Information Theory, 41(3):613-627, 1995.
[15] J. G. Dy and C. E. Brodley. Feature subset selection and order identification for unsupervised learning. In Proceedings of the 17th International Conference on Machine Learning, pages 247-254, 2000.
[16] X. Z. Fern and C. E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of the 21st International Conference on Machine Learning, 2004. Article No. 36.
[17] M. Gavrilov, D. Anguelov, P. Indyk, and R. Motwani. Mining the stock market: Which measure is best? In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 487-496, 2000.
[18] P. Geurts. Pattern extraction for time series classification. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 115-127, 2001.
[19] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215-e220, 2000. http://www.physionet.org/physiobank/database/.
[20] M. Halkidi, Y. Batistakis, and M. Vazirgiannis. On clustering validation techniques. Journal of Intelligent Information Systems, 17(2-3):107-145, 2001.
[21] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, 2000.
[22] D. Jiang, C. Tang, and A. Zhang. Cluster analysis for gene expression data: A survey. IEEE Trans. on Knowledge and Data Engineering, 16(11):1370-1386, 2004.
[23] S. Kaewpijit, J. L. Moigne, and T. E. Ghazawi. Automatic reduction of hyperspectral imagery using wavelet spectral analysis. IEEE Trans. on Geoscience and Remote Sensing, 41(4):863-871, 2003.
[24] K. Kalpakis, D. Gada, and V. Puttagunta. Distance measures for effective clustering of ARIMA time-series. In Proceedings of the 2001 IEEE International Conference on Data Mining, pages 273-280, 2001.
[25] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems, 3:263-286, 2000.
[26] E. Keogh and T. Folias. The UCR time series data mining archive. http://www.cs.ucr.edu/~eamonn/TSDMA/index.html, 2002.
[27] E. Keogh and S. Kasetty. On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 7(4):349-371, 2003.
[28] E. Keogh and M. Pazzani. An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pages 239-241, 1998.
[29] S. W. Kim, S. Park, and W. W. Chu. Efficient processing of similarity search under time warping in sequence databases: An index-based approach. Information Systems, 29(5):405-420, 2004.
[30] R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997.
[31] F. Korn, H. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 289-300, 1997.
[32] J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 2-11, 2003.
[33] J. Lin, M. Vlachos, E. Keogh, and D. Gunopulos. Iterative incremental clustering of time series. In Proceedings of the 9th International Conference on Extending Database Technology, pages 106-122, 2004.
[34] H. Liu and H. Motoda. Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic, Boston, 1998.
[35] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, San Diego, second edition, 1999.
[36] I. Popivanov and R. J. Miller. Similarity search over time-series data using wavelets. In Proceedings of the 18th International Conference on Data Engineering, pages 212-221, 2002.
[37] D. Rafiei and A. Mendelzon. Efficient retrieval of similar time sequences using DFT. In Proceedings of the 5th International Conference on Foundations of Data Organizations, pages 249-257, 1998.
[38] N. Saito. Local Feature Extraction and Its Application Using a Library of Bases. PhD thesis, Department of Mathematics, Yale University, 1994.
[39] G. W. Snedecor and W. G. Cochran. Statistical Methods. Iowa State University Press, Ames, eighth edition, 1989.
[40] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3(3):583-617, 2002.
[41] L. Talavera. Feature selection as a preprocessing step for hierarchical clustering. In Proceedings of the 16th International Conference on Machine Learning, pages 389-397, 1999.
[42] Y. Wu, D. Agrawal, and A. E. Abbadi. A comparison of DFT and DWT based similarity search in time-series databases. In Proceedings of the 9th ACM CIKM International Conference on Information and Knowledge Management, pages 488-495, 2000.
[43] N. Wyse, R. Dubes, and A. K. Jain. A critical evaluation of intrinsic dimensionality algorithms. In E. S. Gelsema and L. N. Kanal, editors, Pattern Recognition in Practice, pages 415-425. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1980.
[44] Y. Xiong and D. Y. Yeung. Time series clustering with ARMA mixtures. Pattern Recognition, 37(8):1675-1689, 2004.
[45] B. K. Yi and C. Faloutsos. Fast time sequence indexing for arbitrary Lp norms. In Proceedings of the 26th International Conference on Very Large Databases, pages 385-394, 2000.
[46] H. Zhang, T. B. Ho, and M. S. Lin. A non-parametric wavelet feature extractor for time series classification. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 595-603, 2004.

Coloring Weighted Series-Parallel Graphs

Gašper Fijavž
Faculty of Computer and Information Science
E-mail: gasper.fijavz@fri.uni-lj.si

Keywords: graph coloring, circular coloring, weighted graphs

Received: December 31, 2003

Let G be a series-parallel graph with integer edge weights. A p-coloring of G is a mapping of the vertices of G into Z_p (the ring of integers modulo p) so that the distance between the colors of adjacent vertices u and v is at least the weight of the edge uv. We describe a quadratic time p-coloring algorithm where p is either twice the maximum edge weight or the largest possible sum of three weights of edges lying on a common cycle.

Povzetek: A coloring of graphs is described.

1 Introduction

The motivation for the problem is twofold. An instance of coloring edge-weighted graphs is the channel assignment problem, cf. [4]. On the other hand, traditional vertex coloring of (unweighted) graphs can be viewed as a circular one—consider the colors to lie in an appropriate ring of integer residues. Circular colorings of graphs (see [8] for a comprehensive survey), where we allow the vertices to be colored by real numbers (modulo p), model several optimization problems better than traditional colorings of graphs. The circular chromatic number, the minimum p for which a circular coloring exists, is a refinement of the chromatic number of a graph, and is similarly NP-hard to compute.

If the largest complete minor in (an unweighted graph) G has k vertices and k ≤ 6, then the valid cases of the Hadwiger conjecture imply χ(G) ≤ k, see [7]. Let G = (V, E, w) be a weighted graph (where (V, E) is the underlying unweighted graph) with edge weights w (and w : E → [1, ∞)). We can, similarly as in the unweighted case, define the size of the largest complete minor, see [5, 6, 3]: the size of the largest weighted K2-minor in G is twice the maximal edge weight, and for the size of the largest weighted complete K3-minor we also have to consider the biggest possible sum of weights of three edges lying on a common cycle. If G is a series-parallel graph, then the largest of the above-mentioned quantities is called the weighted Hadwiger number of G, which we denote by h(G). The weighted case of the Hadwiger conjecture is valid only for graphs satisfying h(G) ≤ 4, i.e., it is true that if h(G) ≤ 4, then the weighted chromatic number of G, which we denote by χ_w(G), is at most h(G) [3]. If a weighted graph G is not series-parallel, then it may occur that χ_w(G) > h(G), see [3] for examples. Hence, for series-parallel weighted graphs, h(G) is a natural upper bound for χ_w(G).
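To make the circular coloring condition concrete, here is a minimal Python sketch (not part of the paper) of the circular distance |x - y|_p in Z_p and of the p-coloring validity check, as formally defined in Section 2 below:

    def circ_dist(x, y, p):
        """Distance |x - y|_p between two colors in Z_p."""
        d = abs(x - y) % p
        return min(d, p - d)

    def is_p_coloring(edges, weights, coloring, p):
        """Check |c(u) - c(v)|_p >= w(uv) on every edge uv.

        edges is an iterable of vertex pairs, weights maps each pair to
        its integer weight, coloring maps vertices to Z_p (all names
        here are illustrative, not from the paper).
        """
        return all(circ_dist(coloring[u], coloring[v], p) >= weights[(u, v)]
                   for (u, v) in edges)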
We present an algorithm for h(G)-coloring weighted series-parallel graphs. As opposed to the results in [3], the coloring algorithm presented here successfully colors series-parallel graphs with at most h(G) colors even if the ratio between the maximal and minimal edge weights exceeds 2.

2 Definitions and preprocessing

Let N denote the set of positive integers and let Z_p denote the ring of integers modulo p. If x, y ∈ Z_p, then we denote the distance between x and y in Z_p by |x - y|_p. Let G = (V, E, w) be a weighted series-parallel graph. Series-parallel graphs are constructed by first pasting triangles along edges (starting with a triangle), and then deleting edges [2]. In order to avoid computational difficulties concerning real numbers, we shall assume that the weights are integers, w : E → N.

A p-coloring of G is a mapping c : V → Z_p so that for every edge e = uv the condition |c(u) - c(v)|_p ≥ w(uv) is satisfied. Given a p-coloring c and an edge e = uv, we call |c(v) - c(u)|_p the span of e, denoted by span(e), and say that e is tight if its span equals its weight. We shall also say that p is the size of the color space Z_p.

2.1 Tree decomposition

A tree decomposition (see [2] for the theoretical background) of a series-parallel graph can be computed in linear time [1]. Given a tree decomposition (T_G, V) of G, we can, by adding edges to G (and setting their weights to 1), assume that G is an edge-maximal series-parallel graph. The parts V of the decomposition are exactly the edges and triangles of G. Two parts are adjacent (in T_G) if and only if one part is a triangle t, the other is an edge e, and e is incident with t. Hence, G is 2-connected, and given distinct edges e and f of G, there exists a cycle containing both. We shall use both G and its tree decomposition (T_G, V) for storing the graph during the course of the coloring algorithm.

Let e = v1v2 be an edge in G. If {v1, v2} is a separator in G, we say that e is a separating edge in G, and e is called nonseparating otherwise. If e = v1v2 is separating and G - {v1, v2} consists of k components C1, ..., Ck, then G_i (i = 1, ..., k) denotes the graph (in fact its representation) induced by the vertices of C_i and {v1, v2}. We call the G_i's (i = 1, ..., k) the e-splits of G. Throughout the algorithm we shall keep track of whether an edge e is a separating edge of G. This can easily be seen from T_G: namely, an edge e is nonseparating if it is adjacent to a single triangle in T_G.

Let t = e1e2e3 be a triangle (t contains the edges e1, e2, and e3) in G. Let us further assume that e1 is a separating edge, and let G0, G1, ..., Gk be all the e1-splits of G, so that G0 contains the triangle t as its subgraph. Then the (graph) union G1 ∪ ... ∪ Gk is called the (t, e1)-fragment of G and is denoted by G(t, e1). If e is nonseparating, and there exists a triangle t containing e (there may be at most one), then the (t, e)-fragment of G is the graph containing only e together with its endvertices. We call a graph trivial if it contains at most two vertices.

2.2 Heavy cycle, heavy triangle

As noted in the introduction, h(G), the Hadwiger number of G, equals either twice the weight of the heaviest edge or the largest sum of the weights of three edges lying on a common cycle. It is the latter option that is more appealing to our problem. Let t = f1f2f3 be a triangle in G. Define G1 = G(t, f1), G2 = G(t, f2), and G3 = G(t, f3). If G_i (i = 1, 2, 3) is trivial, then we say that e_i = f_i is a realizing edge of t (in G_i).
2.2 Heavy cycle, heavy triangle

As noted in the introduction, h(G), the Hadwiger number of G, equals either twice the weight of the heaviest edge or the largest sum of the weights of three edges lying on a common cycle. It is the latter option that is more appealing to our problem. Let t = f1f2f3 be a triangle in G. Define G1 = G(t, f1), G2 = G(t, f2), and G3 = G(t, f3). If Gi (i = 1, 2, 3) is trivial, then we say that ei = fi is a realizing edge of t (in Gi). If Gi is not trivial then every heaviest edge in Gi can be chosen as a realizing edge of t (in Gi). The weight of a triangle t, w(t), is defined as the sum of the weights of the edges realizing t. Clearly, the realizing edges of a triangle lie on a common cycle in G. Let e1 and e2 be distinct edges with the largest edge weights in G. A triangle t is called a heavy triangle if w(t) equals h(G), and both e1 and e2 are realizing edges of t. It may occur that no triangle in G is heavy. In this case we can, by increasing the weight of a single edge, construct a heavy triangle in G while not increasing h(G). This is the essence of the procedure heavyTriangle described in the next section. By scanning through the edges of G, we find some heaviest edge ea = uava. Next we run heavyTriangle(G, ea, ea, P) → h(G); t, fa, fb, fc; eb, ec, P, where P is initially the trivial path. Finally, we set p = h(G), c(ua) = 0, c(va) = w(ea), and run the main coloring procedure color(G, p, t; fa, fb, fc; ea, eb, ec; P) using a heavy triangle t = fafbfc with its realizing edges ea, eb, and ec as arguments.

3 Coloring algorithm

The coloring algorithm is recursive. Given a graph G with two precolored adjacent vertices ua and va, we split G along a carefully chosen edge (or edges) into several subgraphs, say G0, G1, .... Only one of these, say G0, contains both ua and va, and it is the first one to get colored. We find colorings of G1, G2, ... recursively, taking care that exactly two vertices of Gi are already colored when it is Gi's turn.

3.1 Looking for a heavy triangle

We shall first describe the routine heavyTriangle. The input for this routine consists of a weighted graph G, edges ea and eb (ea is heaviest in G, and eb is either second heaviest in G or ea = eb), and a path P ⊆ TG linking the edges ea and eb (P is trivial in case ea = eb). The routine heavyTriangle outputs, apart from the possibly new eb and P, also the Hadwiger number h(G), a heavy triangle t = fafbfc, and its third realizing edge ec. We set the notation so that ea ∈ E(G(t, fa)), eb ∈ E(G(t, fb)), and ec ∈ E(G(t, fc)), and assume that h(G) = w(ea) + w(eb) + w(ec). We use the following shorthand: heavyTriangle(G, ea, eb, P) → h(G); t, fa, fb, fc; eb, ec, P. The routine runs as follows:
(T1) If ea = eb then we find some second heaviest edge in G and adjust P so that P links ea and the newly determined eb. Hence ea ≠ eb.
(T2) For every triangle t we compute the realizing edges and its weight w(t). This can be done by tracing TG starting from ea first. Hence ea is one of the realizing edges in every triangle t. By retracing towards ea from the leaves of TG we compute the other two realizing edges of every triangle recursively. Finally we set eb to be one of the realizing edges in every triangle lying on P (in the direction from eb to ea).
(T3) Find the triangle t' with the largest possible w(t'). Set h(G) = max{w(t'), 2w(ea)}.
(T4) If h(G) > w(t'), or h(G) = w(t') and eb is not one of the realizing edges of t', then do the following: let t = fafbfc be an arbitrary triangle from P so that ea ∈ G(t, fa) and eb ∈ G(t, fb). Set ec = fc and increase the weight of ec = fc by setting w(ec) = h(G) − w(ea) − w(eb). Note that increasing the weight of ec does not increase h(G), as ea and eb are heaviest edges in G.
(T5) If h(G) = w(t') and eb is one of the realizing edges in t', then: by (T2) ea is also one of the realizing edges in t'. Set t = t'. Further, set ec to be the third realizing edge in t = fafbfc, where the notation of edges in t is chosen so that ea ∈ E(G(t, fa)), etc.
(T6) Output h(G); t, fa, fb, fc; eb, ec, P.
It is not difficult to see that heavyTriangle runs in linear time.

3.2 Recursion

We shall first describe a routine for coloring a graph with small edge weights. Let e = uv be the heaviest edge in G, and assume that p ≥ 3w(e), where p denotes the size of the color space. Let us also assume that the colors c(u) and c(v) are already determined so that span(e) is at most p − 2w(e). The procedure colorCgraph, with G, p, and e as its input (satisfying the above conditions), extends the coloring c to the remaining vertices of G. This can be done by tracing along TG starting at e, and taking care that every edge f ∈ G satisfies span(f) ≤ p − 2w(e) (as w(f) ≤ w(e)). It is easy to implement colorCgraph to run in linear time. We turn our attention to coloring the graph in case its edge weights (at least some of them) are large when compared to h(G). Let G be a weighted graph, p an upper bound for h(G), t = fafbfc a heavy triangle, and ea, eb, and ec its realizing edges (so that ea ∈ E(G(t, fa)), etc.). Let P be a path in TG joining ea and eb, and suppose that a coloring of the endvertices of ea is given so that ea is tight. Then:

Coloring Principle. With the assumptions as above, the procedure color extends the coloring c to the rest of G so that (i) apart from ea the edge eb is also tight, and (ii) span(ec) ≤ p − w(ea) − w(eb).

The call color(G, p, t; fa, fb, fc; ea, eb, ec; P) splits into three cases, and exactly one of them applies. These three cases will also serve as a recursive proof that a graph can indeed be colored according to the principle. The first case (C1) serves as the recursion basis, the last two cases (C2) and (C3) serve as recursion steps.
(C1) If G contains a single triangle t: in this case ea = fa, eb = fb, and ec = fc. Let u and v be the (colored) endvertices of ea, and let w be the common endvertex of eb and ec. There exists a unique color c(w) so that eb is tight and span(ec) = p − w(ea) − w(eb). Hence, we can extend the coloring to G according to the coloring principle. Exit.
(C2) If ea is a separating edge in G, then let G0, G1, ..., Gk be the ea-splits of G so that G0 contains eb, ec, fa, fb, fc, t, and P. We first color G0 by calling color(G0, p, t; fa, fb, fc; ea, eb, ec; P) and then take care of the other splits:
for i = 1 to k do heavyTriangle(Gi, ea, ea, Pi) → h(Gi); ti, fai, fbi, fci; ebi, eci, Pi (with each Pi initially trivial)
for i = 1 to k do color(Gi, p, ti; fai, fbi, fci; ea, ebi, eci; Pi)
Exit.
(C3) If ea is nonseparating in G, then we first increase the weights of fb and fc by setting w(fc) = w(ec) and w(fb) = w(eb). Let Ga be the graph containing G(t, ea) and the triangle t. Observe that either Ga contains at least two triangles or at least one of G(t, fb), G(t, fc) is not trivial (i.e. at least one of fb, fc is separating in G). Let Pa be the subpath of P linking fb and ea. Since w(fb) = w(eb), the edge fb is second heaviest in Ga.
If ea = fa then color(Ga, p, t; ea, fb, fc; ea, fb, fc; Pa). Note that in this case Ga contains a single triangle t, as ea is not separating in G.
Otherwise (ea ≠ fa), observe that w(fa) ≤ w(eb) = w(fb), as eb is second heaviest in G and ea ≠ fa. Hence, we increase the weight by setting w(fa) = w(fb), which makes fa second heaviest in G(t, fa). Let P' be the subpath of Pa linking ea and fa, let G' be the graph G(t, fa), and let G'' be the subgraph of G induced by the triangle t.
heavyTriangle(G', ea, fa, P') → h(G'); t', f'a, f'b, f'c; fa, e'c, P'
color(G', p, t'; f'a, f'b, f'c; ea, fa, e'c; P')
After coloring G', the edge fa is tight and we also call
color(G'', p, t; fa, fb, fc; fa, fb, fc; fa t fb).
end if
Note that at this point the endvertices of both fb and fc are colored. What is more, fb is tight, and by recursion, span(fc) ≤ p − w(ea) − w(fb) = p − w(ea) − w(eb). Finally we settle the uncolored parts.
If fb is separating in G then heavyTriangle(G(t, fb), fb, eb, ebPfb) → h(G(t, fb)); ti, fai, fbi, fci; ebi, eci, Pi and color(G(t, fb), p, ti; fai, fbi, fci; fb, ebi, eci; Pi).
If fc is separating in G then colorCgraph(G(t, fc), p, fc).
Exit.

4 Time complexity

This last section is devoted to estimating the speed of the coloring algorithm.

Time Complexity. There exists a constant C so that for every weighted series-parallel graph G of order n, the running time of the described coloring algorithm is bounded above by Cn². In other words, we can h(G)-color a weighted series-parallel graph G in quadratic time.

As already mentioned, the preprocessing takes a linear amount of time. After preprocessing, G is an edge-maximal series-parallel graph. If G contains n + 3 vertices then G contains n triangles, 2n + 1 edges, and 3n lines (edge-triangle incidences). All these quantities are equally appropriate for measuring the size of the problem. Let T(n) denote the maximal running time of the color procedure taking a graph G with n triangles as its input. We have to show that T(n) ≤ Cn², assuming T(m) ≤ Cm² for every m < n. Let D0n be an upper bound for the running times of both heavyTriangle and colorCgraph if they take a graph G containing n triangles as input. A call of color with G as its argument takes one of the three possible options: (C1), (C2), or (C3). The running time of (C1) is bounded above by a constant, say D1. If (C2) applies, let G0, G1, ..., Gk be the splits. Observe that k ≥ 1. Since the Gi's together contain exactly n triangles, the recursively called procedures heavyTriangle cumulatively take no more than D0n running time. Let (n0, n1, n2, ..., nk) be a proper (integer) partition of n, i.e. n0, n1, n2, ..., nk ≥ 1, k ≥ 1, and n0 + n1 + ··· + nk = n. Then

n0² + n1² + ··· + nk² ≤ n0² + (n1 + ··· + nk)² ≤ (n − 1)² + 1.    (1)

Now (1) implies that the cumulative running time of the recursive calls of the procedure color in (C2) is bounded from above by C(n − 1)² + C. Summing it all up, the running time of (C2) is bounded from above by C(n − 1)² + C + D0n + D2n if we use at most D2n time for running the loops (excluding the time for the recursive calls of heavyTriangle and color). The case when (C3) applies is settled similarly. Assume that the base running time of (C3) (i.e. the running time excluding the running times of the recursive calls of color, colorCgraph, and heavyTriangle) is bounded by a constant D3. Recursive color-ing and colorCgraph-ing takes at most C(n − 1)² + C, and the heavyTriangle calls take at most D0n time. Combining all three possibilities yields

T(n) ≤ max{D1, C(n − 1)² + C + D0n + D2n, C(n − 1)² + C + D0n + D3} ≤ max{D1, Cn² + (−2Cn + 2C + D0n + D2n), Cn² + (−2Cn + 2C + D0n + D3)},

which is, if C is large enough, at most Cn². This proves the assertion on time complexity.

Acknowledgment

The author's research was conducted while visiting the University of Hannover under the sponsorship of the Alexander von Humboldt Foundation. Both the hospitality of the university and the help of the foundation are gratefully acknowledged.

References

[1] H. L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM J. Comput., 25(6):1305-1317, 1996.
[2] R. Diestel. Graph theory. Springer, New York, 1997.
[3] G. Fijavž. Hadwiger's conjecture for circular colorings of edge-weighted graphs. Discrete Math., to appear.
[4] C.
McDiarmid. Discrete mathematics and radio channel assignment. In Recent advances in algorithms and combinatorics, volume 11 of CMS Books Math./Ouvrages Math. SMC, pages 27-63. Springer, New York, 2003.
[5] B. Mohar. Circular colorings of edge-weighted graphs. J. Graph Theory, 43:107-116, 2003.
[6] B. Mohar. Hajós theorem for colorings of edge-weighted graphs. Combinatorica, 25:65-76, 2004.
[7] B. Toft. A survey of Hadwiger's conjecture. Congr. Numer., 115:249-283, 1996. Surveys in graph theory (San Francisco, CA, 1995).
[8] X. Zhu. Circular chromatic number: a survey. Discrete Math., 229(1-3):371-410, 2001. Combinatorics, graph theory, algorithms and applications.

Credit Classification Using Grammatical Evolution

Anthony Brabazon
University College Dublin, Ireland. E-mail: Anthony.Brabazon@ucd.ie
Michael O'Neill
University of Limerick, Ireland. E-mail: Michael.ONeill@ul.ie

Keywords: grammatical evolution, credit rating, bond rating
Received: May 25, 2004

Grammatical Evolution (GE) is a novel data-driven, model induction tool, inspired by the biological gene-to-protein mapping process. This study provides an introduction to GE, and demonstrates the methodology by applying it to model the corporate bond-issuer credit rating process, using information drawn from the financial statements of bond-issuing firms. Financial data and the associated Standard & Poor's issuer-credit ratings of 791 US firms, drawn from the year 1999/2000, are used to train and test the model. The best developed model was found to be able to discriminate in-sample (out-of-sample) between investment grade and junk bond ratings with an average accuracy of 87.59 (84.92)% across a five-fold cross validation.

Povzetek: Metoda gramatične evolucije je uporabljena za klasificiranje kreditov.

1 Introduction

Grammatical Evolution (GE) [1] represents an evolutionary automatic programming methodology, and can be used to evolve rule sets. These rule sets can be as general as a functional expression which produces a good mapping between a series of known input-output data vectors. A particular strength of the methodology is that the form of the model need not be specified a priori by the modeler. This is of particular utility in cases where the modeler has a theoretical or intuitive idea of the nature of the explanatory variables, but a weak understanding of the functional relationship between the explanatory and the dependent variable(s). GE does not require that the model form is linear, nor does the method require that the measure of model error used in model construction is a continuous or differentiable function. Neither is GE a black box method. As such, the evolved rules (taking the form of symbolic expressions in this instance) are amenable to human interpretation and consequently have the potential to enhance our understanding of the problem domain. A key element of the methodology is the concept of a grammar, which governs the creation of the rule sets. This paper describes the GE methodology, and applies the methodology to accurately model the corporate bond rating process. Most large firms employ both share and debt capital to provide long-term finance for their operations. The debt capital may be provided by a bank, or may be obtained by selling bonds directly to investors.
As an example of the scale of US bond markets, the value of bonds issued in the first quarter of 2003 totalled $1.70 trillion [2]. A bond can be defined as a 'debt security which constitutes a promise by the issuing firm, to pay a stated rate of interest based on the face value of the bond, and to redeem the bond at this face value at maturity.' When a publicly-traded company wants to issue traded debt (bonds), it must obtain a credit rating for the issue from at least one recognised rating agency (Standard and Poor's (S&P), Moody's or Fitch). The credit rating represents the rating agency's opinion, at a specific date, of the creditworthiness of a borrower in general (an issuer credit rating), or in respect of a specific debt issue (a bond credit rating). Therefore it serves as a surrogate measure of the risk of non-payment of interest or capital of a bond. These ratings impact on the borrowing cost and the marketability of issued bonds.

1.1 Motivation for study

There are a number of reasons to suppose a priori that the use of an evolutionary automatic programming (EAP) approach such as GE can prove fruitful in this domain. In common with the related corporate failure prediction problem [3], a feature of the bond-rating problem is that there is no clear theoretical framework for guiding the choice of explanatory variables, or model form. Rating agencies assert that their credit rating process involves consideration of both financial and non-financial information about the firm and its industry, but the precise factors utilised, and the related weighting of these factors, are not publicly disclosed by the rating agencies. In the absence of an underlying theory, most published work on credit rating prediction employs a data-inductive modelling approach, using firm-specific financial data as explanatory variables, in an attempt to recover the model used by the rating agencies. This produces a high-dimensional combinatorial problem, as the modeller is attempting to uncover a good set of model inputs, and model form, giving rise to particular potential for an evolutionary automatic programming methodology such as GE. In this initial application of GE to modelling credit rating, we restrict attention to the binary classification case (discriminating between investment grade vs junk grade ratings). This will be extended to the multi-class case in future work. It is noted that a limited number of studies have applied a grammar-based methodology to constrain the search space for classification rules [3, 4, 5, 6]. This study extends this methodology into the domain of bond-rating. The rest of this contribution is organized as follows. The next section provides an overview of the literature on bond rating, followed by a section which describes Grammatical Evolution. We then outline the data set and methodology utilised. The following sections provide the results of the study, followed by a number of conclusions.
Initial Rating    Defaults (%)
AAA               0.52
AA                1.31
A                 2.32
BBB               6.64
BB                19.52
B                 35.76
CCC               54.38

Table 1: Rate of default by initial rating category (1987-2002) (from [8]).

2 Bond rating

Several categories of individuals would be interested in a model that could produce accurate estimates of bond ratings. Such a model would be of interest to firms that are considering issuing debt, as it would enable them to estimate the likely return investors would require if the debt was issued, thereby providing information for the pricing of the bonds. The model could also be used to assess the creditworthiness of firms that have not issued debt and hence do not already have a published bond rating. This information would be useful to bankers or other companies that are considering whether they should extend credit to that firm. Much rated debt is publicly traded on stock markets, and bond ratings are typically changed infrequently. An accurate bond-rating prediction model could indicate whether the current rating of a bond is still justified. To the extent that an individual investor could predict a bond rerating before other investors foresee it, this may provide a trading edge. In addition, the recent introduction of credit-risk derivatives allows investors to buy protection against the risk of the downgrade of a bond [7]. The pricing of such derivative products requires a quality model for estimating the likelihood of a credit rating change.

2.1 Bond Rating Notation

Although the precise notation used by individual rating agencies to denote the creditworthiness of a bond or issuer varies, in each case the rating is primarily denoted by a discrete, mutually exclusive, 'letter grade'. Taking the rating structure of S&P as an example, the ratings are broken down into 10 broad classes. The highest rating is denoted AAA, and the ratings then decrease in the following order: AA, A, BBB, BB, B, CCC, CC, C, D. Ratings between AAA and BBB (inclusive) are deemed to represent investment grade, with lower quality ratings deemed to represent debt issues with significant speculative characteristics (junk bonds). A 'C' grade represents a case where a bankruptcy petition has been filed, and a 'D' rating represents a case where the borrower is currently in default on their financial obligations. As would be expected, the probability of default depends strongly on the initial rating which a bond receives (see table 1). Ratings from AAA to CCC can be modified by the addition of a + or a -, to indicate at which end of the rating category the bond rating falls. An initial rating is prepared when a bond is being issued, and this rating is periodically reviewed thereafter by the rating agency. Bonds (or issuers) may be re-rated upwards (upgrade) or downwards (downgrade) if firm or environmental circumstances change. A re-rating of a bond below investment grade to junk bond status (such bonds are colorfully termed 'a fallen angel') may trigger a significant sell-off, as many institutional investors are only allowed, by external or self-imposed regulation, to hold bonds of investment grade. The practical effect of a bond (or issuer) being assigned a lower rather than a higher rating is that its perceived riskiness in the eyes of potential investors increases, and consequently the required interest yield of the bond rises.

2.2 Prior Literature

In essence, the objective of constructing a model of bond ratings is to produce a model of rating agency behaviour, using publicly available information. A large literature exists on bond-rating prediction. The earliest attempts utilised statistical methodologies such as linear regression (OLS) [9], multiple discriminant analysis [10], the multinomial logit model [11], and ordered-probit analysis [12]. The results from these studies varied, and typically results of about 50-60% prediction accuracy (out-of-sample) were obtained, using financial data as inputs.
With the advent of artificial intelligence and machine learning, the range of techniques applied to predict bond ratings has expanded to include neural networks [13]. In the case of prior neural network research, the predictive accuracy of the developed models has varied. Several studies employed a binary predictive target and reported good classification accuracies. For example, [14] used a neural network to predict AA or non-AA bond ratings, and obtained an accuracy of approximately 83.3%. However, a small sample size (47 companies) was adopted in the study, making it difficult to generalise strongly from its results.

3 Grammatical evolution

Evolutionary algorithms (EAs) operate on principles of evolution, usually being coarsely modelled on the theories of survival of the fittest and natural selection [15]. In general, evolutionary algorithms can be characterized as:

x[t + 1] = r(v(s(x[t])))    (1)

where x[t] is the population of solutions at iteration t, v(.) is the random variation operator (crossover and mutation), s(.) is the selection for reproduction operator, and r is the replacement operator which determines which of the parents and children survive into the next generation. Therefore the algorithm turns one population of candidate solutions into another, using selection, crossover and mutation. Selection exploits information in the current population, concentrating interest on 'high-fitness' solutions. Crossover and mutation perturb these solutions in an attempt to uncover better solutions, and these operators can be considered as general heuristics for exploration. GE is a grammatical approach to Genetic Programming (GP) that can evolve computer programs (or rule sets) in any language, and a full description of GE can be found in [1, 16, 17, 18]. Rather than representing the programs as syntax trees, as in Koza's GP [19], a linear genome representation is used. Each individual, a variable-length binary string, contains in its codons (groups of 8 bits) the information to select production rules from a Backus Naur Form (BNF) grammar. In other words, an individual's binary string contains the instructions that direct a developmental process resulting in the creation of a program or rule. As such, GE adopts a biologically-inspired, genotype-phenotype mapping process. At present, the search element of the system is carried out by an evolutionary algorithm, although other search strategies with the ability to operate over binary or integer strings have also been used [1, 5]. The GE system possesses a modular structure (see figure 1) which will allow future advances in the field of evolutionary algorithms to be easily incorporated.

3.1 The Biological Approach

The GE system is inspired by the biological process of generating a protein from the genetic material of an organism. Proteins are fundamental in the proper development and operation of living organisms and are responsible for traits such as eye color and height [20]. The genetic material (usually DNA) contains the information required to produce specific proteins at different points along the molecule. For simplicity, consider DNA to be a string of building blocks called nucleotides, of which there are four, named A, T, G, and C, for adenine, thymine, guanine, and cytosine respectively. Groups of three nucleotides, called codons, are used to specify the building blocks of proteins.
These protein building blocks are known as amino acids, and the sequence of these amino acids in a protein is determined by the sequence of codons on the DNA strand. The sequence of amino acids is very important as it determines the final three-dimensional structure of the protein, which in turn has a role to play in determining its functional properties. In order to generate a protein from the sequence of nucleotides in the DNA, the nucleotide sequence is first transcribed into a slightly different format, that being a sequence of elements on a molecule known as mRNA. Codons within the mRNA molecule are then translated to determine the sequence of amino acids that are contained within the protein molecule. The application of production rules to the non-terminals of the incomplete code being mapped in GE is analogous to the role amino acids play when being combined together to transform the growing protein molecule into its final functional three-dimensional form. The result of the expression of the genetic material as proteins, in conjunction with environmental factors, is the phenotype. In GE, the phenotype is a sentence or sentences in the language defined by the input grammar. These sentences can take the form, for example, of functions, programs, or, as in the case of this study, rule sets. The phenotype is generated from the genetic material (the genotype) by a process termed a genotype-phenotype mapping. This is unlike the standard method of generating a solution directly from an individual in an evolutionary algorithm by explicitly encoding the solution within the genetic material. Instead, a many-to-one mapping process is employed, within which the robustness of the GE system lies. Figure 2 compares the mapping process employed in both GE and biological organisms.

Figure 1: Modular structure of grammatical evolution.

Figure 2: A comparison between the grammatical evolution system and a biological genetic system. The binary string of GE is analogous to the double helix of DNA, each guiding the formation of the phenotype. In the case of GE, this occurs via the application of production rules to generate the terminals of the compilable program. In the biological case, by directing the formation of the phenotypic protein by determining the order and type of protein subcomponents (amino acids) that are joined together.

3.2 The Mapping Process

When tackling a problem with GE, a suitable BNF (Backus Naur Form) grammar definition must first be defined. The BNF can be either the specification of an entire language or, perhaps more usefully, a subset of a language geared towards the problem at hand. In GE, a BNF definition is used to describe the output language to be produced by the system. BNF is a notation for expressing the grammar of a language in the form of production rules. BNF grammars consist of terminals, which are items that can appear in the language, e.g. binary operators +, -, unary operators Sin, constants 1.0, etc., and non-terminals, which can be expanded into one or more terminals and non-terminals.
A grammar can be represented by the tuple {N, T, P, S}, where N is the set of non-terminals, T the set of terminals, P a set of production rules that maps the elements of N to T, and S is a start symbol which is a member of N. When there are a number of productions that can be applied to one element of N, the choice is delimited with the '|' symbol. For example:

N = {<expr>, <op>, <pre_op>, <var>}
T = {Sin, +, -, /, *, X, 1.0, (, )}
S = <expr>

and P can be represented as:

(A) <expr> ::= <expr><op><expr> (0)
             | (<expr><op><expr>) (1)
             | <pre_op>(<expr>) (2)
             | <var> (3)
(B) <op> ::= + (0) | - (1) | / (2) | * (3)
(C) <pre_op> ::= Sin
(D) <var> ::= X (0) | 1.0 (1)

From this grammar, <expr> can be transformed into one of four rules: it becomes <expr><op><expr>, (<expr><op><expr>) (which is the same as the first, but surrounded by brackets), <pre_op>(<expr>), or <var>. The program, or sentence(s), produced will consist of elements of the terminal set T. The grammar is used in a developmental approach whereby the evolutionary process evolves the production rules to be applied at each stage of a mapping process, starting from the start symbol, until a complete program is formed. A complete program is one that is comprised solely from elements of T. As the BNF definition is a plug-in component of the system, GE can produce code in any language, thereby giving the system a unique flexibility. For the above BNF, table 2 summarizes the production rules and the number of choices associated with each.

Rule no.    Choices
A           4
B           4
C           1
D           2

Table 2: The number of choices available from each production rule.

The genotype is used to map the start symbol onto terminals by reading codons of 8 bits to generate a corresponding integer value, from which an appropriate production rule is selected by using the following mapping function:

Rule = Codon Value % No. Rule Choices    (2)

where % is the MOD function, which returns the remainder after a division operation (e.g. 4 % 3 = 1). Consider production rule (B) above for the non-terminal <op>; there are four possible choices for this non-terminal. If we assume the codon being read produces the integer 6, then 6 % 4 = 2 would select choice (2), the operator /. Each time a production rule has to be selected to transform a non-terminal, another codon is read. In this way the system traverses the genome. During the genotype-to-phenotype mapping process, it is possible for individuals to run out of codons, and in this case we wrap the individual and reuse the codons. This is quite an unusual approach in EAs, as it is entirely possible for certain codons to be used two or more times. This technique of wrapping the individual draws inspiration from the gene-overlapping phenomenon that has been observed in many organisms [20]. In GE, each time the same codon is expressed it will always generate the same integer value, but, depending on the current non-terminal to which it is being applied, it may result in the selection of a different production rule. This feature is referred to as intrinsic polymorphism. Crucially, however, each time a particular individual is mapped from its genotype to its phenotype, the same output is generated. This is the case because the same choices are made each time. However, it is possible that an incomplete mapping could occur, even after several wrapping events, and in this case the individual in question is given the lowest possible fitness value. The selection and replacement mechanisms then operate accordingly to increase the likelihood that this individual is removed from the population. An incomplete mapping could arise if the integer values expressed by the genotype were applying the same production rules repeatedly.
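As an illustration of this mapping, the following Python sketch applies equation (2), with wrapping, to the example grammar above; the function and variable names, and the wrapping limit of 10, are our own choices rather than part of the GE system.

GRAMMAR = {
    "<expr>": ["<expr><op><expr>", "(<expr><op><expr>)", "<pre_op>(<expr>)", "<var>"],
    "<op>": ["+", "-", "/", "*"],
    "<pre_op>": ["Sin"],
    "<var>": ["X", "1.0"],
}

def map_genotype(codons, start="<expr>", max_wraps=10):
    # codons: list of 8-bit codon values; returns a sentence, or None if invalid
    sentence, i, wraps = start, 0, 0
    while True:
        nonterminals = [nt for nt in GRAMMAR if nt in sentence]
        if not nonterminals:
            return sentence                         # complete program generated
        nt = min(nonterminals, key=sentence.index)  # leftmost non-terminal
        if i == len(codons):                        # end of genome: wrap around
            i, wraps = 0, wraps + 1
            if wraps > max_wraps:
                return None                         # incomplete mapping: invalid
        choices = GRAMMAR[nt]
        rule = codons[i] % len(choices)             # Rule = codon % no. of choices
        sentence = sentence.replace(nt, choices[rule], 1)
        i += 1

print(map_genotype([3, 0]))   # <expr> -> <var> -> X
print(map_genotype([0]))      # always selects rule 0: grows forever, returns None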
For example, consider an individual with three codons, all of which specify rule 0 from production rule (A):

<expr> ::= <expr><op><expr> (0)
         | (<expr><op><expr>) (1)
         | <pre_op>(<expr>) (2)
         | <var> (3)

Even after wrapping, the mapping process would be incomplete and would carry on indefinitely unless stopped. This occurs because the non-terminal <expr> is being mapped recursively by production rule 0, so it becomes <expr><op><expr>. Therefore, the leftmost <expr> after each application of a production would itself be mapped to <expr><op><expr>, resulting in an expression continually growing as follows: <expr><op><expr>, then <expr><op><expr><op><expr>, and so on. Such an individual is dubbed invalid, as it will never undergo a complete mapping to a set of terminals. For this reason we impose an upper limit on the number of wrapping events that can occur. It is clearly essential that stop sequences are found during the evolutionary search in order to complete the mapping process to a functional program, a stop sequence being a set of codons that result in the non-terminals being transformed into elements of the grammar's terminal set. Beginning from the left-hand side of the genome, then, codon integer values are generated and used to select rules from the BNF grammar, until one of the following situations arises:

1. A complete program is generated. This occurs when all the non-terminals in the expression being mapped are transformed into elements from the terminal set of the BNF grammar.

2. The end of the genome is reached, in which case the wrapping operator is invoked. This results in the return of the genome reading frame to the left-hand side of the genome once again. The reading of codons will then continue, unless an upper threshold representing the maximum number of wrapping events has occurred during this individual's mapping process.

3. In the event that a threshold on the number of wrapping events has occurred and the individual is still incompletely mapped, the mapping process is halted, and the individual is assigned the lowest possible fitness value.

To reduce the number of invalid individuals being passed from generation to generation, a steady-state replacement mechanism is employed. One consequence of the use of a steady-state method is its tendency to maintain fit individuals at the expense of less fit, and in particular, invalid individuals.

4 Experimental approach

The dataset consists of financial data of 791 non-financial US companies drawn from the S&P Compustat database. The associated S&P overall credit rating for each corporate bond issuer is also obtained from the database.¹ Of these companies, 57% have an investment rating (AAA, AA, A, or BBB), and 43% have a junk rating. To allow time for the preparation of year-end financial statements, the filing of these statements with the Securities and Exchange Commission (S.E.C.), and the development of a bond rating opinion by the Standard and Poor's rating agency, the bond rating of the company as at 30 April 2000 is matched with financial information drawn from the financial statements as at 31 December 1999. A subset of 600 firms was randomly sampled from the total of 791 firms, to produce two groups of 300 'investment' grade and 300 junk rated firms. The 600 firms were randomly allocated to the training set (420) or the hold-out sample (180), ensuring that each set was equally balanced between investment and non-investment grade ratings. A total of eight financial variables was selected for inclusion in this study.

¹ S&P is one of the largest credit rating agencies in the world, currently rating about 150,000 issues of securities across 50 countries. It provides credit ratings for about 99.2% of the debt obligations and preferred stock issues which are publicly traded in the US [8].
The selection of these variables was guided both by prior literature in bankruptcy prediction [21, 22, 23] and by the literature on bond rating prediction [14, 24, 25], resulting in an initial judgemental selection of a subset of accounting ratios. These ratios were then further filtered using statistical analysis. Five groupings of explanatory variables, drawn from financial statements, are given prominence in prior literature as being the prime determinants of bond issue quality and default risk:

i. Liquidity
ii. Debt
iii. Profitability
iv. Activity / Efficiency
v. Size

Liquidity refers to the availability of cash resources to meet short-term cash requirements. Debt measures focus on the relative mix of funding provided by shareholders and lenders. Profitability considers the rate of return generated by a firm, in relation to its size, as measured by sales revenue and/or asset base. Activity measures consider the operational efficiency of the firm in collecting cash, managing stocks and controlling its production or service process. Firm size provides information on both the sales revenue and asset scale of the firm and also provides a proxy metric on firm history. The groupings of potential explanatory variables can be represented by a wide range of individual financial ratios, each with slightly differing information content. The groupings themselves are interconnected, as weak (or strong) financial performance in one area will impact on another. For example, a firm with a high level of debt may have lower profitability due to high interest costs. Following the examination of a series of financial ratios under each of these headings, the following inputs were selected:

i. Current ratio
ii. Retained earnings to total assets
iii. Interest coverage
iv. Debt ratio
v. Net margin
vi. Market to book value
vii. Log (Total assets)
viii. Return on total assets

The objective in selecting a set of proto-explanatory variables is to choose financial variables that vary between companies in different bond rating classes, and where information overlaps between the variables are minimised. Comparing the means of the above ratios for the two groups of ratings (see table 3) reveals a statistically significant difference between the two groups at both the 5% and the 1% level, and, as expected, the financial ratios in each case for the investment ratings are stronger than those for the junk ratings. The only exception is the current ratio, which is stronger for the junk rated companies, possibly indicating a preference for these companies to hoard short-term liquidity, as their access to long-term capital markets is weak. A correlation analysis between the selected ratios (see table 4) indicates that most of the cross-correlations are less than |0.20|, with the exception of the debt ratio and (Retained Earnings/Total Assets) ratio pairing, which has a correlation of -0.64. In this study, the GE algorithm uses a steady-state replacement mechanism, such that two parents produce two children, the best of which replaces the worst individual in the current population if the child has greater fitness. The standard genetic operators of bit mutation (probability of 0.01) and variable-length one-point crossover (probability of 0.9) are adopted.
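As a concrete reading of these two operators, the sketch below shows variable-length one-point crossover (each parent is cut at its own point, so offspring lengths may differ from the parents') and per-bit mutation; this is our own minimal illustration, not the authors' implementation.

import random

def one_point_crossover(a, b, p_cross=0.9):
    # variable-length one-point crossover over two binary-string parents
    if random.random() >= p_cross:
        return a[:], b[:]
    i = random.randrange(1, len(a))
    j = random.randrange(1, len(b))
    return a[:i] + b[j:], b[:j] + a[i:]

def bit_mutation(bits, p_mut=0.01):
    # flip each bit independently with probability p_mut
    return [bit ^ 1 if random.random() < p_mut else bit for bit in bits]

parent1 = [random.randint(0, 1) for _ in range(64)]
parent2 = [random.randint(0, 1) for _ in range(80)]
child1, child2 = one_point_crossover(parent1, parent2)
child1 = bit_mutation(child1)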
A series of functions are pre-defined, as are a series of mathematical operators. A population of initial rule sets (programs) is randomly generated, and by means of an evolutionary process these are improved. No explicit model specification is assumed ex-ante, although the choice of mathematical operators defined in the grammar does place implicit limitations on the model specifications amongst which GE can search. The grammar adopted in this study is as follows:

<rule> ::= if( <expr> <relop> <const> ) class=''Junk''; else class=''Investment Grade'';
<expr> ::= ( <expr> ) + ( <expr> ) | <coeff> * <var>
<var> ::= Current_Ratio | Retained_Earnings_to_total_assets | Interest_Coverage | Debt_Ratio | Net_Margin | Market_to_book_value | Total_Assets | ln(Total_Assets) | Return_on_total_assets
<coeff> ::= ( <coeff> ) <op> ( <coeff> ) | <const>
<op> ::= + | - | *
<const> ::= 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | -1 | .1
<relop> ::= <=

5 Results

The results from our experiments are now provided. Each of the GE experiments is run for 100 generations, with variable-length one-point crossover at a probability of 0.9, bit mutation at a probability of 0.01, roulette selection, and steady-state replacement. Results are reported for two population sizes (500 and 1000). To assess the stability of the results across different randomisations of the dataset between training and test data, we recut the dataset five times, maintaining an equal balance of investment and non-investment grade ratings in the resulting training and test datasets.

                          Investment grade    Junk bond
Current ratio             1.354               1.93
Ret. earn/Tot assets      0.22                -0.12
Interest coverage         7.08                1.21
Debt ratio                0.32                0.53
Net margin                0.07                -0.44
Market to book value      18.52               4.02
Total assets              10083               1876
Return on total assets    0.10                0.04

Table 3: Means of input ratios for investment and junk bond groups of companies.

        CR      RE/TA   IC      DR      NM      MTB     TA      ROA
CR      1       -0.08   -0.01   0.06    -0.27   0.01    -0.18   -0.15
RE/TA   -0.08   1       0.27    -0.64   0.14    0.15    0.15    0.48
IC      -0.01   0.27    1       -0.28   0.06    0.31    0.15    0.41
DR      0.06    -0.64   -0.28   1       -0.05   -0.19   -0.20   -0.27
NM      -0.27   0.14    0.06    -0.05   1       0.01    0.03    0.22
MTB     0.01    0.15    0.31    -0.19   0.01    1       0.04    0.14
TA      -0.18   0.15    0.15    -0.20   0.03    0.04    1       0.07
ROA     -0.15   0.48    0.41    -0.27   0.22    0.14    0.07    1

Table 4: Correlations between financial ratios.

In our experiments, fitness is defined as the number of correct classifications obtained by an evolved discriminant rule. The results for the best individual of each cut of the dataset, where 30 independent runs were performed for each cut, averaged over all five randomisations of the dataset, for both the 500 and 1000 population sizes, are given in table 5. In each case the overall classification accuracy is provided, and this is then subdivided into the number of true positives (Ntp), the number of true negatives (Ntn), and the number of false positives and false negatives respectively (Nfp, Nfn). To assess the overall hit-ratio of the developed models (out-of-sample), Press's Q statistic [26] was calculated for each model. In all cases, the null hypothesis, that the out-of-sample classification accuracies are not significantly better than those that could occur by chance alone, was rejected at the 1% level. A t-test of the hit-ratios also rejected a null hypothesis that the classification accuracies were no better than chance at the 1% level. Across all the data recuts, the best individual achieved an 87.56 (84.36)% accuracy in-sample (out-of-sample) when the population size was 500, with the best individual across all data recuts in the population=1000 case obtaining an accuracy of 87.59 (84.92)% in-sample (out-of-sample).
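Since fitness is simply the count of correct classifications, scoring a candidate rule is straightforward. The sketch below evaluates a linear discriminant rule of the shape produced by the grammar above; the coefficient and variable names and the toy data values are hypothetical (the coefficients echo the population=500 rule quoted below, with suitably scaled inputs).

def classify(row, coeffs):
    # row: dict of ratio values; coeffs: ratio-name -> coefficient, plus "const"
    score = coeffs.get("const", 0.0) + sum(
        c * row[name] for name, c in coeffs.items() if name != "const")
    return "Junk" if score > 0 else "Investment Grade"

def fitness(dataset, coeffs):
    # fitness = number of correctly classified (row, label) pairs
    return sum(classify(row, coeffs) == label for row, label in dataset)

rule = {"const": 10, "Debt_Ratio": 16, "RE_to_TA": -9, "Total_Assets": -2}
data = [({"Debt_Ratio": 0.53, "RE_to_TA": -0.12, "Total_Assets": 1.9}, "Junk"),
        ({"Debt_Ratio": 0.32, "RE_to_TA": 0.22, "Total_Assets": 10.1}, "Investment Grade")]
print(fitness(data, rule))   # 2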
Although the average out-of-sample accuracy obtained for population=1000 slightly exceeds that for population=500, the difference was not found to be statistically significant. A plot of the best and average fitness on each cut of the in-sample dataset, for the population=500 case, can be seen in figure 3, and for the case where population=1000 in figure 4. Examining the structure of the best individual in the case where the initial fitness function was utilised and where population=500 shows that the evolved discriminant function had the following form:

IF (10 + 16 var6 - 9 var4 - 2 var9) > 0 THEN 'Junk' ELSE 'Investment Grade'

where var6 is Debt Ratio, var4 is Retained Earnings to Total Assets, and var9 is Total Assets. In the case where population=1000 the best evolved discriminant function had a similar form to the above:

IF (5 + 8 var6 - 4 var4 - var9) > 0 THEN 'Junk' ELSE 'Investment Grade'

Examining the signs of the coefficients of the evolved rules does not suggest that they conflict with common financial intuition. The rules indicate that low/negative retained earnings, low/negative total assets or high levels of debt finance are symptomatic of a firm that has a junk rating. It is noted that similar risk factors have been identified in predictive models of corporate failure which utilise financial ratios as explanatory inputs [3, 4]. Conversely, low levels of debt, a history of successful profitable trading, and high levels of total assets are symptomatic of firms that have an investment grade rating. Although the two discriminant functions have differing coefficient values, they are in essence very similar, as the differing coefficient values are balanced by the differing constant term which has been evolved in each function. Considering the individual classification rules, it is interesting that, despite the potential to generate long, complex ratio chains, this bloating did not occur and the evolved classifiers are reasonably concise in form. We also note that the evolved classifiers (unlike those created by means of a neural network methodology, for example) are amenable to human interpretation.

                          Fitness    TP       TN       FP      FN
Train GEBOND500           0.861      185.4    176.4    33.6    24.6
Train GEBOND1000          0.867      183.4    180.8    29.2    26.6
Out-Sample GEBOND500      0.854      77.8     76       14      12.2
Out-Sample GEBOND1000     0.860      78       76.8     13.2    12
Train MLP                 0.8690     181.8    183.2    26.8    28.2
Out-sample MLP            0.8500     75.8     77.2     12.8    14.2

Table 5: Performance of the best evolved rules on their training and out-of-sample datasets, averaged over all five randomisations, compared with the classification performance of an MLP on the same datasets.

Figure 3: Best and average fitness values of 30 runs on the five recuts of the in-sample dataset with a population size of 500.

Figure 4: Best and average fitness values of 30 runs on the five recuts of the in-sample dataset with a population size of 1000.

5.1 Benchmarking the Results

To provide a benchmark for the results obtained by GE, we compare them with the results obtained on the same recuts of the dataset, using a fully-connected, three-layer, feedforward multi-layer perceptron (MLP) trained using the back-propagation algorithm, and with the results obtained using linear discriminant analysis.
The developed MLP networks utilised all the explanatory variables. The optimal number of hidden-layer nodes was found following experimentation on each separate data recut, and varied between two and four nodes. The classification accuracies for the networks, averaged over all five recuts, are provided in table 5. The levels of classification accuracy obtained with the MLP are competitive with earlier research, with for example [14] obtaining an out-of-sample classification accuracy of approximately 83.3%, although it is noted that the size of the dataset in that study was small. Comparing the results from the MLP with those of GE on the initial fitness function suggests that GE has proven highly competitive with an MLP methodology, producing a similar classification accuracy on the training data, and slightly out-performing the MLP out-of-sample. Utilising the same dataset recuts as GE, LDA produced results (averaged across all five recuts) of 82.74% in-sample, and 85.22% out-of-sample. Again, GE is competitive against these results in terms of classification accuracy. Comparing the results obtained by the linear classifiers (LDA and GE) against those of an MLP suggests that strong non-linearities between the explanatory variables and the dependent variable are not present.

6 Conclusions & future work

In this paper a novel methodology, GE, was described and applied for the purposes of prediction of bond ratings. It is noted that this novel methodology has general utility for rule-induction applications. GE was found to be able to evolve quality classifiers for bond ratings from raw financial information. Despite using data drawn from companies in a variety of industrial sectors, the developed models showed an impressive capability to discriminate between investment and junk rating classifications. The GE-developed models also proved highly competitive with a series of MLP models developed on the same datasets. Several extensions of the methodology in this study are indicated for future work. One route is the inclusion of non-financial company and industry-level information as input variables. A related possibility would be to concentrate on building rating models for individual industrial sectors. The study can also be extended to encompass the multi-class rating prediction problem. As already noted, there are multiple methodologies available for the generation of classification rules / regression models [27, 28]. Future work could extend this study by examining the general utility of GE vs other methods of generating classification rules, by comparing the performance of a range of methods on a wider range of datasets.

References

[1] O'Neill, M. and Ryan, C. (2003). Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language. Kluwer Academic Publishers.
[2] Bond Market Statistics (2003). New York: The Bond Market Association.
[3] Brabazon, A. and O'Neill, M. (2004). Diagnosing Corporate Stability using Grammatical Evolution, International Journal of Applied Mathematics and Computer Science, 14(3), pp. 363-374.
[4] Brabazon, A. and O'Neill, M. (2003). Anticipating Bankruptcy Reorganisation from Raw Financial Data using Grammatical Evolution, Proceedings of EvoIASP 2003, Lecture Notes in Computer Science (2611): Applications of Evolutionary Computing, edited by Raidl, G., Meyer, J.A., Middendorf, M., Cagnoni, S., Cardalda, J. J. R., Corne, D. W., Gottlieb, J., Guillot, A., Hart, E., Johnson, C. G., Marchiori, E., pp.
368-378, Berlin: Springer-Verlag.
[5] O'Neill, M. and Brabazon, A. (2004). Grammatical Swarm, in Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2004, Lecture Notes in Computer Science (3102), Deb et al. (eds.), Seattle, USA, June 26-30, 2004, 1, pp. 163-174, Berlin: Springer-Verlag.
[6] Shan, Y., McKay, R., Baxter, R., Abbass, H., Essam, D. and Nguyen, H. (2003). Grammar Model Based Program Evolution, in Proceedings of the 2004 IEEE Congress on Evolutionary Computation, 1, pp. 478-485, IEEE Press: New Jersey.
[7] Altman, E. (1998). The importance and subtlety of credit rating migration, Journal of Banking & Finance, 22, pp. 1231-1247.
[8] Standard and Poor's (2002). Standard and Poor's Rating Services, Statement at US SEC Public Hearing on the Role and Function of Credit Rating Agencies in the US Securities Markets, 15 November 2002.
[9] Horrigan, J. (1966). The determination of long term credit standing with financial ratios, Journal of Accounting Research, (supplement 1966), pp. 44-62.
[10] Pinches, G. and Mingo, K. (1973). A multivariate analysis of industrial bond ratings, Journal of Finance, 28(1), pp. 1-18.
[11] Ederington, H. (1985). Classification models and bond ratings, Financial Review, 20(4), pp. 237-262.
[12] Gentry, J., Whitford, D. and Newbold, P. (1988). Predicting industrial bond ratings with a probit model and funds flow components, Financial Review, 23(3), pp. 269-286.
[13] Maher, J. and Sen, T. (1997). Predicting bond ratings using neural networks: a comparison with logistic regression, Intelligent Systems in Accounting, Finance and Management, 6, pp. 23-40.
[14] Dutta, S. and Shekhar, S. (1988). Bond rating: a nonconservative application of neural networks, Proceedings of IEEE International Conference on Neural Networks, II, pp. 443-450.
[15] Fogel, D. (2000). Evolutionary Computation: Towards a new philosophy of machine intelligence, New York: IEEE Press.
[16] O'Neill, M. (2001). Automatic Programming in an Arbitrary Language: Evolving Programs in Grammatical Evolution. PhD thesis, University of Limerick.
[17] O'Neill, M. and Ryan, C. (2001). Grammatical Evolution, IEEE Transactions on Evolutionary Computation, 2001.
[18] Ryan, C., Collins, J.J. and O'Neill, M. (1998). Grammatical Evolution: Evolving Programs for an Arbitrary Language. Lecture Notes in Computer Science 1391, Proceedings of the First European Workshop on Genetic Programming, pp. 83-95, Springer-Verlag.
[19] Koza, J. (1992). Genetic Programming. MIT Press.
[20] Lewin, B. (2000). Genes VII. Oxford University Press.
[21] Altman, E. (1993). Corporate Financial Distress and Bankruptcy, New York: John Wiley and Sons Inc.
[22] Morris, R. (1997). Early Warning Indicators of Corporate Failure: A critical review of previous research and further empirical evidence, London: Ashgate Publishing Limited.
[23] Altman, E. (1968). Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy, Journal of Finance, 23, pp. 589-609.
[24] Kamstra, M., Kennedy, P. and Suan, T.K. (2001). Combining Bond Rating Forecasts Using Logit, The Financial Review, 37, pp. 75-96.
[25] Singleton, J. and Surkan, A. (1991). Modeling the Judgment of Bond Rating Agencies: Artificial Intelligence Applied to Finance, Journal of the Midwest Finance Association, 20, pp. 72-80.
[26] Hair, J., Anderson, R., Tatham, R. and Black, W. (1998). Multivariate Data Analysis, Upper Saddle River, New Jersey: Prentice Hall.
[27] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984).
Classification and Regression Trees, New York: Chapman and Hall.
[28] Torgo, L. (2000). Partial Linear Trees, in Proceedings of the 17th International Conference on Machine Learning (ICML 2000), Langley, P. (ed), pp. 1007-1014, Morgan Kaufmann.

Integration of Bargaining into E-Business Systems

Heinrich C. Mayr
University of Klagenfurt, Department of Business Informatics and Application Systems, Universitätsstr. 65-67, 9020 Klagenfurt, Austria
E-mail: mayr@ifit.uni-klu.ac.at, http://www.ifi.uni-klu.ac.at/IWAS
Klaus-Dieter Schewe
Massey University, Department of Information Systems & Information Science Research Centre, Private Bag 11 222, Palmerston North, New Zealand
E-mail: k.d.schewe@massey.ac.nz, http://isrc.massey.ac.nz
Bernhard Thalheim
Christian Albrechts University Kiel, Department of Computer Science and Applied Mathematics, Olshausenstr. 40, 24098 Kiel, Germany
E-mail: thalheim@is.informatik.uni-kiel.de, http://www.is.informatik.uni-kiel.de
Tatjana Welzer
University of Maribor, Institute of Informatics, Smetanova ul. 17, 2000 Maribor, Slovenia
E-mail: welzer@uni-mb.si, http://lisa.uni-mb.si

Keywords: bargaining, co-design, e-business, games, web information systems
Received: October 24, 2006

Despite the fact that bargaining plays an important role in business communications, it is largely neglected in e-business systems. In this paper a conceptual model that integrates bargaining into web-based e-business systems will be developed, starting from an informal characterisation of the bargaining process. Bargaining can be formalised as a two-player game, and integrated with the co-design approach for the design of web information systems. In this way bargaining games are played on parameterised story spaces, such that each move of a player adds constraints to the parameters. Each player follows a strategy for making moves, and winning strategies are characterised by highly-ranked agreements.

Povzetek: Opisano je uvajanje pogajanja v sistem e-poslovanja.

1 Introduction

Bargaining plays an important role in business communications. For instance, in commerce it is common to bargain about prices, discounts, etc., and in banking and insurance bargaining about terms and conditions applies. E-business aims at supporting business with electronic media, in particular web-based systems. These systems support, complement or even replace human labour that would normally be involved in the process. In [10] it has been outlined that such systems can only be developed successfully if the human communication behaviour is well understood, so that it can become part of an electronic system. Bargaining is part of that communication behaviour. However, bargaining is largely neglected in e-business.
In business-oriented literature, e.g. [6, 13], secure payments and trust are mentioned, but negotiation latitude or bargaining do not appear. Looking at the discussion of technology for e-business this comes as no surprise, as the emphasis is on the sequencing of user actions and the data support, but almost never on inferences. For instance, favourable topics in e-business modelling are business processes [1], workflow [8], e-payment [2], trust [4], decision support [3], or web services [12].

In this paper we make an attempt to integrate bargaining into web-based e-business systems using the co-design approach [11] to the design of web information systems (WISs). We start with a characterisation of the bargaining process as an interaction between at least two parties. The cornerstones of this characterisation are goals, acceptable outcomes, strategies, secrets, trust and distrust, and preferences. We believe that before dropping into formal details of a conceptual model for bargaining, we first need a clearer picture of what we are aiming at. We will discuss the characteristics of bargaining in Section 2, following previous work in [9]. We will also outline the differences to auction systems.

In Section 3 we briefly present parts of the co-design approach to WIS design [11] in order to have a simple conceptual model of WISs, into which ideas concerning bargaining can be implanted. We emphasise the idea of a story space as a collection of abstract locations (called scenes) and transitions between them that are initiated by actions, the support of the scenes by database views, and the support of the actions by operations associated with the views. Though many aspects of the co-design approach will be omitted in this model, it will suffice to serve as a basis for a formalisation of bargaining. In Section 4 we develop a model for bargaining based on games that are played on the conceptual model. This idea was already presented in [9], though only in a completely informal way. We concentrate on bargaining involving only two parties. Their "playground" will be the parameterised story space, and moves consist of adding constraints to the parameters. The moves of the players reflect offers, counteroffers, acceptance and denial. Both players aim at an optimal outcome for themselves, but success is defined as outcomes that are acceptable for both parties. Furthermore, players follow bargaining strategies that may lead them to a final agreement. We will characterise such strategies and attempt to define what a "winning strategy" might be, though obviously bargaining games do not end with one party winning and the other one losing. Furthermore, we characterise the context of bargaining as being defined by user profiles including preferences and desires, and bargaining preferences. In e-business systems the role of one player will be taken by a user, while the system plays the other role. This may be extended to a multiple-player game with more than one single human player, e.g. if bargaining becomes too critical to leave it exclusively to a system.

2 Characteristics of the Bargaining Process

Let us start by looking at human bargaining processes. We consider two typical bargaining situations, in a commerce application and a loan application. From these examples we derive characteristic features of bargaining.

2.1 Examples of Bargaining

In a typical commerce situation a customer may enter into bargaining over the total price of an order consisting of several goods, each with its particular quantity. The seller might have indicated a price, but as the order will lead to substantial turnover, he is willing to enter into bargaining. The goal of the purchaser is to reduce the total price as much as possible, i.e. to bargain a maximal discount, while the seller might want to keep the discount below a certain threshold. Both parties may be willing to accept additional items added to the order for free. This defines optimal and acceptable outcomes for both sides. However, none of the two parties may play completely with open cards, i.e. the seller may try to hide the maximal discount he could offer, while the purchaser may hide the limit price he is willing to accept. Both parties may also try to hide their preferences, e.g.
whether an add-on to the order or a discount is really the preferred option. It may even be the case that adding a presumably expensive item to the order is acceptable to the seller, while the latitude for a discount is much smaller, e.g. if the add-on item does not sell very well. So, both parties apply their own strategies to achieve the best outcome for themselves.

The bargaining process then consists of making offers and counteroffers. Both offers and counteroffers narrow down the possible outcomes. For instance, an offer by the seller indicating a particular discount already determines a maximal price. The purchaser may not be happy with the offer, i.e. the price is not in the set of his/her acceptable outcomes, and therefore requests a larger discount. Bargaining first moves into the set of mutually acceptable outcomes and finally achieves an agreement, i.e. a contract. Bargaining outside the latitude of either party may jeopardise the whole contract or require that a human agent takes over the bargaining task. Similar price bargaining arises in situations when real estate, e.g. a house, is sold.

In loan applications, i.e. both personal loans and mortgages [10], the bargaining parties aim at acceptable conditions regarding disagio, interest rate, securities, duration, bail, etc. The principles are the same as for price bargaining, but the customer may bring in evidence of offers from competing financial institutions. As a loan contract binds the parties for a longer time than a one-off sale, it also becomes important that the bargaining parties trust each other. The bank must be convinced that the customer will be able to repay the loan, and the customer must be convinced that the offer made is reasonable and not an attempt to achieve extortionate conditions. In this case the set of acceptable outcomes is also constrained by law.

Figure 1 illustrates the characteristics of bargaining using a mindmap.

Figure 1: Mindmap for Bargaining Characteristics

2.2 Formal Ingredients

In order to obtain a conceptual model from these examples let us try to extract the formal ingredients of the bargaining process. From now on we concentrate on the case that only two parties are involved in the bargaining.

First of all there is the object of the bargaining, which can be expressed by a parameterised view. In the case of the sales situation this object is the order, which can be formalised by a set of items, each having a quantity, a price, and a discount, plus a discount for the total order. At the beginning of the bargaining process the set contains just the items selected by the customer, and all discounts are set to 0. During the bargaining process items may be added to the order, and discounts may be set. Similarly, in the loan bargaining situation the object is the loan, which is parameterised by interest rate, disagio, and duration, and the set of securities, some of which might belong to bailsmen, in which case the certification of the bailsmen becomes part of the bargaining object.

The set of acceptable outcomes is obtained by instantiations of the bargaining object. These instantiations are expressed by static constraints for each party. However, the constraints are not visible to the other party. They can only be inferred partially during the bargaining process. In addition to the constraints of each party there are general constraints originating from law or other agreed policies. These general constraints are visible to both parties, and they must not be violated.
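To make the notion of a parameterised bargaining object concrete, the following Python sketch models the sales order described above. It is only an illustration of the idea, not part of the formal model developed below; all names (Item, Order, the discount fields, the price limit) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Item:
    item_id: str
    price: float            # unit price indicated by the seller
    quantity: int
    discount: float = 0.0   # per-item discount, 0 at the start of bargaining

@dataclass
class Order:
    """The bargaining object: a set of items plus a total-order discount parameter."""
    items: list[Item] = field(default_factory=list)
    total_discount: float = 0.0   # also 0 before bargaining starts

    def total_price(self) -> float:
        gross = sum(i.price * i.quantity * (1 - i.discount) for i in self.items)
        return gross * (1 - self.total_discount)

# A party's acceptable outcomes are the instantiations of the object that
# satisfy its (private) static constraints, e.g. a purchaser's price limit:
def purchaser_accepts(order: Order, limit: float = 900.0) -> bool:
    return order.total_price() <= limit
```

An instantiation of Order with concrete items and discount values corresponds to one possible outcome; the predicate purchaser_accepts stands for one of the private static constraints discussed above.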
In the case of the sales situation a constraint on the side of the purchaser might be a maximal acceptable price for the original order, or it might be expressed by a minimum discount in terms of any extended order. It may also be the case that the discount is expressed by a function on the set of added items, e.g. the more items are added to the order, the higher the acceptable discount must be. In the case of the loan situation, constraints on the side of the customer can be a maximal load imposed by repayments or a maximal value of securities offered. For the bank a minimum level of security and a minimum real interest rate might define their acceptable outcomes.

Within the set of acceptable outcomes of either party the outcomes are (partially) ordered according to preferences. For any artificial party these preferences have to be part of the system specification. For instance, in the sales situation the lower the total price, the better the outcome for the purchaser (and conversely for the seller), and an offer with more additional items is ranked higher. However, whether an offer with additional items and a lower discount is preferred over a large discount depends on the individual customer and his/her goals.

An agreement is an outcome that is acceptable to both parties. Usually, bargaining terminates with an agreement, or alternatively with failure. The primary goal of each party is to achieve an agreement that is as close as possible to a maximum in the corresponding set of acceptable results. However, bargaining may also involve secondary goals such as binding a customer (for the seller or the bank). These secondary goals influence the bargaining strategy in a way that the opposite party considers the offers made to be fair and the agreement not only acceptable, but also satisfactory. This implies that constraints are classified in a way that some stronger constraints define satisfactory outcomes. This can be extended to more than just two levels of outcomes. In general, the bargaining strategy of each party is representable as a set of rules that determine the continuation of the bargaining process in terms of the offers made by the other party.

The bargaining process runs as a sequence of offers and counteroffers started by one party. Thus, in principle bargaining works in the same way as a two-player game such as chess or Go. Each offer indicates an outcome the offering party is willing to accept. Thus it can be used to reduce the set of acceptable outcomes of the other party. For instance, if the seller offers a discount, then all outcomes with a smaller discount can be neglected. Similarly, if the purchaser offers a price he is willing to pay, the seller can neglect all lower prices. Nevertheless, there is a significant difference to normal two-player games, as in bargaining there is no direct analogue of the concept of a winner. If there is no agreement, both parties lose, and both may consider themselves winners if there is an agreement. We may say that a party considers itself the winner if the agreement is perceived as being better for its own side. Such a characterisation may help to formalise "winning strategies".

Furthermore, each party may indicate acceptable outcomes to the opposite party without offering them. Such playing with open cards indicates trust in the other party, and is usually used as a means for achieving secondary (non-functional) goals. In the following we will not consider this possibility, i.e. we concentrate on bargaining with maximal hiding.
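The narrowing effect of offers and counteroffers can be pictured in code. The helper below is again only an illustrative sketch, reducing an outcome to its total price for brevity; all candidate prices and thresholds are invented for the example. Each party's knowledge is a growing list of constraint predicates, every offer appends one, and an outcome remains possible only while all accumulated constraints hold together.

```python
from typing import Callable

Constraint = Callable[[float], bool]   # outcomes reduced to a total price here

def acceptable(prices: list[float], constraints: list[Constraint]) -> list[float]:
    """Candidate total prices satisfying all constraints accumulated so far."""
    return [p for p in prices if all(c(p) for c in constraints)]

candidates = [800.0, 850.0, 900.0, 950.0, 1000.0]
constraints: list[Constraint] = []

# Seller offers a 5% discount on a nominal price of 1000, i.e. a price of 950:
constraints.append(lambda p: p <= 950.0)    # prices above 950 are off the table
# Purchaser counters by offering to pay 850:
constraints.append(lambda p: p >= 850.0)    # the seller can neglect lower prices

print(acceptable(candidates, constraints))  # [850.0, 900.0, 950.0]
```

Each move shrinks the candidate set from both ends; an agreement is reached when the parties settle on one of the remaining elements, and bargaining fails when the set becomes empty.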
In summary, we can characterise bargaining by the bargaining object, constraints for each participating party defining acceptable outcomes, partial orders on the respective sets of possible outcomes, and rules defining the bargaining strategy of each party. In the following we will link these ingredients of a bargaining process to the conceptual model of e-business systems that is offered by the co-design method.

Note that bargaining is significantly different from auction systems. The latter, e.g. the eBay system (see http://www.ebay.com), offer products for which interested parties can put in a bid. If there is at least one acceptable bid, usually the highest bid wins. Of course, each bidder follows a particular strategy and it would be challenging to formalise them, but usually systems only play the role of the auctioneer, while the bidders are users of the system.

2.3 Context of Bargaining

In addition to the outlined characteristics of the bargaining process, the attitude towards bargaining depends on many contextual issues. In some cultures bargaining is an intrinsic part of business and is applied with virtually no limits, whereas in other cultures bargaining follows pre-determined rules. Incorporating bargaining into an e-business system has to reflect this spectrum of possible attitudes. That is, all parties involved in a bargaining process act according to a particular personal profile that captures the general attitude towards bargaining, desires and expectations regarding the outcome of the bargaining process, and preferences regarding the outcome and the behaviour of the other parties. For instance, if bargaining is offered in an Arabic country, the expected latitude with respect to what can be bargained about, how much the result can deviate from the starting point, etc. must be set rather high. On the other hand, in a European context, bargaining will most likely be limited to rather small margins regarding price discounts, package offers, and preferential customer treatment.

Consequently, we also need an extension of the model of user profiles in [11] to capture the attitude towards bargaining. Correspondingly, the bargaining strategy pursued by the system has to be aware of the user profile. This implies that users have to be informed about the bargaining latitude in case this is rather limited.

3 The Co-Design Approach to Web Information Systems

If bargaining is to become an integral part of e-business systems, we first need a conceptual model for these systems. We follow the co-design approach [11], but we will only emphasise a compact model that can be used to formalise bargaining. We omit everything that deals with quality criteria, expressiveness and complexity, personalisation, adaptivity, presentation, implementation, etc., i.e. we only look at a rough skeleton of the method. In doing so, we concentrate on the story space, the plot, the views, and the operations on the views.

3.1 Story Spaces

On a high level of abstraction we may define each web information system (WIS) - thus also each e-business system - as a set of abstract locations called scenes between which users navigate. Thus, navigation amounts to transitions between scenes. Each such transition is either a simple navigation or results from the execution of an action. In this way we obtain a labelled directed graph with vertices corresponding to scenes and edges to scene transitions.
The edges are labelled by action names or by skip, the latter indicating that there is no action, but only a simple navigation. This directed graph is called the story space.

Definition 3.1. A story space E consists of a finite set S of scenes, an (optional) start scene s₀ ∈ S, an (optional) set of final scenes F ⊆ S, a finite set A of actions, a scene assignment σ : A → S, i.e. each action belongs to exactly one scene, and a scene transition relation τ ⊆ S × S × (A ∪ {skip}), i.e. whenever there is a transition from scene s₁ ∈ S to scene s₂ ∈ S, this transition is associated with an action a ∈ A with σ(a) = s₁, or with a = skip, in which case it is a navigation without action, and we have (s₁, s₂, a) ∈ τ. We write E = (S, s₀, F, A, σ, τ).

Example 3.1. Take a simple example as illustrated in Figure 2, where the WIS is used for ordering products. In this case we may define four scenes. The scene s₀ = product contains product descriptions and allows the user to select products. The scene s₁ = payment will be used to inform the user about payment method options and allow the user to select the appropriate payment method. The scene s₂ = address will be used to request information about the shipping address from the user. Finally, scene s₃ = confirmation will be used to get the user to confirm the order and the payment and shipping details.

Figure 2: Story space

There are six actions (their names are sufficient to indicate what they are supposed to do): a₁ = select_product is defined on s₀ and leads to no transition. a₂ = payment_by_card is defined on s₁ and leads to a transition to scene s₂. a₃ = payment_by_bank_transfer is defined on s₁ and leads to a transition to scene s₂. a₄ = payment_by_cheque is defined on s₁ and leads to a transition to scene s₂. a₅ = enter_address is defined on s₂ and leads to a transition to scene s₃. a₆ = confirm_order is defined on s₃ and leads to a transition to scene s₀.

In addition to the story space we need a model of the actors, i.e. user types and roles, and the tasks [11], but for our purposes here we omit this part of the method.
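Definition 3.1 and Example 3.1 can be transcribed almost literally into code. The following Python sketch is not part of the co-design method itself; it merely shows the story space as the labelled directed graph defined above. A skip navigation from product to payment is assumed here to connect the scenes, since select_product leads to no transition.

```python
# A story space E = (S, s0, F, A, sigma, tau) as plain Python data.
SKIP = "skip"

scenes = {"product", "payment", "address", "confirmation"}   # S
start = "product"                                            # s0
sigma = {                                                    # scene assignment
    "select_product": "product",
    "payment_by_card": "payment",
    "payment_by_bank_transfer": "payment",
    "payment_by_cheque": "payment",
    "enter_address": "address",
    "confirm_order": "confirmation",
}
tau = {                                                      # transition relation
    ("payment", "address", "payment_by_card"),
    ("payment", "address", "payment_by_bank_transfer"),
    ("payment", "address", "payment_by_cheque"),
    ("address", "confirmation", "enter_address"),
    ("confirmation", "product", "confirm_order"),
    ("product", "payment", SKIP),   # assumed simple navigation without an action
}

def successors(scene: str):
    """Scenes reachable from a given scene, with the triggering action or skip."""
    return [(s2, a) for (s1, s2, a) in tau if s1 == scene]

print(successors("payment"))   # three transitions to the address scene
```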
3.2 Plots

With each action we may associate a precondition and a postcondition, both expressed in propositional logic with propositional atoms that describe conditions on the state of the system. In doing so, we may add a more detailed level to the story space describing the flow of action. This can be done using constructors for sequencing, choice, parallelism and iteration in addition to the guards (preconditions) and postguards (postconditions). Using these constructors, we obtain an algebraic expression describing the flow of action, which we call the plot. In [11] it has been shown that the underlying algebraic structure is that of a Kleene algebra with tests [5], and the corresponding equational axioms can be exploited to reason about the story space and the plot on a propositional level, in particular for the purpose of personalisation.

Definition 3.2. A Kleene algebra (KA) K consists of a carrier set K containing at least two different elements 0 and 1, a unary operation *, and two binary operations + and · such that + and · are associative, + is commutative and idempotent with 0 as neutral element, 1 is a neutral element for ·, for all p ∈ K we have p0 = 0p = 0, · is distributive over +, p*q is the least solution x of q + px ≤ x, and qp* is the least solution of q + xp ≤ x, using the partial order x ≤ y ⇔ x + y = y for the last two properties.

A Kleene algebra with tests (KAT) K consists of a Kleene algebra (K, +, ·, *, 0, 1), a subset B ⊆ K containing 0 and 1 and closed under + and ·, and a unary complementation operation ¯ on B, such that (B, +, ·, ¯, 0, 1) forms a Boolean algebra. We write K = (K, B, +, ·, *, ¯, 0, 1).

Then a plot can be formalised by an expression of a KAT that is defined by the story space, i.e. the actions in A are elements of K, while the propositional atoms become elements of B.

Example 3.2. Continuing Example 3.1, we can define the plot by the expression

(a₁*(φ₁a₂φ₂ + a₃φ₃ + a₄φ₄)a₅(a₆φ₅ + 1) + 1)*

using the following conditions. Condition φ₁ = price_in_range expresses that the price of the selected product(s) lies within the range of acceptance of credit card payment. It is a precondition for action a₂. Condition φ₂ = payment_by_credit_card expresses that the user has selected the option to pay by credit card. Analogously, condition φ₃ = payment_by_bank_transfer expresses that the user has selected the option to pay by bank transfer, and condition φ₄ = payment_by_cheque expresses that the user has selected the option to pay by cheque. Condition φ₅ = order_confirmed expresses that the user has confirmed the order.
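For readers who prefer code to algebra, the plot expression of Example 3.2 can be represented as a small abstract syntax tree. This is only an illustrative sketch with invented class names: Seq stands for ·, Choice for +, Star for *, and Test models the conditions φᵢ of the KAT.

```python
from dataclasses import dataclass

# Minimal AST for KAT expressions: actions, tests, sequence, choice, iteration.
@dataclass
class Action: name: str
@dataclass
class Test: name: str
@dataclass
class Seq: parts: tuple
@dataclass
class Choice: alternatives: tuple
@dataclass
class Star: body: object

ONE = Test("1")   # the unit 1, i.e. "do nothing"

a = {i: Action(n) for i, n in enumerate(
    ["select_product", "payment_by_card", "payment_by_bank_transfer",
     "payment_by_cheque", "enter_address", "confirm_order"], start=1)}
phi = {i: Test(n) for i, n in enumerate(
    ["price_in_range", "payment_by_credit_card", "payment_by_bank_transfer",
     "payment_by_cheque", "order_confirmed"], start=1)}

# (a1*(phi1 a2 phi2 + a3 phi3 + a4 phi4) a5 (a6 phi5 + 1) + 1)*
plot = Star(Choice((
    Seq((Star(a[1]),
         Choice((Seq((phi[1], a[2], phi[2])),
                 Seq((a[3], phi[3])),
                 Seq((a[4], phi[4])))),
         a[5],
         Choice((Seq((a[6], phi[5])), ONE)))),
    ONE,
)))
```

Such a representation makes it possible, at least in principle, to apply the KAT equational axioms mentioned above as rewriting rules on the tree, e.g. for personalisation.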
3.3 Media Types

On a lower level of abstraction we add data support to each scene in the form of a media type, which basically is an extended view on some underlying database schema.

Definition 3.3. A media type M consists of a content data type cont(M) that may contain pairs ℓ : M′ with a label ℓ and the name M′ of another media type, a defining query q_M defining a view on some database schema, a set of operations, a set of hierarchical versions, a cohesion preorder, style options and some other extensions.

The database schema, the view formation and the extensions (except operations) are beyond our concern here, so it is sufficient to say that there is a data type associated with each scene such that in each instance of the story space the corresponding value of this type represents the data presented to the user - this is called a media object in [11]. In terms of the data support the conditions used in the plot are no longer propositional atoms. They can be refined by conditions that can be evaluated on the media objects. Analogously, the actions of the story space are refined by operations on the underlying database, which by means of the views also change the media objects. For our purposes it is not so important to see how these operations can be specified. It is sufficient to know their parameters.

Example 3.3. Continuing Example 3.1, let for simplicity the content data type of the media type supporting scene s₀ be defined as {(product_id, product_name, description, price)}, i.e. we would present a set of products to the user, each of which is defined by an id, a name, a description and a price. Then operation a₁ may take parameters product_id and quantity. The condition φ₁ from Example 3.2 is to express that the price of the selected products lies within the limit acceptable for credit card payment. If this price limit is a constant L, we obtain the formula

src[0, prod ∘ (π_price × π_quantity), +](product ⋈ select_product) ≤ L.

Here we exploit the fact that according to the given plot the action select_product will be executed several times, so we can build a relation with the same name collecting the parameters of all executions. Then we can join this relation with the product relation, giving us all selected products including their quantities. The structural recursion operation selects the price and quantity of each selected product, multiplies them, and adds them all up, which of course yields the total price.

Combining story space, plot and media types, we simply associate with each scene in the story space a data type, replace actions in the story space and the plot by parameterised operations, and replace conditions in the plot by complex formulae as indicated in Example 3.3. The resulting model will be called the parameterised story space, and it will serve us as the basis for formalising bargaining.

3.4 Bargaining Actions

In our sales example bargaining could come in at any time, but for simplicity let us assume that bargaining is considered to be part of the confirmation process. That is, instead of (or in addition to) the action confirm_order we may now have an action a₇ = bargain_order as indicated in Figure 3. As before, the action may have a precondition, e.g. that the total price before bargaining is above a certain threshold, or that the user belongs to a distinguished group of customers. If the bargaining action can be chosen, it will still result in a confirmed status of the order, i.e. the bargaining object, in the database. However, the way this outcome is achieved is completely different from the way other actions are executed. We will look into this execution model in the next section.

Figure 3: Story space with bargaining action

Similarly, in our loan example we find actions select_conditions_and_terms and confirm_loan. Again, if bargaining is possible, the selection of terms and conditions may become subject to a bargaining process, which will lead to an instantiated loan contract in the database - the same as without bargaining. As before, the outcome of the bargaining is different from the one without bargaining, and it is obtained in a completely different way. Therefore, in terms of the story space and the plot there is not much to change. Only some of the actions become bargaining actions. The major change is then the way these bargaining actions are refined by operations on the conceptual level of media types.
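In terms of the story space sketch given after Example 3.1, introducing the bargaining action of Figure 3 is a purely local change, which is exactly the point made above. The snippet below continues that hypothetical sketch; the threshold precondition is an invented example.

```python
# Extend the earlier story space sketch with the bargaining action a7.
sigma["bargain_order"] = "confirmation"             # a7 belongs to the confirmation scene
tau.add(("confirmation", "product", "bargain_order"))

# A possible precondition for offering the bargaining action at all, e.g.
# only orders above some threshold are worth bargaining about:
def bargaining_enabled(total_price: float, threshold: float = 500.0) -> bool:
    return total_price >= threshold
```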
4 Bargaining as a Game

Let us now look at the specification of bargaining actions in view of the characteristics derived in Section 2. We already remarked that we can consider the bargaining process as a two-player game. Therefore, we want to model bargaining actions as games. There are two questions related to this kind of modelling:

1. What is the ground the game is played on? That is, we ask how the game is played, which moves are possible, and how they are represented. This of course has to take care of the history that led to the bargaining situation, the bargaining object, and the constraints on it.

2. How will the players act? This question can only be answered for the system player, while a human player, i.e. a customer, is free in his/her decisions within the constraints offered by the system. Nevertheless, we should assume that both sides - if they act reasonably - base their choices on similar grounds. The way players choose their moves will be determined by the order on the set of acceptable outcomes and the bargaining strategy.

4.1 Bargaining Games

An easy answer to the first question could be to play on the parameterised bargaining object, i.e. to consider instances of the corresponding data type. However, this would limit the possible moves in a way that no reconsideration of the previous actions that led to the bargaining situation would be possible. Therefore, it is better to play on the parameterised story space that we introduced in the previous section. Each player maintains a set of static constraints on the parameterised story space. These constraints subsume

- general constraints on the bargaining as defined by law and policies;
- constraints determining the acceptable outcomes of the player;
- constraints arising from offers made by the player him/herself - these offers reduce the set of acceptable outcomes;
- constraints arising from offers made by the opponent player - these offers may also reduce the set of acceptable outcomes.

These constraints give rise to definitions of what a bargaining game is, what a state of such a game is, and which moves are possible in this game. We will now introduce these definitions step by step.

Definition 4.1. A bargaining game G consists of a parameterised story space E, a parameterised plot P, and three sets S₀, S₁ and S₂ of static constraints on the parameters in E and P. We write G = (E, P, S₀, S₁, S₂).

Recall that E results from the story space as defined in Definition 3.1 by assigning the content data type of a media type to each scene, and by replacing the actions by the corresponding parameterised operations. Similarly, P results from a KAT expression as defined in Definition 3.2 by replacing atomic actions by the corresponding parameterised operations and propositional atoms by the corresponding formulae on the underlying database schema. S₀ formalises legal constraints, while Sᵢ formalises the acceptable outcomes of player i (i = 1, 2).

Example 4.1. Let us look again at our sales example from Example 3.1. Assume that player one is the purchaser. Then a constraint in S₁ may be that the total price does not exceed a particular limit, which can be formalised by a formula of the form

src[0, prod ∘ (π_price × π_quantity), +](product ⋈ select_product) × (1 − d) ≤ M.

Here d indicates a discount, and M might be a constant. Alternatively, the purchaser may expect a minimum discount depending on the total nominal price.

With these constraints each player obtains a set of possible instantiations that are at least acceptable to him/her. The moves of the players just add constraints. This leads to the definitions of states and moves.

Definition 4.2. A state of a bargaining game G = (E, P, S₀, S₁, S₂) consists of a partial instance p of P with the last action leading to the bargaining scene, and two sets S₁′ and S₂′ of static constraints on the parameters in E and P, such that S₀ ∪ Sᵢ ∪ Sᵢ′ is satisfiable (i = 1, 2). We write s = (p, S₁′, S₂′).

Obviously, the initial state of the game is determined by the navigation of the user through the story space before reaching the bargaining scene. This defines p, while S₁′ and S₂′ are empty.

Example 4.2. In our sales example we may have a partial instance of a plot defined by p = select_product(i₄, 5) select_product(i₇, 2) payment_by_card(...) enter_address(...), which means that the user selected the products with ids i₄ and i₇ with quantities 5 and 2, respectively, then chose payment by credit card - the omitted parameters would contain the credit card number, brand, name on the card and expiry date - and finally entered a shipping address - again with parameters omitted. This defines the initial state of the bargaining game. At a later stage the purchaser may have indicated that he would accept a total price m. This would give rise to the constraint

src[0, prod ∘ (π_price × π_quantity), +](product ⋈ {(i₄, 5), (i₇, 2)}) × (1 − d) ≥ m

in S₁′.
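The states and moves just defined, together with the notion of agreement introduced below, can be paraphrased in code by treating constraints as predicates over candidate instantiations and moves as constraint additions, with satisfiability checked by exhaustion over a finite candidate set. The sketch below is a toy illustration under these simplifying assumptions, not the formal model itself.

```python
from typing import Callable

Outcome = dict            # a candidate instantiation of the parameters
Constraint = Callable[[Outcome], bool]

class BargainingState:
    def __init__(self, candidates: list[Outcome],
                 s0: list[Constraint], s1: list[Constraint], s2: list[Constraint]):
        self.candidates = candidates              # finite set of instantiations
        self.s0, self.s1, self.s2 = s0, s1, s2    # legal / player 1 / player 2
        self.offers1: list[Constraint] = []       # S1' - constraints from moves
        self.offers2: list[Constraint] = []       # S2'

    def satisfiable(self, player: int) -> bool:
        """Is S0 ∪ S_i ∪ S1' ∪ S2' satisfiable for some candidate?"""
        own = self.s1 if player == 1 else self.s2
        cs = self.s0 + own + self.offers1 + self.offers2
        return any(all(c(o) for c in cs) for o in self.candidates)

    def move(self, player: int, constraint: Constraint) -> bool:
        """A move adds a constraint; it is only legal if satisfiability remains."""
        offers = self.offers1 if player == 1 else self.offers2
        offers.append(constraint)
        if not self.satisfiable(player):
            offers.pop()                           # illegal move: retract it
            return False
        return True

    def agreements(self) -> list[Outcome]:
        """Outcomes acceptable under all constraints of both players."""
        cs = self.s0 + self.s1 + self.s2 + self.offers1 + self.offers2
        return [o for o in self.candidates if all(c(o) for c in cs)]
```

For instance, with candidates ranging over discount values, a seller move might append the constraint lambda o: o["discount"] <= 0.05, mirroring the offer in Example 4.2.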
Definition 4.3. A run of a bargaining game G = (E, P, S₀, S₁, S₂) is a sequence s₀ → s₁ → ⋯ → s_k of states sᵢ = (pᵢ, S₁′(i), S₂′(i)) satisfying the following properties:

- s₀ is the initial state of the game.
- pᵢ₊₁ either equals some pⱼ with j ≤ i or extends pᵢ.
- If i + 1 is odd, then S₀ ∪ S₁ ∪ S₁′(i) ∪ S₂′(i) must be satisfiable, and S₁′(i+1) extends S₁′(i).
- If i + 1 is even, then S₀ ∪ S₂ ∪ S₂′(i) ∪ S₁′(i) must be satisfiable, and S₂′(i+1) extends S₂′(i).

Each transition from sᵢ to sᵢ₊₁ in a run is called a move by player one or two, depending on whether i is even or odd, respectively.

So a move by a player is made by presenting an offer. For the player him/herself this offer means indicating that certain outcomes might be acceptable, while better outcomes are no longer aimed at. This includes that a move may manipulate the bargaining object by extending the partial instance of the plot. However, a player may also reject such a change as proposed by the opponent player. In addition, constraints arising from moves will be added to the constraint sets Sᵢ′. For instance, if a seller offers a discount and thus a total price, s/he gives away all outcomes with a higher price. For the opponent player the offer means the same, but the effect on his/her set of acceptable outcomes is different. Moves are only possible as long as the constraints arising from the counteroffers leave the latitude to retain at least one acceptable outcome. If the set of instantiations reduces to a single element, we obtain an agreement. If it reduces to the empty set, the bargaining has failed.

Definition 4.4. A run s₀ → s₁ → ⋯ → s_k is called successful iff S₀ ∪ S₁ ∪ S₂ ∪ S₁′(k) ∪ S₂′(k) is satisfiable, and S₁′(k) ∪ S₂′(k) is maximal with this property. In this case the instance p_k of the plot in state s_k is the agreement of the bargaining game.

A bargaining game ends with an agreement, or terminates unsuccessfully if a player cannot continue making a move.

In addition to "ordinary" moves we may allow moves that represent "last offers". A last offer is an offer indicating that no better one will be made. For instance, a total price offered by a seller as a last offer implies the constraint that the price can only be higher. However, it does not discard other options that may consist in additional items at a bargain price or priority treatment in the future. Thus, last offers add stronger constraints, which may even cause the set of acceptable outcomes to become empty, i.e. failure of the bargaining process. Note that this definition of a "last offer" differs from tactical play, where players indicate that the offer made is final without really meaning it. Such tactics provide an open challenge for bargaining systems.

4.2 Bargaining Strategies

By making an offer or a last offer, a player makes a move that will result in an acceptable outcome satisfying all constraints arising from counteroffers. In order to make such a choice each player uses a partial order on the set of possible outcomes. Thus, we can model this by a partial order on the set of instances of the parameterised story space. We define it by a logical formula that can be evaluated on pairs of instances.

Definition 4.5. For a bargaining game G = (E, P, S₀, S₁, S₂) the instances satisfying S₀ ∪ Sᵢ define the set of acceptable outcomes of player i (i = 1, 2), denoted Oᵢ. The preference order of player i is a formula