Informatica: An International Journal of Computing and Informatics. Special Issue: Intelligent Systems. Guest Editors: Costin Badica, Maria Ganzha, Marcin Paprzycki.

EDITORIAL BOARDS, PUBLISHING COUNCIL

Informatica is a journal primarily covering the European computer science and informatics community: scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communication between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.

Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors. The necessary coordination is handled by the Executive Editors, who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Higher Education, Science and Technology. Each author is guaranteed to receive the reviews of his article.
When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article. Executive Editor - Editor in Chief Anton P. Železnikar Volariceva 8, Ljubljana, Slovenia s51em@lea.hamradio.si http://lea.hamradio.si/~s51em/ Executive Associate Editor - Managing Editor Matjaž Gams, Jožef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia Phone: +386 1 4773 900, Fax: +386 1 251 93 85 matjaz.gams@ijs.si http://dis.ijs.si/mezi/matjaz.html Executive Associate Editor - Deputy Managing Editor Mitja Luštrek, Jožef Stefan Institute mitja.lustrek@ijs.si Editorial Board Juan Carlos Augusto (Argentina) Costin Badica (Romania) Vladimir Batagelj (Slovenia) Francesco Bergadano (Italy) Marco Botta (Italy) Pavel Brazdil (Portugal) Andrej Brodnik (Slovenia) Ivan Bruha (Canada) Wray Buntine (Finland) Hubert L. Dreyfus (USA) Jozo Dujmovic (USA) Johann Eder (Austria) Vladimir A. Fomichov (Russia) Maria Ganzha (Poland) Janez Grad (Slovenia) Marjan Gušev (Macedonia) Dimitris Kanellopoulos (Greece) Hiroaki Kitano (Japan) Igor Kononenko (Slovenia) Miroslav Kubat (USA) Ante Lauc (Croatia) Jadran Lenarcic (Slovenia) Huan Liu (USA) Suzana Loskovska (Macedonia) Ramon L. de Mantras (Spain) Angelo Montanari (Italy) Pavol Nävrat (Slovakia) Jerzy R. Nawrocki (Poland) Nadja Nedjah (Brasil) Franc Novak (Slovenia) Marcin Paprzycki (USA/Poland) Gert S. Pedersen (Denmark) Ivana Podnar Žarko (Croatia) Karl H. 
Pribram (USA) Luc De Raedt (Belgium) Dejan Rakovic (Serbia) Jean Ramaekers (Belgium) Wilhelm Rossak (Germany) Ivan Rozman (Slovenia) Sugata Sanyal (India) Walter Schempp (Germany) Johannes Schwinn (Germany) Zhongzhi Shi (China) Oliviero Stock (Italy) Robert Trappl (Austria) Terry Winograd (USA) Stefan Wrobel (Germany) Konrad Wrona (France) Xindong Wu (USA)

Executive Associate Editor - Technical Editor Drago Torkar, Jožef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia Phone: +386 1 4773 900, Fax: +386 1 251 93 85 drago.torkar@ijs.si

80th birthday of Prof. Anton P. Železnikar: congratulations to the founder of Informatica

This year marks 80 years since the birth of Prof. Železnikar, one of the pioneers of informatics and informational theory in Slovenia, as well as one of the initiators of the Slovenian society Informatika and the founder and Editor-in-Chief of the journal Informatica, which this year celebrates its 32nd year of uninterrupted publication. We are using this opportunity to present some excerpts from his CV.

Anton Pavel Železnikar was born in Slovenj Gradec, Slovenia, on June 8, 1928, as the seventh child of Vinko Železnikar (1887), MD, and Pavla (1898, born Rogina), a teacher. His father was an innovative surgeon and the head physician of the hospital in Slovenj Gradec. There were ten children in the family: two half-brothers, two half-sisters, three brothers, and two sisters. In 1933, before primary school, he began to learn German and piano. After five classes of primary school in Slovenj Gradec (1934-1939) he became a student of a real-gymnasium (a special type of classical secondary school) in Maribor (1939-1941). During the German occupation he first attended the third and fourth classes of the Hauptschule in Slovenj Gradec, and then the fifth and sixth classes of the Tegetthoff Gymnasium in Maribor. The schooling in the German gymnasium was decisive for his later professional and value orientation.
The majority of the gymnasium's teachers held doctor of science degrees and taught excellently in German, English, mathematics, and also in Latin. In May 1945, he was called into the State Security Troops of the National Liberation Army, serving as a soldier until February 1946. The experience in this service helped him develop a strong pro-human stand, based on realizing the most perverted human values, especially the cynicism of the then leading communists. In 1946, he finished the so-called nostrification exams for gymnasium classes three to six. Afterwards he decided to study classes seven and eight and the gymnasium leaving exam (matura in Slovenian) on a private basis. In 1948, he finally passed the exams and the matura. In 1948, he became a student of the Technical Faculty of the University of Ljubljana and, later, of the Electronic Department (altogether 10 semesters). His distinguished teachers at that time were J. Plemelj (mathematics), A. Peterlin (physics), and V. Koželj (theoretical electrical engineering). In 1956 he defended his diploma work, entitled "Magnetostrictive memory loop", a part of an amplitude analyzer. From 1955 to 1980 he was employed at the Jožef Stefan Institute, in the Electronics Department. His work was oriented toward the then emerging digital engineering using vacuum tubes and transistors. On this path he became aware of the significance of modern technology, proceeding deeply into sophisticated computer and software engineering and research. Within this period, he received his M.Sc. and Ph.D. in Electrical Engineering from the University of Ljubljana in 1966 and 1967, respectively. From 1961 to 1978 he served as the head of the Digital Engineering Department, and from 1968 to 1978 also as head of the Electronics Division of the Jožef Stefan Institute. In 1968, he became an assistant professor and in 1972 an associate professor at the University of Ljubljana.
In 1980 he moved to the fast-growing Slovenian company Iskra-Delta Computers, where he stayed until his retirement in 1990. Within the company he headed the Microcomputer Lab between 1980 and 1982. In 1982 he was promoted to advisor to the general manager of Iskra-Delta Computers and a member of the research and development strategy board of the Iskra Corporation, where he stayed until the end of his working career. In his long and fruitful scientific career he published over 100 scientific and research papers in four languages and two books. Most of his life was devoted to research in informational theory, including the philosophy of the informational, the theory of informational phenomenalism, informational machines and informational operating systems, informational investigations in literature, media, and communication by theory and machines, the informational theory of consciousness, and informational entity programming. Prof. Železnikar received many awards for his work from the national science foundation and from various professional associations of ex-Yugoslavia between 1968 and 1990. He is a member of the New York Academy of Sciences, the International Academy of Science San Marino, the Cybernetic Academy "Stefan Odobleja", Lugano, and the International Association for Cybernetics, Belgium. In 1971, he co-organized and served on the program committee of the IFIP '71 Congress in Ljubljana, one of the biggest congresses in the area of electrical and computer science ever organized in ex-Yugoslavia. In the prime of his life, Prof. Železnikar still follows the progress of the information sciences and continues his work. We wish him a happy anniversary and many more years among us.

Niko Schlamberger, president of the Slovenian society Informatika

Editors' Introduction to the Special Issue: Intelligent Systems

The area of intelligent systems is developing rapidly and reaching in multiple directions.
In this mini special issue we present four papers that illustrate some of the areas that current intelligent systems encompass.

The first paper, entitled "The Cross-Entropy Method for Policy Search in Decentralized POMDPs," was written by Frans A. Oliehoek, Julian F.P. Kooij, and Nikos Vlassis. It concerns possible approaches to multiagent planning under uncertainty. One possible method is the utilization of Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). Unfortunately, solving a Dec-POMDP exactly is an intractable combinatorial optimization problem. In the paper, a new Cross-Entropy-based method is applied to the problem. It operates by sampling pure policies from an appropriately parameterized stochastic policy, and then evaluates them to define the next stochastic policy to sample from. This process is repeated until convergence is reached. Experimental results demonstrate that the proposed method can efficiently search large solution spaces.

In the second paper, "On the Compilation of Programs into their Equivalent Constraint Representation," Franz Wotawa and Mihai Nica present a novel approach to program analysis and debugging. They convert programs into their loop-free equivalents and then into static single assignment form. This allows them to derive a corresponding constraint satisfaction problem, which can be used directly for debugging. Specifically, they utilize the hyper-tree representation of the constraint satisfaction problem for debugging, where the width of the hyper-tree is an indicator of the debugging complexity.

The next paper, "Automatic Streaming Processing of XSLT Transformations Based on Tree Transducers," written by Jana Dvorakova, recognizes the fact that XML streaming has become a ubiquitous mode of information exchange over the Internet, and deals with the important issue of transforming large XML documents or XML data streams.
Specifically, an automatic streaming processor for XSLT transformations, which guarantees bounds on resource usage, is presented. In the proposed approach, resource bounding is achieved by employing tree transducers associated with a set of streaming algorithms. The input XSLT stylesheet is analyzed in order to identify its transformation type, which in turn allows application of the least resource-consuming streaming algorithm.

Finally, Oana Nicolae, Adrian Giurca and Gerd Wagner, in their paper "On Interchange between Drools and Jess," approach IT and business solutions based on business rules and consider challenges related to the target Platform Specific Implementation Model. The paper discusses an example of translating business rules from a particular object-oriented rule system (Drools) to another rule system coming from the AI area (Jess), using R2ML as the interchange language. The proposed transformation preserves semantic equivalence for a given rule set, also taking into account the rules vocabulary.

The papers presented here are based on those presented at the first edition of the International Symposium on Intelligent and Distributed Computing (IDC'2007), which took place on October 18-20, 2007 in Craiova, Romania. Overall, out of 33 full papers published in the Symposium proceedings we initially selected six; after further refereeing we retained the four best contributions. Finally, we would like to thank all of our colleagues who acted as referees of the papers and made sure that the material presented here is of the highest quality.

Costin Badica, University of Craiova, Romania
Maria Ganzha and Marcin Paprzycki, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

The Cross-Entropy Method for Policy Search in Decentralized POMDPs

Frans A. Oliehoek and Julian F.P.
Kooij, Intelligent Systems Lab, University of Amsterdam, Amsterdam, The Netherlands. E-mail: {faolieho,jkooij}@science.uva.nl, www.science.uva.nl/~faolieho
Nikos Vlassis, Dept. of Production Engineering and Management, Technical University of Crete, Greece. E-mail: vlassis@dpem.tuc.gr, http://www.dpem.tuc.gr/vlassis

Keywords: multiagent planning, decentralized POMDPs, combinatorial optimization

Received: March 15, 2008

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.

Povzetek (Slovenian abstract): The paper describes a new method for multiagent planning.

1 Introduction

The construction of intelligent agents is one of the major goals in Artificial Intelligence. In the last two decades more and more research has concentrated on systems with multiple intelligent agents, or multiagent systems (MASs). In this article we focus on the recently proposed model of decentralized partially observable Markov decision processes (Dec-POMDPs) (7). A Dec-POMDP is a generalization to multiple agents of the well-known POMDP model for single-agent planning under uncertainty (19).

The Dec-POMDP model presents a decision theoretic formalization of multiagent planning under uncertainty. It models a team of cooperative agents that individually receive observations of their environment and cannot communicate. It is a very general model which has for instance been applied in the context of cooperative robotics (5; 14), communication networks (30), and sensor networks (26). The Dec-POMDP does not explicitly consider communication, but communication actions can be modeled as regular actions. Also, there are extensions that do explicitly incorporate communication (32), which can typically be used to share observations.

In this paper we focus on the regular Dec-POMDP setting in which the agents have to select their actions based on their private observation histories. During an off-line planning phase we try to find a policy for a fixed number of time steps for each agent. The profile of all individual policies is collectively called a joint policy. The policies should be selected such that, when they are executed jointly during the on-line execution phase, the resulting behavior is (near-)optimal according to a predefined performance measure.

In the single-agent (i.e., POMDP) case, the history of observations results in a 'belief' (a probability distribution) over states, which forms a sufficient statistic for the history and can therefore be used to base the action choice on (19). In the multiagent case, however, no such simplification is possible, and the agents have to base their actions on their entire history. This means that the number of possible deterministic policies for a single agent is finite, but grows doubly exponentially with the planning horizon. Planning for a Dec-POMDP requires finding a profile of such policies, one for each agent, which defines a very challenging combinatorial optimization problem.

In this paper we examine the Cross-Entropy (CE) method, a recently introduced and promising method for combinatorial optimization problems (12), and its application to Dec-POMDPs. The CE method provides a framework for finding approximate solutions to combinatorial problems, and has gained popularity for various applications (24; 9; 1; 11) due to its ability to find near-optimal solutions in huge search spaces. Because of the combinatorial nature of the decentralized setting, the CE method may be a valuable tool in many algorithms for Dec-POMDPs.

In this paper we show how the CE method can be applied in the setting of Dec-POMDPs using a relatively direct translation of the general CE optimization method. This, however, is not entirely trivial, so we discuss the difficulties and propose solutions. Our primary goal in this paper is not to provide a state-of-the-art method for solving Dec-POMDPs. In fact, there are some algorithms which are more advanced than the one we present here (14; 35; 34; 3; 28). Rather, we show how the CE method can be used to tackle the combinatorial nature of the problem. This will allow for improvement of existing algorithms in two related ways. First, existing methods may rely on exhaustive evaluation of restricted sets of (partial) policies. The CE method may help to speed up these parts of the algorithms. Second, such speed-up may allow the consideration of less restricted sets of policies, increasing the accuracy of the method.

1.1 Related work

In the last decade, planning for multiagent systems under uncertainty has received considerable attention. The Dec-POMDP model was introduced by Bernstein et al. (6; 7), who also proved that finding a solution for a finite-horizon Dec-POMDP is NEXP-complete. The infinite-horizon problem is undecidable, as it is for the single-agent case (i.e., solving a regular POMDP) (23).
Pynadath and Tambe (32) introduced the multiagent team decision problem (MTDP), a model equivalent to the Dec-POMDP, and a communicative extension, the COM-MTDP. There are three main algorithms for the optimal solution of finite-horizon Dec-POMDPs apart from brute-force policy search. Hansen et al. (17) proposed dynamic programming (DP) for Dec-POMDPs, an algorithm that works by incrementally constructing policies for longer horizons. The second is multiagent A* (MAA*), a form of heuristic search (38). Finally, Aras et al. (4) recently proposed a method based on mixed integer linear programming.

Due to the complexity results for Dec-POMDPs, however, the focus of much research has shifted to two other topics. First, people have tried to identify special instances of the problem that are less hard. For instance, a lot of research has focused on cases with special assumptions on observability and/or communication (10; 16; 21; 33; 36). Becker et al. (5) considered the special case of Dec-POMDPs where agents have their own local state spaces and transitions and observations are independent. Similar assumptions are made by Nair et al. (26), Kim et al. (20), and Varakantham et al. (39), who exploit the resulting locality of interaction in the context of sensor networks. Other special cases are identified by Goldman and Zilberstein (15).

Second, research has focused on methods that find approximate solutions. For example, joint equilibrium search for policies (JESP) is a method that settles for a local optimum (25). Emery-Montemerlo et al. (13) proposed to construct a policy by approximating the Dec-POMDP with a series of compressed Bayesian games (BGs). Both Szer and Charpillet (37) and Seuken and Zilberstein (35; 34) proposed approximate extensions of DP for Dec-POMDPs. Except for JESP, all these approximate methods in one way or another restrict the search space to provide leverage.
This reduced search space is then searched exhaustively to find the best approximate solution. In this work, rather than searching a constrained policy space exhaustively, we examine ways to search the entire space approximately. In particular, we show how the CE method can be used to directly search spaces of up to 10^243 joint policies fairly effectively.

1.2 Overview of article

Section 2 provides a background on decentralized decision making and formally introduces the Dec-POMDP framework. In section 3 we treat the background of the CE method. Section 4 introduces direct CE policy search for Dec-POMDPs, detailing how we apply the CE method to Dec-POMDPs. Section 5 introduces an extension of our algorithm with approximate evaluation. In section 6 we give an experimental evaluation, and finally section 7 concludes and discusses future work.

2 Decentralized POMDPs

Here we provide a formal background of decentralized POMDPs. We start by introducing the decentralized tiger (Dec-Tiger) problem, which is a standard benchmark. Next, we introduce the formal Dec-POMDP model. Finally, we formalize histories, policies and the optimality criterion, and we discuss naive (i.e., brute-force) policy search.

2.1 The decentralized tiger problem

In the Dec-Tiger problem, two agents standing in a hallway are confronted with two doors. Behind one door lies a treasure, while behind the other there is a tiger. As such, there are two possible states the world can be in: s_l, the tiger is behind the left door, and s_r, the tiger is behind the right door. Initially, these states have equal probability. The goal is to open the door that holds the treasure, which results in a positive reward. But if at least one of the agents opens the other (wrong) door, they receive a large penalty. Both agents can take three actions, namely OpenLeft (a_OL) and OpenRight (a_OR) to open the left or right door, and Listen (a_Li) to try to hear behind which door the tiger growls.
The two possible observations, HearLeft (o_HL) and HearRight (o_HR), indicate whether the agent heard the tiger behind the left or the right door, respectively. Observations are received at every time step but are random and uninformative unless both agents performed Listen, in which case they hear the tiger behind the correct door with high probability (each agent has an 85% chance of getting the correct observation). The action Listen has a small cost associated with it, but can decrease the agents' uncertainty about the state (which remains unchanged). Furthermore, the rewards are specified such that it is always beneficial for the agents to act jointly: it is better to open the wrong door jointly than to do it alone. After one or both agents have opened a door, rewards are given and the situation is reset to a random state. More details on the Dec-Tiger problem are provided by Nair et al. (25).

2.2 The formal model

The decentralized partially observable Markov decision process (Dec-POMDP) describes a stochastic, partially observable environment for a set of cooperating agents.

Definition 2.1. A Dec-POMDP is a tuple ⟨Ag, S, A, T, R, O, O⟩ where:

- Ag = {1, ..., n} is the set of agents.
- S is a finite set of states.
- A = ×_i A_i is the set of joint actions, where A_i is the set of actions available to agent i. Every time step one joint action a = ⟨a_1, ..., a_n⟩ is taken.¹
- T is the transition function, a mapping from states and joint actions to probability distributions over next states: T : S × A → P(S).²
- R is the reward function, a mapping from states and joint actions to real numbers: R : S × A → ℝ.
- O = ×_i O_i is the set of joint observations, with O_i the set of observations available to agent i. Every time step one joint observation o = ⟨o_1, ..., o_n⟩ is received.
- O is the observation function, a mapping from joint actions and successor states to probability distributions over joint observations: O : A × S → P(O).
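As a concrete instance of Definition 2.1, the Dec-Tiger problem of section 2.1 can be written out in code. The sketch below is ours and purely illustrative: the function names are made up, and the reward magnitudes are placeholders chosen only to match the qualitative description in the text (small listening cost, joint success rewarded, opening the tiger door jointly penalized less than doing so alone), not the benchmark's exact numbers.

```python
import random

# Illustrative Dec-Tiger specification. Reward values are placeholders
# reflecting the qualitative structure described in the text.
STATES = ["s_left", "s_right"]                   # tiger behind left / right door
ACTIONS = ["listen", "open_left", "open_right"]  # per-agent actions
OBS = ["hear_left", "hear_right"]                # per-agent observations

def transition(state, joint_action):
    """If any agent opens a door the problem resets uniformly at random;
    listening leaves the state unchanged."""
    if any(a != "listen" for a in joint_action):
        return random.choice(STATES)
    return state

def observation_prob(state, joint_action):
    """Per-agent observation distribution: informative (0.85 correct) only
    when both agents listened, uniform otherwise."""
    if all(a == "listen" for a in joint_action):
        correct = "hear_left" if state == "s_left" else "hear_right"
        return {o: (0.85 if o == correct else 0.15) for o in OBS}
    return {o: 0.5 for o in OBS}

def reward(state, joint_action):
    """Qualitative reward structure: listening has a small cost, the
    treasure door pays off, the tiger door is heavily penalized, and
    opening the tiger door jointly is better than doing it alone."""
    tiger = "open_left" if state == "s_left" else "open_right"
    treasure = "open_right" if state == "s_left" else "open_left"
    if all(a == "listen" for a in joint_action):
        return -2.0
    if all(a == treasure for a in joint_action):
        return 20.0
    if all(a == tiger for a in joint_action):
        return -50.0      # jointly wrong: bad, but better than alone
    return -101.0 if tiger in joint_action else 9.0
```

Note how the observation function only depends on the joint action and the (unchanged) state, exactly as O : A × S → P(O) prescribes.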
A Dec-POMDP is considered at a number of discrete time steps, or stages, t. At every such stage each agent i takes an individual action a_i. As a result of the taken joint action a, the state then stochastically transitions from s to a new state s' according to T. At that point, the environment emits a joint observation o with probability P(o | a, s'), as specified by O. From this o each agent i observes its individual component o_i, selects a new action, etc. In this paper we are searching for plans that specify actions for a fixed number of stages h. That is, we assume a finite planning horizon of h time steps. Furthermore, we assume that there is a distribution over states at the initial stage t = 0, also called the initial 'belief' b⁰ ∈ P(S).³

¹ Unless stated otherwise, subscripts denote agent indices.
² We use P(X) to denote the infinite set of probability distributions over the finite set X.
³ Unless stated otherwise, superscripts denote time indices.

Figure 1: A part of an arbitrary joint policy for Dec-Tiger. Left shows the policy for agent 1, right for agent 2. The figure shows the actions taken at stages 0, 1 and the observations received at stages 1, 2.

2.3 Histories and policies

In a Dec-POMDP, an agent i only knows its own taken actions a_i and observations o_i. As a result it has to base its actions on the histories of those. Therefore, before we formalize the notion of a policy, we first formalize these histories.

Definition 2.2. The action-observation history for agent i is the sequence of actions taken and observations received by agent i until time step t:

θ_i^t = (a_i^0, o_i^1, a_i^1, ..., a_i^{t-1}, o_i^t). (2.1)

The joint action-observation history is a tuple with the action-observation history for all agents: θ^t = ⟨θ_1^t, ..., θ_n^t⟩. The set of all action-observation histories for agent i at time t is denoted Θ_i^t.

Definition 2.3. The observation history for agent i is the sequence of observations an agent has received:

ō_i^t = (o_i^1, ..., o_i^t). (2.2)

ō^t denotes a joint observation history, and Ō_i denotes the set of all observation histories for agent i.

Now we are ready to formalize policies.

Definition 2.4 (deterministic policy). A pure or deterministic policy π_i for agent i in a Dec-POMDP is a mapping from observation histories to actions, π_i : Ō_i → A_i. A pure joint policy π = ⟨π_1, ..., π_n⟩ is a tuple containing a pure policy for each agent. Π is the set of all pure joint policies.

One could expect that the most general definition of a policy is as a mapping from action-observation histories to actions. This is indeed the case for stochastic policies, where an action is chosen at each action-observation history with a particular probability. For a deterministic policy, however, this is unnecessary, because an agent can infer the actions it took from its observation history.

A pure policy can be visualized as a tree, as is illustrated in Figure 1, which shows a joint policy for the decentralized tiger problem. In this figure, each node marks a point where an agent has to decide upon an action. A path leading to such a node defines the observation history, and the depth corresponds to the stage at which the decision should be taken. A pure policy assigns to each node one action from A_i, thereby defining what action the agent should take given some observation history.

Solving a Dec-POMDP amounts to finding an optimal joint policy π* such that when the agents act accordingly, their expected shared reward is maximal. The quantity that we want the agents to maximize is the expected cumulative reward:

π* = arg max_π E_π[ Σ_{t=0}^{h-1} R(s^t, a^t) ]. (2.3)

Bernstein et al. (7) have shown that optimally solving a Dec-POMDP is NEXP-complete, implying that any optimal algorithm will be doubly exponential in the horizon.
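The doubly exponential growth can be checked numerically: a pure policy fixes one action at each node of the observation-history tree, of which there are (|O_i|^h − 1)/(|O_i| − 1) per agent, so the number of pure joint policies is the per-agent count raised to the power n. A small illustrative sketch (our code, with hypothetical function names, not from the paper):

```python
def num_pure_policies(num_actions, num_obs, horizon):
    """Pure policies for one agent: one action choice per node of the
    depth-h observation-history tree, i.e. |A| ** ((|O|**h - 1) // (|O| - 1))."""
    num_nodes = (num_obs ** horizon - 1) // (num_obs - 1)
    return num_actions ** num_nodes

def num_pure_joint_policies(num_agents, num_actions, num_obs, horizon):
    """Joint policies: one pure policy per agent, chosen independently."""
    return num_pure_policies(num_actions, num_obs, horizon) ** num_agents

# Dec-Tiger sizes: 2 agents, 3 actions, 2 observations per agent.
for h in (1, 2, 3, 4):
    print(h, num_pure_joint_policies(2, 3, 2, h))
```

For Dec-Tiger (two agents, three actions, two observations) the count already exceeds 10^14 at horizon 4.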
This becomes apparent when realizing that the number of pure joint policies is

O( |A*|^( n (|O*|^h − 1) / (|O*| − 1) ) ), (2.4)

where |A*| and |O*| denote the largest individual action and observation sets. The naive way of going about it is to enumerate each of these joint policies and evaluate their expected cumulative reward, or value. The value of a specific (state, joint observation history) pair under a joint policy π is given by:

V_π(s^t, ō^t) = R(s^t, π(ō^t)) + Σ_{s^{t+1}} P(s^{t+1} | s^t, π(ō^t)) Σ_{o^{t+1} ∈ O} P(o^{t+1} | s^{t+1}, π(ō^t)) V_π(s^{t+1}, ō^{t+1}), (2.5)

where ō^{t+1} is the new joint observation history at stage t + 1: ō^{t+1} = (ō^t, o^{t+1}). The total expected reward V(π) with respect to the initial state distribution b⁰ is then given by

V(π) = Σ_s V_π(s, ō⁰) b⁰(s), (2.6)

where ō⁰ is the initial (empty) joint observation history. For one joint policy this calculation requires evaluation of (2.5) for each of the Σ_{t=0}^{h-1} |O|^t = (|O|^h − 1)/(|O| − 1) joint observation histories and |S| states, leading to a total cost of:

O( |S| (|O|^h − 1) / (|O| − 1) ). (2.7)

3 Cross-entropy optimization

de Boer, Kroese, Mannor, and Rubinstein (12) described the Cross-Entropy (CE) method as a general framework for both rare event estimation and combinatorial optimization. We will focus only on the application to optimization. In particular, it has been illustrated how the CE method can be adapted to find good policies for a particular class of Markov Decision Processes (MDPs) (24; 12). In this section, we first provide a brief introduction to the CE method for optimization, followed by a description of the mentioned application to MDPs.

3.1 General CE Optimization

The cross-entropy method can be used for optimization in cases where we want to find a, typically large, vector x from a hypothesis space X that maximizes some performance function V : X → ℝ. That is, we are looking for

x* = arg max_{x ∈ X} V(x). (3.1)

The CE method associates an estimation problem with this optimization. It maintains a probability distribution f_ξ over the hypothesis space, parametrized by a vector ξ. In particular, CE estimates the probability that the performance V(x) of a sample x drawn according to f_ξ is higher than some γ:

P_ξ(V(x) ≥ γ) = Σ_{x ∈ X} I(V(x), γ) f_ξ(x), (3.2)

where I(V(x), γ) is an indicator function:

I(V(x), γ) = 1 if V(x) ≥ γ, and 0 otherwise. (3.3)

For γ close to the optimum, this probability is most likely very small under a uniform f_ξ, which explains the link to rare event estimation.

The core of the CE method is an iterative two-phase process:

1. Generate a set of samples X according to f_ξ.
2. Select the best N_b samples X_b,⁴ and use those to update the parameter vector ξ.

The first step is rather trivial. The second step, however, deserves some explanation. Let γ^(j) be defined as the minimum performance within the selected set of samples of the j-th iteration, i.e.,

γ^(j) = min_{x ∈ X_b} V(x). (3.4)

The CE method requires that this lower-bound performance is not allowed to decrease over time: γ^(j+1) ≥ γ^(j). This implies that X_b can contain fewer than N_b samples, because

∀x ∈ X_b : V(x) ≥ γ^(j) (3.5)

is a hard requirement. The set X_b is then used to create ξ̂^(j+1), a maximum-likelihood estimate of the parameters. These new parameters can be smoothed using a learning rate 0 < α < 1 by interpolating with the parameter vector of the previous iteration:

ξ^(j+1) = α ξ̂^(j+1) + (1 − α) ξ^(j). (3.6)

⁴ The number of best samples is often characterized as a fraction 0 < ρ < 1 of the set of samples X, i.e., N_b = round(ρ · N).

This reduces the probability that some components of the parameter vector will be 0 or 1 early in the CE process, which could cause the method to get stuck in local optima. Usually, the iterative process is stopped when γ^(j) has not improved over some predefined number of steps, but other conditions such as a time limit or a fixed number of iterations can be used. When the stop condition is finally met, the best sample x found in the entire process is returned as an approximation of x*.

3.2 The CE Method for MDPs

In their work, Mannor et al.
(24) show how the CE method can be applied to shortest-path MDPs, a class of MDPs for which the optimal value function is stationary, i.e., the expected value of taking a particular action in a particular state does not depend on the stage. The optimal policy for such an MDP is a mapping from states to actions π_MDP : S → A, which can be represented as an |S|-vector. As in section 3.1, the goal is to find the vector that maximizes a performance function, in this case the expected total reward. Rewriting (3.1), we are looking for

π*_MDP = arg max V(π_MDP),   (3.7)

where the performance function now is the value of the MDP policy π_MDP. The CE method tackles this problem by maintaining a parameter vector ξ = (ξ_{s^1}, ..., ξ_{s^|S|}), where each ξ_s is a probability distribution over actions. Using these probabilities it is possible to sample N trajectories: starting from some start state, actions are randomly selected according to the probabilities described by ξ until the goal state is reached. Using the N_b best (highest total reward) trajectories X_b, the parameter vector can be updated as follows:

P(a|s) = Σ_{x ∈ X_b} I(x, s, a) / Σ_{x ∈ X_b} I(x, s),   (3.8)

where I(x, s, a) is an indicator function that indicates whether action a was performed at state s in trajectory x, and I(x, s) indicates whether s was visited in trajectory x. After updating the parameter vector, a new set X of trajectories can be sampled, etc. Empirical evaluation shows that this process converges to a (near-)optimal policy in only a few iterations (24).

4 Direct CE policy search for Dec-POMDPs

In this section we propose an adaptation of the CE method for Dec-POMDP policy search, which we dub direct CE (DICE) policy search for Dec-POMDPs, because it directly searches the space of joint policies without first restricting it. DICE is a direct translation of the ideas presented in the previous section. Still, there are some problems when trying to apply the procedure as outlined in the previous section to Dec-POMDPs. First, because we consider finite-horizon Dec-POMDPs, there is no stationary value function. Second, the policies of the agents are defined not over states, but over their individual observation histories ō_i, and these are not a Markovian signal. Third, there is no clear way to sample traces and use those to update the distribution. In the Dec-POMDP case, the hypothesis space is the space of deterministic joint policies π. In order to apply the CE method, we are required to define a distribution over this space and an evaluation function for sampled policies. Also, we show how the distribution can be adapted using the best policies found in each iteration. First, we present two methods to define the distribution over the joint policy space. After that we describe how the parameter updates are performed. Finally, we give a summary and complexity analysis of the DICE algorithm.

4.1 Policy distributions

In the case of Dec-POMDPs, f_ξ denotes a probability distribution over pure joint policies, parametrized by ξ. We represent this probability as the product of probability distributions over individual pure policies:

f_ξ(π) = Π_{i=1}^{n} f_{ξ_i}(π_i),   (4.1)

where ξ_i is the vector of parameters for agent i, i.e., ξ = (ξ_1, ..., ξ_n). The question is how to represent the probability distributions over individual pure policies. One clear solution is to enumerate all the pure policies of an agent i and to maintain an explicit discrete probability distribution over this set of policies; i.e., the distribution is represented as a mixed policy (31). However, this approach suffers from two drawbacks. First, the number of pure individual policies π_i might be huge, making such an explicit distribution impractical to represent.
Second, this representation is hard to parametrize in a meaningful way using some vector ξ_i, as it gives no access to the internals of the policies: parameters would specify probabilities for entire pure policies, rather than specifying behavior for particular observation histories as in figure 1. Therefore, rather than using a mixed-policy representation, we use a behavioral (31) or stochastic policy (22) description: a mapping from decision points to probability distributions over actions. Note that this is similar to the approach for MDPs, as described in section 3.2, where the states are the decision points.

4.1.1 Observation history based

The simplest way to represent a policy distribution is to make a direct translation from CE for MDPs: instead of maintaining a probability distribution over actions for each state, we now maintain one for each observation history (OH).

Figure 2: A part of a stochastic policy for an agent of a fictitious Dec-POMDP.

Figure 2 illustrates such a representation of the policy distribution. It shows that for each observation history ō_i a parameter ξ_{ō_i}, specifying the distribution over actions, is maintained:

∀a_i  ξ_{ō_i}(a_i) = P(a_i | ō_i).   (4.2)

Note that this distribution differs from a pure policy (figure 1) only by keeping distributions over actions, instead of a single action, at the nodes of the tree. Consequently, the parameter vector for agent i is defined as ξ_i = (ξ_{ō_i})_{ō_i ∈ Ō_i}, and the probability of a particular policy π_i for agent i is

f_{ξ_i}(π_i) = Π_{ō_i ∈ Ō_i} ξ_{ō_i}(π_i(ō_i)).   (4.3)

We refer to this policy distribution representation as the OH-based representation.

4.1.2 Action-observation history based

Defining the parameters as in section 4.1.1 is the most straightforward approach, but it does not take the action history into account: the choice of action π_i(ō_i^t) has no influence on the choice of action at the next time step, π_i(ō_i^{t+1}). As explained in section 2, in general a stochastic policy does take the taken actions into account. However, we know that there is at least one deterministic joint policy for a Dec-POMDP (which consists of individual policies that are a mapping from observation histories to actions). Moreover, in previous research we did investigate an action-observation history (AOH) based representation, but its influence appeared to be minor (27). Therefore we do not consider it further in this work.

4.2 Sampling and evaluation

Unlike the MDP case, in the setting of Dec-POMDPs there is no trivial way to sample trajectories given the joint policy distribution f_ξ and use those to update the distribution. Rather, we propose to sample complete joint policies and use those for the parameter update. Selecting a random sample of joint policies from the distribution is straightforward: for each observation history ō_i of an agent i, an action can be sampled from the action distribution ξ_{ō_i}. The result of this process is a deterministic policy for agent i, and repeating the procedure for each agent yields a deterministic joint policy. The evaluation of a joint policy can be done using (2.6).

4.3 Parameter update

We described how to represent and sample from the probability distribution over policies. This section describes how the set of best policies X_b, sampled from the previous distribution f_{ξ^(j)}, can be used to find new parameters ξ^(j+1). Let I(π_i, ō_i^t, a) be an indicator function that indicates whether π_i(ō_i^t) = a. In the OH-based distribution, the probability of agent i taking action a ∈ A_i after having observed ō_i^t can be re-estimated as

ξ̂_{ō_i^t}^(j+1)(a) = (1 / |X_b|) Σ_{π ∈ X_b} I(π_i, ō_i^t, a),   (4.4)

where |X_b| normalizes the distribution, since

∀ō_i^t  Σ_{a ∈ A_i} Σ_{π ∈ X_b} I(π_i, ō_i^t, a) = |X_b|.   (4.5)

Note that the thus-computed new parameter vector ξ̂^(j+1) is afterwards smoothed using the learning rate α according to (3.6).

4.4 Summary and complexity analysis

Algorithm 1 summarizes the DICE policy search method.
To start, it needs I, the number of iterations; N, the number of samples taken at each iteration; N_b, the number of samples used to update ξ; and α, the learning rate. The outer loop (lines 3-17) covers one iteration. The inner loop (lines 5-13) covers sampling and evaluating one joint policy. Lines 14-16 perform the parameter update. Because the CE method can get stuck in local optima, one typically performs a number of restarts. We have not incorporated these in the algorithm itself, however.

Algorithm 1: DICE policy search.
 1: initialize ξ^(0) {typically uniform random}
 2: V_b ← -∞
 3: for j ← 0 to I do
 4:   X ← ∅
 5:   for s ← 0 to N do
 6:     sample π from f_{ξ^(j)}
 7:     X ← X ∪ {π}
 8:     V(π) ← Evaluate(π)
 9:     if V(π) > V_b then
10:       V_b ← V(π)
11:       π_b ← π
12:     end if
13:   end for
14:   X_b ← the set of N_b best π ∈ X
15:   compute ξ̂^(j+1) {using (4.4)}
16:   ξ^(j+1) ← α ξ̂^(j+1) + (1 - α) ξ^(j)
17: end for
18: return π_b

Now we consider the complexity of this algorithm. For each iteration we draw N joint policies. Sampling such a joint policy involves, for each agent, selecting an action for each of its observation histories, and has complexity

O( n · |A_*| · |Ō_*| ) = O( n · |A_*| · (|O_*|^h - 1)/(|O_*| - 1) ),

where * denotes the agent index with the largest observation and action set, respectively. The complexity of updating ξ is similar, but includes the term N_b (rather than being performed N times). Evaluation of a policy, i.e., computing the total expected reward V(π), is performed by evaluating equations (2.6) and (2.5) from the last stage h - 1 up to the first stage 0. The complexity of this calculation for a single pure joint policy scales exponentially with the planning horizon, as explained in section 2. Because |O| = O(|O_*|^n), we can rewrite (2.7) as

O( (|O_*|^{nh} - 1)/(|O_*|^n - 1) · |S| ).

This latter term typically dominates the complexity of sampling and updating; therefore the total time complexity of DICE is given by

O( I · N · (|O_*|^{nh} - 1)/(|O_*|^n - 1) · |S| ).

5 Approximate evaluation

The complexity analysis showed that the time required for DICE scales exponentially with the planning horizon, and that policy evaluation forms the bottleneck of our algorithm. In order to alleviate this bottleneck, we examine the application of approximate evaluation. The idea is that rather than computing the expected value, we sample this value by simulating r episodes, or traces, and use the average of the outcomes as an estimate V̂(π) of the actual value V(π). We refer to the resulting method as DICE policy search with approximate evaluation (DICE-A).

Clearly, this approximation might introduce errors. Notice, however, that the CE method does not discriminate between policies within the set X_b of best samples. Therefore, as long as the relative ordering is preserved, the same policies are used to update the policy distribution, yielding the same results. In fact, only when the ranking of policies is disturbed near the cut-off threshold (around the N_b-th joint policy) will approximate evaluation influence the distribution updating process. There is a second potential source of error, though. When the fraction of best samples X_b is used to update γ using (3.4), the new γ might in fact be an over-estimate. This could make it very difficult to sample new instances with a higher (approximate) value. In previous work we therefore also considered a version of our algorithm that did not use the hard threshold γ, but rather always used the best N_b samples (27). The results, however, did not show a significant improvement, nor did we encounter any such difference in further experiments we performed. Therefore we do not consider this variant further in this paper.

5.1 Complexity

Simulating one trace of a joint policy involves looking up the actions for each of the n agents and sampling one of |S| successor states and one of |O| = O(|O_*|^n) joint observations at each of the h stages. Such a simulation is performed r times, so the time complexity of performing approximate evaluation of a single joint policy is

O( r · h · n · |O_*|^n · |S| ).   (5.1)

DICE-A performs such an evaluation for each of the N sampled policies in each of the I iterations. Therefore, as long as approximate evaluation dominates the time needed to sample a policy and update the parameter vector, the total time complexity of DICE-A is given by

O( I · N · r · h · n · |O_*|^n · |S| ).   (5.2)

5.2 Error bounds

The estimated value V̂(π) is only an approximation of the true value V(π). However, we are able to establish bounds on this error. In particular, we know that V(π) is bounded when the immediate reward function is bounded. Let us write R_min, R_max for the lower and upper bounds of the reward function, that is, ∀s,a R(s, a) ∈ [R_min, R_max]. Then the value of a policy is bounded by V(π) ∈ [hR_min, hR_max]. Wassily Hoeffding (18) proved that the probability that the sum of r independent random variables X_1, ..., X_r, each bounded X_i ∈ [a_i, b_i], exceeds its expectation by rε or more is bounded by

P( (X_1 + ... + X_r) - E[X_1 + ... + X_r] ≥ rε ) ≤ exp( -2r²ε² / Σ_{i=1}^{r} (b_i - a_i)² )

for any ε > 0. In our setting, each X_i denotes the outcome of the simulation of the i-th episode and is an unbiased estimate of V(π). Also, in our setting we are interested in a two-sided error bound, and all X_i respect the same bound X_i ∈ [hR_min, hR_max]. Therefore we can write

P( |(X_1 + ... + X_r) - E[X_1 + ... + X_r]| ≥ rε )
  = P( |(1/r) Σ_{i=1}^{r} X_i - V(π)| ≥ ε )
  = P( |V̂(π) - V(π)| ≥ ε )
  ≤ 2 exp( -2r²ε² / ( r (hR_max - hR_min)² ) ).

Using this result we can control the lower bound on the probability that an error of size less than ε is made. Suppose we want to approximate V(π) with accuracy ε with probability at least δ; i.e., δ is the lower bound on the probability of an error smaller than ε, yielding

P(error ≥ ε) = 1 - P(error < ε) ≤ 1 - δ.

Then we must have that

2 exp( -2rε² / (b - a)² ) ≤ 1 - δ,  with (b - a)² = h² (R_max - R_min)².

Solving the left-hand side for r goes as follows:

exp( -2rε² / (b - a)² ) ≤ (1 - δ)/2
-2rε² / (b - a)² ≤ ln( (1 - δ)/2 )
2rε² / (b - a)² ≥ ln( 2/(1 - δ) )
r ≥ ln( 2/(1 - δ) ) / (2ε²) · (b - a)².

So, to guarantee an error smaller than ε (with probability δ), the required number of traces grows only quadratically with the horizon.

5.3 Post-evaluation

When using DICE with approximate evaluation, the end result is a joint policy and its estimated value. In order to know the true quality of the joint policy, an exact evaluation can be started at this point. However, due to the exponential complexity of such an exact evaluation, this is not always feasible. In settings where it is not, we propose to do a more accurate sample-based estimation of the value of the joint policy. Of course, it may happen that the new exact (or more accurately determined) value of the joint policy is lower than the previous estimate, and perhaps also lower than the estimated value of other joint policies encountered during DICE. To prevent this, one may keep a list of the best k joint policies encountered. At the end of the procedure, one can then exactly evaluate all these k joint policies and select the best one. Alternatively, the CE process can be augmented with one additional iteration in which all sampled policies are evaluated exactly (or more accurately).

6 Experiments

Here we give an empirical evaluation of the proposed algorithm.
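Before turning to the experiments, note that the trace count r required by the bound of section 5.2 can be computed directly. The following is a minimal sketch; the example call uses a made-up reward range purely for illustration.

```python
import math

def required_traces(epsilon, delta, h, r_min, r_max):
    """Smallest r satisfying the bound from Section 5.2:
    r >= ln(2 / (1 - delta)) / (2 * epsilon**2) * (h * (r_max - r_min))**2,
    which guarantees |V_hat(pi) - V(pi)| < epsilon with probability >= delta."""
    span_sq = (h * (r_max - r_min)) ** 2          # (b - a)^2 = h^2 (Rmax - Rmin)^2
    return math.ceil(math.log(2.0 / (1.0 - delta)) / (2.0 * epsilon ** 2) * span_sq)

# Quadratic growth in the horizon: doubling h quadruples the required traces
# (up to rounding). Rewards in [0, 1] are an arbitrary illustrative choice.
r_h2 = required_traces(epsilon=1.0, delta=0.95, h=2, r_min=0.0, r_max=1.0)  # 8
r_h4 = required_traces(epsilon=1.0, delta=0.95, h=4, r_min=0.0, r_max=1.0)  # 30
```

The quadratic dependence on h is visible in the two calls: the h = 4 requirement is roughly four times the h = 2 requirement.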
The implementation of DICE follows the description without further remarks. In our DICE-A implementation we have not implemented the additional evaluation iteration or the list of k best policies as suggested; we only apply post-evaluation to the best-ranked joint policy. This post-evaluation consists of a more accurate evaluation of 20,000 runs when the value function consists of more than 20,000 (s, ō)-pairs, and exact evaluation otherwise. First we examine the influence of all parameters of the CE optimization procedure. Next, we briefly discuss some benchmark problems, investigate the performance on these benchmark problems for different horizons, and compare it to dynamic programming JESP. Finally, we investigate how DICE scales with respect to the number of agents, again comparing to JESP.

6.1 Different CE parameters

The CE method has quite a few configurable parameters: the learning rate α, the number of iterations I, the number of samples drawn per iteration N, the fraction of best samples kept for the update ρ, and the induced number of samples used for this update, N_b = ρ · N. We have empirically investigated different settings of these parameters for DICE policy search. Additionally, for DICE-A we investigate the parameter r, the number of approximation simulations. The results reported in this section are all obtained on the Dec-Tiger problem. The non-varying parameters in these experiments were set as follows: α = 0.2, I = 30, N = 50, N_b = 5.

First we examine the influence of the learning rate α.

Figure 3: Performance for varying α. Top: horizon 4. Bottom: horizon 5.

Figure 3 shows that a low α (0.2) results in low variance, but if it is too low (α = 0.1) the CE process cannot converge to a good solution within a limited number of iterations. The figure also shows that, for DICE on Dec-Tiger with 30 iterations, α in the range 0.3-0.4 gives the best trade-off between learning and convergence.

Next, we examine the number of CE iterations. The complexity analysis shows that this parameter should affect the running time linearly, and this was experimentally confirmed.

Figure 4: Performance for varying number of CE iterations. Top: horizon 4. Bottom: horizon 5.

Figure 4 shows the value obtained by the algorithm for different numbers of iterations. It indicates that the algorithm converges after a fixed number of iterations. As we might expect, convergence requires fewer iterations for a smaller horizon (compare the top and bottom plots). However, in both cases we see that the performance grows very rapidly when first increasing the number of iterations and then levels out. This indicates that even with a limited number of iterations, CE might be able to obtain fairly good results fast.

Figure 5 shows the values obtained by varying N, the number of joint policies sampled per iteration, which also increases the complexity linearly. The results were obtained using a fixed update fraction ρ = 0.1. Here too, we see that improvement is strong initially and then flattens out. Also note that the higher variance for h = 5 can be explained by looking at the performance for I = 30 in figure 4. The number of samples used to update the parameter vector ξ only marginally influences the run-time, and we were not able to determine this empirically. Also, using a larger fraction ρ decreases the performance.
In this case, the quality of the sampled joint policies used to re-estimate the distribution parameters degenerates, and CE will not converge towards good solutions. The results in figure 6 indicate optimal performance for ρ between 0.05 and 0.1. We omit the nearly identical graph for h = 5.

Figure 5: Performance under varying number of samples per CE iteration. Top: horizon 4. Bottom: horizon 5.

Figure 6: The fraction of samples ρ used to update the distribution, on horizon 4.

When DICE-A is used for policy search, the number of approximation runs influences the run-time linearly. In figure 7 we can clearly see that the quality of the approximate evaluation converges quickly for horizon 4. For horizon 5 the size of the policies increases exponentially and more runs are needed to maintain approximation quality, but here too we see no improvement beyond r = 1000.

6.2 Results on different problems

Here we report the performance of DICE on different Dec-POMDP problems and different horizon lengths. In particular we consider the following problems: Broadcast Channel, Dec-Tiger, Meeting on a Grid, and Factored Firefighting. The Dec-Tiger problem was discussed in section 2.1. For the other problems we now provide a brief description.

The Broadcast Channel problem involves two agents that control a broadcast channel. Each agent decides at every stage whether or not to send a message across it. When both agents send a message at the same time, a collision occurs. When a message is successfully transmitted, the agents get a reward of +1. More information can be found in (30; 17).

Meeting on a Grid was introduced by Bernstein et al. (8) and considers two robots on a 2x2 grid.
Their goal is to occupy the same square, which gives them a +1 reward, but they have imperfect sensors and actuators, which complicates the task. We consider the version with 2 observations per agent (2).

Finally, Factored Firefighting is a problem with 3 agents that have to fight fires at 4 houses (29). Each agent has 2 different houses it can go to (the sets are overlapping, but not equal). At every stage each agent chooses which of its 2 associated houses it wants to fight fire at. For each house that is burning, the agents receive a penalty of -1 or -2, depending on the level of fire.

Figure 7: Performance for varying number of approximate evaluation runs. Top: horizon 4. Bottom: horizon 5.

Before showing results of the proposed CE approaches, we first report the size of the joint policy space for the different considered problems in Table 1.

Table 1: The number of joint policies for different problems and horizons. Inf denotes a value beyond double machine precision.

            Dec-Tiger   Broadcast   Grid        FFF
  n         2           2           2           3
  |S|       2           4           16          81
  |A_i|     3           2           5           2
  |O_i|     2           2           2           2
  h = 2     7.290e02    6.400e01    1.563e04    5.120e02
  h = 3     4.783e06    1.638e04    6.104e09    2.097e06
  h = 4     2.059e14    1.074e09    9.313e20    3.518e13
  h = 5     3.815e29    4.612e18    2.168e43    9.904e27
  h = 6     1.310e60    8.507e37    1.175e88    7.846e56
  h = 7     1.545e121   2.895e76    3.454e177   4.925e114
  h = 8     2.147e243   3.352e153   Inf         1.941e230

Clearly we are dealing with huge search spaces here. In fact, for h = 8 Meeting on a Grid the number of joint policies is not representable by a double-precision float (the maximum representable being 1.7977e+308). We emphasize that DICE directly searches these spaces, without first restricting them. We report the results of DICE and DICE-A on these different test problems. We also compare against dynamic programming JESP, which was proposed by Nair et al.
(25). This method starts with a random joint policy and then iterates over the agents, computing a best-response policy for the selected agent while keeping the other agents fixed. The term "dynamic programming" indicates that this best response is computed by solving an augmented POMDP for the selected agent. The reason to compare against JESP is that, as mentioned in the introduction, it is the only other approximate method that does not constrain the search space in any way. The settings for DICE used in these experiments are: a learning rate of α = 0.2 and N = 50 joint policies per iteration, using the N_b = 5 best for the update (i.e., ρ = 0.1). For DICE-A we used the same settings, and the approximation was based on 1000 simulations. The results in this section are averaged over 100 random initializations, or restarts, of each solution method. However, due to the time needed for higher horizons, we have not always performed 100 restarts for the highest horizon considered. The numerical results and the number of restarts over which they were obtained are listed in the appendix. The reported timings are CPU user times, obtained on an Intel Xeon 3.4 GHz machine with 2 GB of memory running Debian Linux.

The results for the Dec-Tiger problem are shown in Figure 8. Note that the run-time results in the right plot use a log scale. Our evaluation shows that for low horizons (h < 6) DICE outperforms JESP, but has taken, under these settings, more time to finish. However, the run-time of JESP increases much faster for larger horizons: for horizon 8 it is about ten times slower than DICE. The run-time of DICE-A is affected least by the value of the horizon. In terms of the quality of the found policy, DICE outperforms JESP for lower horizons: although all methods found (near-)optimal solutions for h = 2, 3, 4 within the 100 restarts, the variance of JESP is much higher. Unfortunately, the joint policies found by DICE for h > 5 are not very good, as the performance drops below JESP.
However, this exposes the potential of approximate evaluation, which allows for many more iterations in less time than regular DICE. We also ran DICE-A with 200 iterations (20 restarts). While using approximation with the same settings does not seem to significantly change the results of the CE method by itself (DICE and DICE-A with 50 iterations perform roughly equally), it does allow for more iterations, leading to good overall results while keeping the run-time acceptable. Only for h = 8 does JESP really seem to achieve a better result, but with high variance and high run-times.

For Broadcast Channel the results are shown in Figure 9. It shows that CE achieves a higher mean value and less variance, with again little difference between DICE and DICE-A. The run-time of DICE-A, on the other hand, again increases much more slowly than that of the other two approaches, which eventually is more beneficial.

As indicated by Table 1, the Meeting on a Grid problem is somewhat larger than the previous problems. This quickly makes exact evaluation problematic. For example, to compute the value of a horizon-6 joint policy, 2.18e4 (s, ō)-pairs have to be evaluated. Figure 10 shows that while JESP requires much more time than DICE, it does not result in better performance. The rightmost plot shows that the run-time of DICE-A is significantly lower from horizon 5 on. However, starting at h = 6, DICE-A seems to get trapped in a local optimum. Still, it is the only method to compute a policy for horizons 7 and 8.

The last problem, Factored Firefighting, is even larger. Because there are now 3 agents, the number of joint observation histories, and thus the number of entries to compute for exact evaluation and for JESP, grows much faster. This is reflected by the results, as the graphs in Figure 11 show: DICE and JESP can only find solutions for at most h = 3 and h = 4, respectively. Again this demonstrates the power of DICE-A, which does not encounter any problems computing results up to h = 8.
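As an aside, the joint-policy-space sizes reported in Table 1 follow directly from (2.4) and are easy to reproduce. A small sketch (exact integer arithmetic; Python integers do not overflow, so even the h = 8 entries are computable):

```python
def num_joint_policies(n_agents, n_actions, n_obs, h):
    """Number of pure joint policies, Eq. (2.4): each agent assigns one of its
    |A_i| actions to each of its (|O_i|^h - 1)/(|O_i| - 1) observation histories."""
    n_histories = (n_obs ** h - 1) // (n_obs - 1)
    return n_actions ** (n_agents * n_histories)

# A few Table 1 entries, exactly:
dec_tiger_h2 = num_joint_policies(2, 3, 2, 2)   # 729        = 7.290e02
broadcast_h4 = num_joint_policies(2, 2, 2, 4)   # 1073741824 = 1.074e09
fff_h3       = num_joint_policies(3, 2, 2, 3)   # 2097152    = 2.097e06
```

The double-exponential growth in h is what makes direct enumeration hopeless and motivates sampling-based search.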
In general, DICE and DICE-A seem to perform quite well in comparison to JESP. Although JESP's maximum value is usually equal to or greater than the maximum value found by the DICE methods, its mean value is lower and its standard deviation is high. This indicates the need for many restarts in order to find a good solution, and performing many restarts becomes problematic for higher horizons because of the exponential increase in run-time. The results of DICE, however, have a much lower standard deviation, indicating that fewer restarts are necessary. It still shares the burden of exponentially increasing run-times, though. DICE-A has proven to be a very powerful method. Even though its standard deviations are somewhat greater than those of regular DICE, it is able to trade off run-time and accuracy, and thus achieves reasonable results even for higher horizons, where the other methods fail.

Figure 8: The Dec-Tiger problem. Left: average value. Middle: maximum value. Right: average run-time.

Figure 9: The Broadcast Channel problem. Left: average value. Middle: maximum value. Right: average run-time.

Figure 10: The Meeting on a Grid problem. Left: average value. Middle: maximum value. Right: average run-time.

Figure 11: The Factored Firefighting problem with several restarts. Left: average value. Middle: maximum value. Right: average run-time.

6.3 Varying the number of agents

In the previous section we investigated the ability of DICE to scale with respect to the horizon. Here we investigate its scaling behavior with respect to the number of agents. So far, almost all available Dec-POMDP problems involve only two agents. Two exceptions are the FireFighting problem introduced by Oliehoek et al. (28) and the Factored Firefighting problem already mentioned. Here we use the former (non-factored) version, since this allows us to vary the number of agents while keeping the number of houses (and thus the number of states) constant. Again we examined the performance of DICE, DICE-A and JESP for this problem, now varying the number of agents. We did this at two chosen horizons, h = 3 and h = 4; the results are shown in Figure 12 and Figure 13, respectively.

Figure 12: Performance for varying number of agents for h = 3 of the FireFighting problem.

Figure 13: Performance for varying number of agents for h = 4 of the FireFighting problem.

We used the same settings for the DICE algorithms as before. The number of restarts for all methods was set to 20.
The figures show that, for this problem, all methods performed very well in terms of achieved value. The maximum values found by all policies coincided (therefore these plots are omitted), which may indicate that these are true global optima. More interesting are the run-time results. For h = 3, we see that JESP outperforms the DICE algorithms for all numbers of agents. However, its increase in run-time when increasing the number of agents is steeper than for DICE and particularly DICE-A. This is emphasized by the results for h = 4, which clearly show that the run-times of JESP, but also of exact DICE, grow so fast that they are unable to compute results for 5 agents within reasonable time.^5 The run-times of the two different settings of DICE-A, however, grow much more slowly, and as a consequence these methods are able to find a good solution for 5 agents. Note that, in accordance with (5.2), the run-time still increases exponentially in the number of agents, simply because simulating an episode (sampling joint observations) takes time exponential in the number of agents. In problems that exhibit observation independence (such as FireFighting), it is possible to work around this. We have not considered this efficiency-increasing measure further, but stress that the efficiency of DICE-A could be further improved for such problems.

^5 Each algorithm was given 8 hours to compute all results for a given horizon.

7 Discussion and future work

This article has focused on decentralized decision making formalized in the Dec-POMDP framework. We argued that planning in such settings is in principle a complex combinatorial decision process. We demonstrated how to apply the CE method, a recently introduced method for combinatorial optimization, to policy search in Dec-POMDPs. We detailed the resulting algorithm, direct CE (DICE) policy search for Dec-POMDPs, and performed a complexity analysis, which identified the exact evaluation of sampled joint policies as a bottleneck.
Consequently we proposed DICE-A, which performs an approximate evaluation, and showed that, under certain assumptions, its time complexity is polynomial in the CE parameters. We also presented a formalization of the error bounds for this approximate evaluation. We presented an empirical evaluation of the influence of the different CE parameters on policy search and also tested performance on different test problems from the literature, over different horizons. In these latter experiments we compared against JESP, which, to our knowledge, is the only other approximate planning method for Dec-POMDPs that does not restrict the search space in any way. The results of this comparison were generally favorable for CE. In particular, a nice feature of CE is that by adjusting the parameters one is able to control the run time. On the other hand, because JESP has no parameters, it is somewhat easier to apply. In a final comparison we investigated how well the mentioned algorithms scale with respect to the number of agents. Although still exponential, DICE-A outperforms the other methods for larger problems. However, this work does not intend to present a new state-of-the-art Dec-POMDP solver: we compare against JESP, which is one of the older Dec-POMDP algorithms, and more advanced methods have since been proposed. Rather, our work shows that viewing these problems as combinatorial optimization problems and applying corresponding methods (such as CE optimization) can bring leverage to the planning process. An interesting direction for future work is the application of the CE method (and potentially other methods for combinatorial optimization) in more recent algorithms. For instance, a Dec-POMDP can be approximated by solving for each stage t a pruned Bayesian game (BG) (13) or a compressed BG (14). The optimal solution of such BGs, however, involves enumeration of all the corresponding joint policies.
CE might prove very effective at quickly finding approximate solutions for the BGs. In particular, it might turn out that the pruning/compression is no longer necessary for the same horizon when applying CE, and that when combining pruning/compression and CE, the algorithm can scale to higher h. Another example is the method proposed by Seuken and Zilberstein (34). This method is an extension of dynamic programming for Dec-POMDPs that fixes the maximum number of policy trees retained in each iteration to a parameter maxTrees. The authors report that: "Even for a small problem with 2 actions and 5 observations, setting maxTrees=5 would be prohibitive because (2 · 5^5)^2 = 39,062,500 policy tree pairs would have to be evaluated." It might be possible to apply CE to this smaller policy search problem that occurs in each iteration of the DP process. This could lead to improved efficiency, or the space could be restricted less in order to find a better approximate solution. Other directions for future research involve improving the efficiency of the CE method itself. One idea would be to use crude value approximation in the first iterations to quickly increase the probabilities of promising policies; in the course of the process, evaluation can then be performed more accurately. Exact evaluation can most likely be accelerated by caching (intermediate) evaluation results of (parts of) joint policies. Also, the joint policies resulting from CE search might be improved by using them as a starting point for JESP, leading to a hybrid optimization scheme for multiagent settings. Finally, and somewhat related, the success of approximate evaluation raises the question whether it is necessary to sample complete joint policies if they are only partially inspected during approximate evaluation. The CE approach could benefit from a construction that samples parts of (joint) policies.

Acknowledgments

We would like to thank Matthijs T.J.
Spaan for his help and advice. The research reported here is part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant nr: BSIK03024.

References

[1] G. Alon, D. Kroese, T. Raviv, and R. Rubinstein. Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment. Annals of Operations Research, 134(1):137-151, 2005.
[2] C. Amato, D. S. Bernstein, and S. Zilberstein. Optimal fixed-size controllers for decentralized POMDPs. In Proc. of the AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), May 2006.
[3] C. Amato, A. Carlin, and S. Zilberstein. Bounded dynamic programming for decentralized POMDPs. In Proc. of the AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), May 2007.
[4] R. Aras, A. Dutech, and F. Charpillet. Mixed integer linear programming for exact finite-horizon planning in decentralized POMDPs. In The International Conference on Automated Planning and Scheduling, 2007.
[5] R. Becker, S. Zilberstein, V. Lesser, and C. V. Goldman. Solving transition independent decentralized Markov decision processes. Journal of Artificial Intelligence Research (JAIR), 22:423-455, December 2004.
[6] D. S. Bernstein, S. Zilberstein, and N. Immerman. The complexity of decentralized control of Markov decision processes. In Proc. of Uncertainty in Artificial Intelligence, pages 32-37, 2000.
[7] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of Markov decision processes. Math. Oper. Res., 27(4):819-840, 2002.
[8] D. S. Bernstein, E. A. Hansen, and S. Zilberstein. Bounded policy iteration for decentralized POMDPs. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005.
[9] Z. Botev and D. P. Kroese. Global likelihood optimization via the cross-entropy method with an application to mixture models.
In WSC '04: Proceedings of the 36th conference on Winter simulation, pages 529-535, 2004.
[10] C. Boutilier. Planning, learning and coordination in multiagent decision processes. In TARK '96: Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge, pages 195-210, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers Inc. ISBN 1-55860-417-9.
[11] I. Cohen, B. Golany, and A. Shtub. Managing stochastic finite capacity multi-project systems through the cross-entropy method. Annals of Operations Research, 134(1):183-199, 2005.
[12] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1):19-67, 2005.
[13] R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for partially observable stochastic games with common payoffs. In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems, pages 136-143, 2004.
[14] R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Game theoretic control for robot teams. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1175-1181, 2005.
[15] C. V. Goldman and S. Zilberstein. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research (JAIR), 22:143-174, 2004.
[16] C. Guestrin, D. Koller, and R. Parr. Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems 14, pages 1523-1530, 2002.
[17] E. A. Hansen, D. S. Bernstein, and S. Zilberstein. Dynamic programming for partially observable stochastic games. In Proc. of the National Conference on Artificial Intelligence, pages 709-715, 2004.
[18] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, Mar. 1963.
[19] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra.
Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99-134, 1998.
[20] Y. Kim, R. Nair, P. Varakantham, M. Tambe, and M. Yokoo. Exploiting locality of interaction in networked distributed POMDPs. In Proceedings of the AAAI Spring Symposium on Distributed Plan and Schedule Management, 2006.
[21] J. R. Kok and N. Vlassis. Using the max-plus algorithm for multiagent decision making in coordination graphs. In RoboCup-2005: Robot Soccer World Cup IX, Osaka, Japan, July 2005.
[22] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1-2):167-215, 1997.
[23] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proc. of the National Conference on Artificial Intelligence, pages 541-548, 1999.
[24] S. Mannor, R. Rubinstein, and Y. Gat. The cross entropy method for fast policy search. In Proc. of the International Conference on Machine Learning, pages 512-519, 2003.
[25] R. Nair, M. Tambe, M. Yokoo, D. V. Pynadath, and S. Marsella. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proc. of the Int. Joint Conf. on Artificial Intelligence, pages 705-711, 2003.
[26] R. Nair, P. Varakantham, M. Tambe, and M. Yokoo. Networked distributed POMDPs: A synthesis of distributed constraint optimization and POMDPs. In Proc. of the National Conference on Artificial Intelligence, pages 133-139, 2005.
[27] F. A. Oliehoek, J. F. Kooij, and N. Vlassis. A cross-entropy approach to solving Dec-POMDPs. In International Symposium on Intelligent and Distributed Computing, pages 145-154, Oct. 2007.
[28] F. A. Oliehoek, M. T. J. Spaan, and N. Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289-353, 2008.
[29] F. A. Oliehoek, M. T. J. Spaan, S. Whiteson, and N. Vlassis.
Exploiting locality of interaction in factored Dec-POMDPs. In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems, pages 517-524, 2008.
[30] J. M. Ooi and G. W. Wornell. Decentralized control of a multiple access broadcast channel: Performance bounds. In Proc. 35th Conf. on Decision and Control, 1996.
[31] M. J. Osborne and A. Rubinstein. A Course in Game Theory. The MIT Press, July 1994.
[32] D. V. Pynadath and M. Tambe. The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of AI Research (JAIR), 16:389-423, 2002.
[33] M. Roth, R. Simmons, and M. Veloso. Exploiting factored representations for decentralized execution in multi-agent teams. In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems, pages 467-463, May 2007.
[34] S. Seuken and S. Zilberstein. Improved memory-bounded dynamic programming for decentralized POMDPs. In Proc. of Uncertainty in Artificial Intelligence, July 2007.
[35] S. Seuken and S. Zilberstein. Memory-bounded dynamic programming for DEC-POMDPs. In Proc. of the Int. Joint Conf. on Artificial Intelligence, pages 2009-2015, 2007.
[36] M. T. J. Spaan and F. S. Melo. Interaction-driven Markov games for decentralized multiagent planning under uncertainty. In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems, pages 525-532, 2008.
[37] D. Szer and F. Charpillet. Point-based dynamic programming for DEC-POMDPs. In Proc. of the National Conference on Artificial Intelligence, 2006.
[38] D. Szer, F. Charpillet, and S. Zilberstein. MAA*: A heuristic search algorithm for solving decentralized POMDPs. In Proc. of Uncertainty in Artificial Intelligence, 2005.
[39] P. Varakantham, J. Marecki, Y. Yabu, M. Tambe, and M. Yokoo. Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies. In Proc. of Int. Joint Conference on Autonomous Agents and Multi Agent Systems, 2007.
Appendix

Tables 2 to 5 give a numerical overview of the results presented in Sections 6.2 and 6.3.

Table 2: Results for the Dec-Tiger problem. Statistics over 100 restarts, except for DICE-A with I = 200 (which we performed with 20 restarts) and horizon-8 JESP (which only completed 30 restarts). Columns: horizon, avg. value, std. dev. value, max. value, avg. time.

DICE exact (50 iterations)
  2     -4.08   0.78    -4.00     0.02
  3      5.19   0.00     5.19     0.08
  4      3.81   1.28     4.80     0.34
  5     -1.58   4.15     4.58     1.37
  6    -22.75   4.40   -13.20     6.05
  7    -57.11   5.85   -46.56    27.93
  8    -98.23   8.47   -80.09   147.73
DICE-A (50 iterations)
  2     -4.19   0.70    -4.00     5.90
  3      5.19   0.00     5.19     8.43
  4      3.35   2.76     4.80    10.55
  5     -1.56   3.11     3.45    12.90
  6    -24.05   5.14   -12.22    15.60
  7    -59.45   6.61   -45.43    18.58
  8   -103.59   8.04   -83.27    23.02
DICE-A (200 iterations)
  2     -4.14   0.60    -4.00    23.26
  3      5.19   0.00     5.19    33.54
  4      4.16   0.87     4.80    40.20
  5      2.50   1.93     5.63    48.10
  6     -2.17   3.42     4.53    56.69
  7    -11.46   4.39    -5.28    66.94
  8    -31.30   4.38   -21.21    80.40
JESP
  2    -17.09  10.22    -4.00     0.00
  3    -21.36  10.76     5.19     0.00
  4    -27.58  19.26     4.80     0.04
  5    -22.72  17.44     7.03     0.49
  6    -23.28  20.57    10.38    11.32
  7    -21.42  19.16     9.99   144.21
  8    -16.20  18.26    12.22  1741.39

Table 3: Results for the FACTORED FIREFIGHTING problem over 100 restarts. Columns: horizon, avg. value, std. dev. value, max. value, avg. time.

DICE exact (50 iterations)
  2   -5.21   0.00   -5.21    36.45
  3   -6.66   0.01   -6.65   347.89
DICE-A (50 iterations)
  2   -5.22   0.02   -5.21     9.89
  3   -6.69   0.03   -6.65    13.27
  4   -7.74   0.12   -7.48    17.05
  5   -8.46   0.22   -8.05    20.77
  6   -9.05   0.25   -8.53    25.57
  7   -9.51   0.35   -8.83    30.92
  8   -9.88   0.44   -9.14    36.81
JESP
  2   -5.28   0.09   -5.21     0.73
  3   -6.71   0.06   -6.65    17.41
  4   -7.52   0.07   -7.46   308.36

Table 4: Results for the BROADCAST CHANNEL problem over 100 restarts. Columns: horizon, avg. value, std. dev. value, max. value, avg. time.

DICE exact (50 iterations)
  2   2.00   0.00   2.00     0.05
  3   2.93   0.05   2.99     0.26
  4   3.80   0.08   3.89     1.08
  5   4.69   0.09   4.79     4.45
  6   5.52   0.09   5.67    18.37
  7   6.34   0.08   6.48    79.89
  8   7.03   0.09   7.26   384.23
DICE-A (50 iterations)
  2   2.00   0.00   2.00     6.38
  3   2.92   0.05   2.99     8.32
  4   3.80   0.07   3.89    10.20
  5   4.65   0.09   4.79    12.26
  6   5.50   0.08   5.66    14.47
  7   6.28   0.08   6.46    16.93
  8   6.98   0.11   7.22    19.74
JESP
  2   1.92   0.21   2.00     0.00
  3   2.59   0.44   2.99     0.01
  4   3.43   0.48   3.89     0.04
  5   4.27   0.47   4.79     0.36
  6   5.04   0.55   5.69     3.03
  7   5.88   0.57   6.59    24.71
  8   6.81   0.59   7.49   202.46

Table 5: Results for the Meeting on a Grid problem over 100 restarts. Columns: horizon, avg. value, std. dev. value, max. value, avg. time.

DICE (50 iterations)
  2   0.91   0.00   0.91     0.71
  3   1.55   0.01   1.55     3.43
  4   2.23   0.02   2.24    15.37
  5   2.91   0.07   2.96    67.60
  6   3.57   0.06   3.64   292.88
DICE-A (50 iterations)
  2   0.91   0.00   0.91    10.77
  3   1.54   0.01   1.55    16.51
  4   2.21   0.02   2.24    21.86
  5   2.85   0.05   2.93    27.19
  6   2.64   0.07   2.73    32.86
  7   2.97   0.06   3.07    37.66
  8   3.23   0.09   3.37    42.63
JESP
  2   0.86   0.08   0.91     0.02
  3   1.38   0.17   1.55     0.50
  4   1.97   0.24   2.24    12.11
  5   2.53   0.30   2.97   287.02

Table 6: Results for the h = 3 FIREFIGHTING problem with varying number of agents over 20 restarts. Columns: n, avg. value, std. dev. value, max. value, avg. time.

DICE (50 iterations)
  2   -5.76   0.02   -5.74     13.77
  3   -2.41   0.04   -2.39     59.07
  4   -1.08   0.01   -1.07    249.88
  5   -0.51   0.00   -0.51   1101.54
DICE-A (50 iterations)
  2   -5.80   0.04   -5.74     17.86
  3   -2.48   0.05   -2.39     35.05
  4   -1.09   0.02   -1.07     91.02
  5   -0.51   0.01   -0.51    263.92
JESP
  2   -5.76   0.04   -5.74      0.47
  3   -2.41   0.03   -2.39      4.17
  4   -1.08   0.01   -1.07     20.15
  5   -0.51   0.00   -0.51     83.61

Table 7: Results for the h = 4 FIREFIGHTING problem with varying number of agents over 20 restarts. For n = 4, DICE shows an average over 4 completed runs and JESP over 18 completed runs. Columns: n, avg. value, std. dev. value, max. value, avg. time.

DICE (50 iterations)
  2   -6.66   0.06   -6.58     68.03
  3   -2.52   0.02   -2.45    490.58
  4   -1.07   0.00   -1.07   4227.08
DICE-A (50 iterations)
  2   -6.81   0.08   -6.67     23.07
  3   -2.59   0.08   -2.45     45.22
  4   -1.09   0.02   -1.07    121.37
  5   -0.51   0.00   -0.50    350.04
JESP
  2   -6.62   0.03   -6.58      6.71
  3   -2.45   0.03   -2.44    110.57
  4   -1.08   0.01   -1.07   1040.75

On the Compilation of Programs into their Equivalent Constraint Representation

Franz Wotawa and Mihai Nica
Technische Universität Graz, Institute for Software Technology, Inffeldgasse 16b/2, 8010 Graz, Austria
E-mail: {wotawa,mihai.nica}@ist.tugraz.at

Keywords: constraint satisfaction problems, hyper-graph decomposition, debugging

Received: March 15, 2008

In this paper we introduce the basic methodology for analyzing and debugging programs. We first convert programs into their loop-free equivalents and from these into the static single assignment form. From the static single assignment form we derive a corresponding constraint satisfaction problem. The constraint representation can be directly used for debugging. From the corresponding hyper-tree representation of the constraint satisfaction problem we compute the hyper-tree width, which characterizes the complexity of finding a solution for the constraint satisfaction problem. Since constraint satisfaction can be effectively used for diagnosis, the conversion can be used for debugging, and the obtained hyper-tree width is an indicator of the debugging complexity.

Povzetek: Članek opisuje analiziranje programov in iskanje napak v njih.
Figure 1: Interaction between the control and the debugging system

1 Introduction

Ideally, intelligent systems should provide self-reasoning and reflection capabilities in order to react to internal faults as well as to unexpected interaction with their environment. Reflection capabilities are highly recommended for systems with strong robustness constraints, like space exploration probes or even mobile robots. One scenario, for example, is a robot that, although having a broken engine, should reach a certain position. Without self-reasoning or reflection such a robot would simply fail to reach its goal. Another example would be a robot whose software fails because of a bug. In this situation the robot should recover and ideally repair itself. Note that even exhaustive testing does not prevent a program from containing bugs which might cause unexpected behavior in certain situations. In this paper we do not focus on whole systems comprising hardware and software. Instead we discuss how to represent programs to allow for reflection, which can be used for enhancing a system with debugging functionality. In the context of this paper, debugging is defined as fault localization given a certain test case. We do not deal with verification and test-case generation, which are used for fault detection, or with repair. In order to compute the fault location we follow the model-based diagnosis approach (22), but we do not rely on logical models; instead we use constraints for representing programs. The obtained constraint representation can be directly used for computing diagnoses, e.g., by using specialized diagnosis algorithms like the ones described in (12; 25; 26). Although reflection and debugging capabilities are a desired functionality of a system, they introduce additional computational complexity which can hardly be handled by the system directly because of its lack of computational power. Note that model-based diagnosis is NP-complete.
Hence, a distributed architecture would be required which separates the running control program from the debugging capabilities. Figure 1 depicts the proposed architecture. The debugging module takes the source code of the original system, in this case a control system, and a test case to localize and repair the fault. The changed source code is compiled and transferred back to the original system. In the proposed architecture there are several open issues which have to be solved. The first regards the test case; in particular, one is interested in its origin. One way would be to have a monitoring system which checks the internal state of the control system. In case of a faulty behavior, the inputs coming from the sensors, the internal state, and the computed output, together with the information regarding the expected output, are used as a test case. The second issue is related to fault localization and repair. There is literature explaining fault localization and repair which can be used, e.g., (13; 24; 17; 23). In this paper we focus on modeling for fault localization, i.e., for identifying the root cause of a detected misbehavior. This does not necessarily lead directly to good repair suggestions. Repair definitely requires knowledge about further specification details, and incorporating different test cases should help a lot. A detailed discussion of repair is outside the scope of this paper and remains an important topic of research. The third and last issue is a more technical one. After recompiling, the new program has to replace the old one on the fly, which might cause additional problems in ensuring the integrity and consistency of the system's behavior. Making systems self-aware and giving them self-healing capabilities increases their autonomy, which is important in missions like space exploration where the system cannot be directly controlled for whatever reason.
The paper contributes to this research by providing a methodology for fault localization in programs. The methodology requires the availability of the source code and test cases. It compiles the program into its equivalent constraint representation and uses a failure-revealing test case to compute diagnosis candidates. The paper is organized as follows. We first introduce the basic ideas behind our approach by means of an example. Afterwards, we go through all necessary steps of the conversion process, which finally leads to the constraint representation of programs. This section is followed by a presentation of experimental results obtained from our implementation. The experiments focus on structural properties of the programs' constraint representation because the structural properties have an impact on the complexity of debugging. Finally, we discuss related research and conclude the paper.

2 Diagnosis/debugging

In this section we motivate the idea behind our approach by means of an example program that is expected to compute the area and circumference of a circle from a given diameter.

1. r = d/2;
2. c = r*pi;
3. a = r*r*pi

Obviously the statement in line 2 is faulty. A possible repair would be c = 2*r*pi; or c = d*pi;. We know this because of our knowledge of mathematics. However, in the more general case where large programs are involved, we have to rely on a process in order to detect, localize, and correct a fault. Such a process deals with the question 'How can we find out what's wrong in the program?' and is a standard part of today's software engineering processes. In software engineering we first have to detect the fault. This is done in practice using test cases or properties together with formal verification methods. In our case we rely on test cases. One test case which allows us to detect the faulty behavior would be setting d to 2 and requiring c to be 2π and a to be π.
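The detection step can be replayed by simply executing the program on the test case. The sketch below is our own transcription of the three-line program into Python, with pi taken from the math module:

```python
import math

def circle(d):
    # The program under test; line 2 contains the bug (should be c = 2*r*pi).
    r = d / 2            # line 1
    c = r * math.pi      # line 2 (faulty)
    a = r * r * math.pi  # line 3
    return c, a

# Test case: d = 2; expected c = 2*pi and a = pi.
c, a = circle(2)
print(c == 2 * math.pi)  # False: the fault is detected
print(a == math.pi)      # True: a is computed correctly
```

Running the test case exposes the fault in c while a matches its expectation, which is exactly the contradiction the diagnosis step starts from.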
The program computes the value π for both variables c and a, which contradicts the test case. This contradiction can be used to locate the fault. One way would be to have a look at the first definition of variable c, which has been assigned a wrong value, and to trace back using the dependencies between variables. In this case r is defined in line 1 and used in line 2. Hence, lines 1 and 2 are possible fault candidates. This approach uses only structural information coded in the program and might overestimate the number of diagnosis candidates. Another approach would be to assume some line of the program to be faulty. In this case the statement corresponding to the line does not provide any known functionality. For example, when assuming line 2 to be faulty, we do not know how to compute a value for c. Therefore, we do not get any contradicting information, and the assumption is consistent with the given test case. Unfortunately, this is also the case when assuming line 1 to be faulty and only considering the computation of values as specified by the semantics of the programming language, i.e., from the beginning of the program to its end. The situation changes when considering the statements as equations where no direction of information flow is given. An equation like r = d/2 is a constraint which specifies a relationship between r and d. When we now assume the constraint corresponding to line 1 to be faulty, we can compute a value for r using the given test case. From the constraint corresponding to line 3, a = r²π, and a = π we derive r = 1, and finally using c = rπ leads to c = π, which contradicts the given expected value for c, which is 2π. Hence, the assumption in this case contradicts the observations, and line 1 cannot be a single-fault candidate. Only line 2 remains as a single fault. The reason for this improvement is that when using constraints we have the capability to reason backwards.
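The backward-reasoning argument above can be sketched as a small consistency check. The helper below is our own illustration, not the authors' diagnosis engine; its names and structure are assumptions. Assuming a line faulty drops its constraint, and a fault assumption survives only if the remaining constraints are consistent with the test case d = 2, c = 2π, a = π:

```python
import math

# Treat each line as an undirected constraint over the variables.
# Assuming a line faulty removes its constraint; the remaining
# constraints plus the test case must then be satisfiable.

def consistent(assumed_faulty, d=2.0, c_exp=2 * math.pi, a_exp=math.pi):
    candidates_r = []
    if 1 not in assumed_faulty:
        candidates_r.append(d / 2)                    # forward: r = d/2
    if 3 not in assumed_faulty:
        candidates_r.append(math.sqrt(a_exp / math.pi))  # backward: a = r*r*pi
    if len(set(candidates_r)) > 1:
        return False  # lines 1 and 3 disagree on r
    if not candidates_r:
        return True   # r is unconstrained, nothing to contradict
    r = candidates_r[0]
    if 2 not in assumed_faulty and not math.isclose(r * math.pi, c_exp):
        return False  # line 2's constraint contradicts the expected c
    return True

single_faults = [line for line in (1, 2, 3) if consistent({line})]
print(single_faults)  # only line 2 survives as a single-fault candidate
```

Assuming line 1 or line 3 faulty still yields c = π ≠ 2π from the remaining constraints, so only the assumption about line 2 is consistent with the observations, matching the discussion above.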
Summarizing the above discussion, a solution to the debugging problem is to convert programs into their constraint representation and use it for debugging. We choose a constraint representation because equations as well as logical sentences can be represented and integrated, and because of the availability of tools. However, there are still some challenges we have to solve. First, variables might be defined more than once in a program; every definition has to be considered separately. Second, programs comprise conditional and loop statements. How should we handle them? For the first problem and for conditionals we propose the use of the static single assignment (SSA) form of programs, which is used in compiler construction. Another challenge is how to handle loops and recursive procedure calls. In order to represent loops we make use of the following observation. Loops can be represented by nested if-statements where the nesting depth is infinite. When restricting the nesting depth to a finite value, the nested if-statements still behave the same as the loop statement when considering only inputs which do not cause the number of loop iterations to exceed the nesting depth. Hence, under this assumption the nested if-statements are good enough to represent loops, and we have reduced the problem of handling loops to the problem of handling conditionals. A similar technique can be applied to solve the recursive procedure calls challenge. Finally, we have to handle arrays and other programming-language constructs like pointers or objects. In this paper, we present an approach for handling arrays. The other aspects of programming languages are ignored. This is due to the fact that our main application area is the embedded-systems domain.

1. if x < y {
2.   min = x;
   } else {
3.   min = y;
   }

Figure 2: A program computing the minimum of two numbers

1. i = 0;
2. r = 0;
3. while (i < x) {
4.   r = r + y;
5.   i = i + 1;
   }

Figure 3: A program for computing the product of two natural numbers
Programs used in embedded systems are usually restricted and do not use features like dynamic memory allocation, because such features are error prone and likely lead to problems during operation.

3 Conversion

In this section, we describe the conversion process of programs into their equivalent CSP representation in detail. For more information regarding CSPs we refer the reader to Dechter (11). We start with converting programs into their loop-free equivalents, which are used as the basis for the conversion into the SSA form. Finally, we show how to extract the CSP from the SSA form. The complexity of the proposed compilation approach is composed of the complexity of each conversion step. While SSA construction and CSP extraction can both be handled in polynomial time, computing a loop-free equivalent depends on the number of necessary iterations. This number depends on the program's complexity. In this section we use the programs given in Figures 2 and 3 as running examples. We further assume, without restricting the generality of the approach, that the programs to be converted have a Java-like syntax (ignoring object-oriented elements).

3.1 Loop-free programs

When executed, while-statements behave like a conditional statement in one step. If the condition is fulfilled, the statements in the block are executed and the while-statement is executed again afterwards. Otherwise, the while-statement is not executed. Hence, it is semantically correct to represent while-statements using an infinite number of nested if-statements, i.e., while ( C ) { B } is equivalent to

if ( C ) {
  B
  if ( C ) {
    B
    if ( C ) {
      ...
    }
  }
}

where C represents the condition and B the statements in the sub-block of the while-statement. Of course it is not possible in practice to compile while-statements into an infinite number of conditionals. Instead we assume that the number of iterations of the while-statement never exceeds a certain limit, say n. We argue that faults can be detected using test cases which cause a small number of iterations, and that it is therefore - for the purpose of debugging - not necessary to consider larger values of n. Moreover, we might introduce a procedure which is called whenever the limit is reached. This would give us back additional information which we might use for increasing n in a further step. Setting a small bound on the number of iterations has also been used in a different context: Jackson (18) uses a similar idea, which he called the small scope hypothesis, in his Alloy system for verification. We formalize the bounded conversion from while-statements into nested if-statements by introducing the function r : PL × N → PL, where PL represents the programming language:

r(while ( C ) { B }, n) =
  if ( C ) { B r(while ( C ) { B }, n-1) }   if n > 0
  if ( C ) { too_many_iterations; }          otherwise

Considering the above discussion, it is obvious that the following theorem, which states the equivalence of the program and its loop-free variant with respect to their behavior, has to hold.

Theorem 3.1. Given a program Π ∈ PL written in a programming language PL and a number n ∈ N, Π behaves equivalently to r(Π, n) for an input I iff the execution of r(Π, n) on I does not reach too_many_iterations.

We now use the program from Figure 3 and n = 2 to show the application of r, which leads to the following program:

1. i = 0;
2. r = 0;
3. if (i < x) {
4.   r = r + y;
5.   i = i + 1;
6.   if (i < x) {
7.     r = r + y;
8.     i = i + 1;
9.     if (i < x) {
10.      too_many_iterations;
   }}}

The loop-free variant can be used in all cases where x = 0 or x = 1 without causing a different behavior to the original program. It is interesting to note that the time complexity of a program, which corresponds to the number of iterations depending on the size of the inputs, is now represented by the size of the converted program.
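The bounded conversion r can be sketched on a toy abstract syntax tree. The node shapes below are our own choice for illustration, not the paper's; the function mirrors the recursive definition of r, emitting a guarded too_many_iterations marker when the bound is exhausted:

```python
# Toy AST: ('seq', [stmts]), ('while', cond, body), ('if', cond, then),
# ('asgn', text) -- these node shapes are our own illustration.

def unroll(stmt, n):
    """Bounded conversion r: replace a while by n nested ifs."""
    kind = stmt[0]
    if kind == 'seq':
        return ('seq', [unroll(s, n) for s in stmt[1]])
    if kind == 'while':
        cond, body = stmt[1], stmt[2]
        if n == 0:
            return ('if', cond, ('asgn', 'too_many_iterations'))
        inner = unroll(('while', cond, body), n - 1)
        return ('if', cond, ('seq', [unroll(body, n), inner]))
    return stmt

# The program of Figure 3 as a toy AST.
prog = ('seq', [('asgn', 'i = 0'), ('asgn', 'r = 0'),
                ('while', 'i < x',
                 ('seq', [('asgn', 'r = r + y'), ('asgn', 'i = i + 1')]))])

def depth(stmt):
    # Count the maximum nesting of ifs produced from the while.
    if stmt[0] == 'seq':
        return max(depth(s) for s in stmt[1])
    if stmt[0] == 'if':
        return 1 + depth(stmt[2])
    return 0

print(depth(unroll(prog, 2)))  # 3: two unrolled iterations plus the guard
```

As in the text, n = 2 yields three nested conditionals: two copies of the loop body and one final guard marking exhaustion of the bound.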
Hence, we can easily give an estimate of the size of the converted program (and also of the number of iterations) when limiting the size of the input.

3.2 Static single assignment form

The SSA form is an intermediate representation of a program which has the property that no two left-side variables share the same name, i.e., each left-side variable has a unique name. For this reason the SSA form is of great importance when building the CSP. Since all variables are defined only once, the SSA form allows for a clear representation of the dependencies established between the different variables of the corresponding program. The SSA representation of a program is sometimes also an intermediate step in the compilation process: before compiling a Java file, for example, the program is first transformed into SSA form, which is then used as input for the compiler. In order to compile loop-free programs into SSA we have to analyze the program and rename all variables such that each variable is defined only once, without changing the behavior of the program. Basically, this compilation is done by adding a unique index to every variable which is defined in a statement. Every use of this variable in succeeding statements is indexed using the same value. This is done until a new definition of the same variable occurs. For example, the following program fragment can be converted into the SSA representation given below it.

1. x = a + b;
2. x = x - c;

The SSA representation:

1. x_1 = a_0 + b_0;
2. x_2 = x_1 - c_0;

Although converting programs comprising only assignment statements is easy, it is more difficult for programs with loop or conditional statements. In our case we only need to consider conditional statements. The idea behind the conversion of a conditional statement is as follows: The value of the condition is stored in a new unique variable. The if- and the else-block are converted separately.
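For straight-line code, the renaming just described can be sketched in a few lines. This is our own illustration and handles only assignment statements (no conditionals, hence no Φ); variables that are never defined keep index 0, marking them as inputs:

```python
from collections import defaultdict
import re

def to_ssa(statements):
    """Rename straight-line assignments into SSA form (illustration only)."""
    counter = defaultdict(int)  # current index per variable
    out = []
    for stmt in statements:
        lhs, rhs = [s.strip() for s in stmt.split('=', 1)]
        # Index every variable used on the right-hand side with its
        # current definition index.
        rhs_ssa = re.sub(r'[A-Za-z_]\w*',
                         lambda m: f"{m.group(0)}_{counter[m.group(0)]}",
                         rhs)
        counter[lhs] += 1  # a new definition of the left-hand side
        out.append(f"{lhs}_{counter[lhs]} = {rhs_ssa}")
    return out

print(to_ssa(["x = a + b", "x = x - c"]))
# ['x_1 = a_0 + b_0', 'x_2 = x_1 - c_0']
```

On the two-line fragment above this reproduces exactly the SSA representation shown in the text.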
In both cases the conversion starts using the indices of the variables already computed. Both conversions deliver back new indices of variables. In order to get a value for a variable we have to select the last definition of the variable from the if- and the else-part, depending on the condition. This selection is done using a function Phi. Hence, for every variable which is defined in the if- or the else-branch we have to introduce a selecting assignment statement which calls the Phi function. For example, the SSA form corresponding to the program fragment

if ( C ) {
  .. x = ..
} else {
  .. x = ..
}

is given as follows:

var_C = C;
.. x_i = ..
.. x_j = ..
x_k = Phi(x_i, x_j, var_C);

The function Phi is defined as follows:

Phi(x, y, c) =
    x   if c
    y   otherwise

For algorithms for computing the SSA form and more information regarding the Phi function we refer the reader to (9; 7; 27). The SSA representations for the programs from Figures 2 and 3 are depicted in Figures 4 and 5 respectively. Note that we represent the Phi function as a function call phi in the program. It is possible to write a function body for phi that exactly represents the behavior of the Phi function. Hence, the program and its SSA representation can be executed on the same input even at the level of the programming language. It is easy to see that the SSA form of a program always behaves equivalently to the original program, which we state in the following theorem. Theorem 3.2. Given a program P in PL. The SSA representation P' in PL of P is equivalent to P with respect to the semantics of PL, i.e., for all inputs I both programs return the same output. For debugging purposes the input-output equivalence, which is similar to the input-output conformance (IOCO) used in testing, is sufficient. The SSA representation allows us to map, with little effort, the diagnosis set of program P' to the original equivalent program P. In the following subsection we describe how arrays and function calls can be handled.
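As stated above, phi can be given a function body so that the SSA form becomes executable. A minimal Python sketch, with illustrative input values, executing the SSA form of Figure 4 (which, judging by the variable names, computes the minimum of two inputs):

```python
def phi(x, y, cond):
    """Executable selection function: the if-branch value when cond holds,
    otherwise the else-branch value."""
    return x if cond else y

# Executing the SSA form of Fig. 4 directly; inputs 7 and 3 are illustrative.
x_0, y_0 = 7, 3
var_1 = x_0 < y_0
min_1 = x_0
min_2 = y_0
min_3 = phi(min_1, min_2, var_1)
print(min_3)   # 3, the same output the original program computes
```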
Afterwards we discuss the compilation of programs into constraint systems. 3.3 Extensions In the previously described conversion process we still face two important challenges, namely the conversion of arrays and of procedure calls. In this section we tackle both challenges and present conversion rules that have to be applied before converting the resulting source code into SSA form. We start with the array conversion. We assume that arrays are defined over a data type (although we do not consider this information in the conversion) and are of fixed length. For simplicity, we assume that we can only access one element of the array at a time. Note that in some languages there are constructs which allow accessing an array partially, e.g., elements 3 to 5.

1. var_1 = ( x_0 < y_0 );
2. min_1 = x_0;
3. min_2 = y_0;
4. min_3 = phi(min_1,min_2,var_1);

Figure 4: The SSA form of the program from Fig. 2

1. i_1 = 0;
2. r_1 = 0;
3. var_3 = (i_1 < x_0);
4. r_2 = r_1 + y_0;
5. i_2 = i_1 + 1;
6. var_6 = (i_2 < x_0);
7. r_3 = r_2 + y_0;
8. i_3 = i_2 + 1;
9. r_4 = phi(r_3,r_2,var_6);
10. i_4 = phi(i_3,i_2,var_6);
11. r_5 = phi(r_4,r_1,var_3);
12. i_5 = phi(i_4,i_1,var_3);

Figure 5: The SSA form of the loop-free variant of the program from Fig. 3

Given an array A of length n > 0 with elements (a_1,...,a_i,...,a_n). The access to elements is assumed to be done using the [] operator, which maps from A and a given index i to the array element a_i. For example, z = A[1] gives you back the first element of array A. So far, arrays do not seem very difficult to handle in our conversion process. If a statement z = A[1] is reached during SSA conversion, A only has to be represented as A_k, where k is the currently given unique index for A. But how to handle statements like A[m] = A[n] + 3? In this case A obviously has to be assigned a new index, but how to handle the following program fragment?
A[1] = 2;
A[2] = 4;

An SSA translation into A_1[1] = 2; A_2[2] = 4; would be wrong, because this transformation does not respect previously made array changes in the appropriate manner. In order to respect the underlying semantics, we have a closer look at it. Assume a program fragment A[i] = f(x), where the i-th element of A is set to the outcome of function f given parameters x. This statement changes only the i-th element but not the others. Formally, we define the semantics as follows:

{ A }   // A before the statement
A[i] = f(x)
{ A' }  // A after the statement

with A'[i] = f(x) and for all j in {1,...,n}, i != j: A'[j] = A[j]. As a consequence, we introduce a function Psi that implements this semantics and replace the original statement with A = Psi(A, i, f(x)). The function Psi is similar to Phi and can be implemented such that the converted program is equivalent to the original one with respect to its semantics. For example, consider the following program fragment:

1. A[1] = 5;
2. A[2] = A[1] + 5;

According to our conversion rule we obtain the new fragment:

1. A = psi(A,1,5);
2. A = psi(A,2,A[1] + 5);

The SSA representation is now based on the new fragment and captures the semantics in the appropriate way:

1. A_1 = psi(A_0,1,5);
2. A_2 = psi(A_1,2,A_1[1] + 5);

Note that Psi, or psi, can be implemented as a function in order to ensure the equivalent behavior even in the context of program execution. Assume that psi has the formal arguments A, i, e and that the length of an array can be accessed via a function length; then the body of the function is given as follows:

j = 1;
while (j <= length(A)) {
  if (j == i) { B[j] = e; }
  else { B[j] = A[j]; }
  j = j + 1;
}
return B;

The function psi returns a new array B of A's size. We now consider the last challenge we want to tackle: procedure calls and, in particular, recursive procedure calls. Given a procedure M with its formal parameters x1,...,xn and the body S(M), which itself is a program written in the same programming language.
A procedure call M(a1,...,an) with actual parameters a1,...,an causes the execution of M's body, where the formal parameters are assigned their corresponding actual parameters, i.e., xi = ai. Hence, a transformation is easy. We only have to assign values to the formal parameters, which can be done using assignments, and use the body of the procedure instead of the procedure call. In cases where the procedure returns a value, we have to introduce a new variable M_return. We further replace return e with M_return = e, and finally we introduce a new assignment where M_return is used appropriately. The described transformation of the return statement is only correct when a procedure has only one such statement. This is not a restriction, because we can always modify a program to fulfill this requirement. For simplicity, we also assume that the variables used in the bodies of procedures and the ones used in the main program are different, except in the case of global variables. We now explain the idea behind the conversion using an example program in which a procedure foo is called:

x = foo(2,y,z-1);

Now assume that foo has three formal parameters x1, x2, x3 and the following body:

v = x1 + x2;
v = v - x3;
return v;

When applying the described conversion rule, we obtain the following program fragment:

x1 = 2;
x2 = y;
x3 = z-1;
v = x1 + x2;
v = v - x3;
foo_return = v;
x = foo_return;

This program can be easily compiled into its SSA form:

x1_1 = 2;
x2_1 = y_0;
x3_1 = z_0-1;
v_1 = x1_1 + x2_1;
v_2 = v_1 - x3_1;
foo_return_1 = v_2;
x_1 = foo_return_1;

We now extend the above idea to the general case, where we might face recursive procedure calls. The idea behind the conversion is the same but, similar to the handling of iterations of a while statement, we have to set a bound on the maximum number of recursive replacements during conversion. For this purpose we assume that such a bound MC is given for a procedure M in a calling context.
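The effect of such a bound MC can be illustrated at run time. The following decorator-based sketch only mimics the compile-time transformation described next; the names bounded and TooManyRecursions are assumptions made for this example:

```python
class TooManyRecursions(Exception):
    """Raised when the bound MC on recursive expansions is exhausted."""
    pass

def bounded(mc):
    """Allow at most mc (inlined) invocations of a recursive procedure."""
    def wrap(f):
        def g(*args, _depth=mc):
            if _depth <= 0:
                raise TooManyRecursions()
            # pass the procedure a handle that charges one expansion per call
            return f(lambda *a: g(*a, _depth=_depth - 1), *args)
        return g
    return wrap

# The example procedure foo from the text: computes x1 * x2 by recursion.
@bounded(mc=2)   # the outer call plus one recursive call
def foo(self, x1, x2):
    r = 0
    if x1 > 0:
        r = self(x1 - 1, x2)
        r = r + x2
    return r

print(foo(1, 2))   # 2: one recursive call suffices, as in the text's example
```

Calling foo(2, 2) under the same bound raises TooManyRecursions, matching the guarded case of the bounded compilation.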
Similar to while statements we introduce a function G : PL x N -> PL that defines the bounded compilation of a (not necessarily recursive) procedure call:

G([x =] M(a1,...,an), i) =
    x1 = a1; ... xn = an; G(S'(M), i + 1) [x = M_return;]   if i < MC
    too_many_recursions;                                    otherwise

Note that S'(M) denotes the body of method M where the return statement return e has been changed to M_return = e. Moreover, G is assumed to be defined for all other statements, where the statement itself is returned without any changes. Hence, when applying G to the body of a method, all statements are examined and left as they are, with the exception of a further method call. With these additional assumptions the compilation function can be applied in general. We illustrate the conversion using a program comprising only the call x = foo(1,2); where the body of foo is given as follows:

r = 0;
if (x1 > 0) {
  r = foo(x1-1,x2);
  r = r + x2;
}
return r;

The following program represents the recursion-free variant of the program calling foo with 1 allowed recursive call, which is sufficient for the specific procedure call foo(1,2) in order to return the correct result:

x1 = 1;
x2 = 2;
//1. recursive call
r = 0;
if (x1 > 0) {
  x1 = x1 - 1;
  x2 = x2;
  //2. recursive call
  r = 0;
  if (x1 > 0) {
    too_many_recursions;
  }
  M_return = r;
  //returning from 2. call
  r = M_return;
  r = r + x2;
}
M_return = r;
//returning from 1. call
x = M_return;

The converted program can be easily compiled into its SSA form. 3.4 Constraint representation Constraint satisfaction problems (CSPs) have been introduced and used in Artificial Intelligence as a general knowledge representation paradigm. A CSP (V, D, CO) is characterized by a set of variables V, each variable having a domain D, and a set of constraints CO which define relations between variables. The variables in a relation r in CO are called the scope of the relation. An assignment of values from D to variables from V is called an instantiation.
An instantiation is a solution for a CSP iff it violates no constraint. A constraint is said to be violated by an instantiation if the value assignment of the variables in the scope of the constraint is not represented in the relation of the constraint. There are many algorithms available for computing valid instantiations, i.e., solutions for a CSP. A straightforward algorithm is backtrack search, where variable values are assigned and consistency checks are performed until a valid solution is found. Moreover, it is well known that CSPs can be solved in polynomial time under some circumstances, i.e., when the CSP is acyclic, which we discuss later. Yannakakis (30) proposed such an algorithm. Based on this algorithm there are diagnosis algorithms (12; 25) available which can be used for debugging directly, provided there is a CSP representation for programs. For more information and details on CSPs we refer the reader to Dechter's book (11). The last step in the conversion process is to map programs in SSA form to their CSP representation. This extraction of the constraints from the statements of the SSA representation can be done easily. The algorithm is pretty straightforward and only involves analyzing the SSA representation line by line. In this conversion step, each line of the SSA representation is mapped directly to a constraint. Hence, all variables of a statement map directly to the scope of the constraint. The constraint relation is given by the statement itself, interpreting the assignment operator as an equivalence operator. For example, the statement x_2 = x_1 + 2 is mapped to a constraint having two variables x_2 and x_1, where the relation is stated as an equation of the form x_2 = x_1 + 2. Note that the latter representation is more flexible because it can also be read, for example, as x_2 - x_1 = 2.
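The mapping from an SSA statement to a checkable constraint can be sketched as follows; constraint_from_assignment is an illustrative helper, not part of the described system:

```python
def constraint_from_assignment(target, expr):
    """Turn an SSA statement like x_2 = x_1 + 2 into a relation over its scope,
    reading the assignment operator as equality."""
    return lambda inst: inst[target] == expr(inst)

# The example from the text: x_2 = x_1 + 2, equivalently x_2 - x_1 = 2.
c = constraint_from_assignment("x_2", lambda v: v["x_1"] + 2)

print(c({"x_1": 3, "x_2": 5}))   # True: the instantiation satisfies the constraint
print(c({"x_1": 3, "x_2": 9}))   # False: the constraint is violated
```

Because the constraint is an equation rather than a directed assignment, it can be checked in any direction, which is what makes the representation useful for diagnosis.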
The CSP representation of the program from Figure 2 is given as follows:

Variables:   V = {var_1, x_0, y_0, min_1, min_2, min_3}
Domains:     D = {D(x) = N | x in V}
Constraints: CO = { var_1 = (x_0 < y_0),
                    min_1 = x_0,
                    min_2 = y_0,
                    min_3 = phi(min_1, min_2, var_1) }

When comparing the CSP representation of the program with its SSA form, which is depicted in Figure 4, we see that both representations are very close to each other. It is easy to prove that the SSA form of a program is equivalent to its CSP representation with respect to the behavior. Theorem 3.3. Given a program P, its SSA representation P', and the corresponding CSP C_P. The value assignments of the variables in P' which are caused by executing P' on an input I are a solution to the corresponding CSP C_P, and vice versa. From this theorem and the others we conclude that the transformation is behavior neutral and that the CSP representation thus captures the behavior of the program. As a consequence, the CSP representation can be directly used for debugging. As already mentioned, there are circumstances under which the algorithms are more effective and there are situations where the computation of solutions is hard. This now holds directly for debugging, and we are interested in classifying programs regarding their debugging complexity. We define debugging complexity as a measure that corresponds to the complexity of computing a solution using CSP algorithms. In the following, we discuss structural properties of CSPs which can be used for classification and which are based on the hyper-graph representation of programs. In the hyper-graph representation of a CSP the variables of the CSP represent the vertices and the constraint scopes the hyper-edges. Thus hyper-edges may connect more than one vertex. Hyper-graphs can be used to classify CSPs regarding the complexity of computing a solution. As already mentioned, solutions for CSPs with corresponding acyclic hyper-graphs can be computed in polynomial time (see (30)).
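This CSP can be solved by plain enumeration. The following is a brute-force sketch; since the paper's domain is N, the search is restricted here to the illustrative finite range 0..3 so that enumeration terminates:

```python
from itertools import product

def phi(x, y, c):
    """Executable selection function for the phi constraint."""
    return x if c else y

# The four constraints of the CSP above, each checked on an instantiation v.
constraints = [
    lambda v: v["var_1"] == (v["x_0"] < v["y_0"]),
    lambda v: v["min_1"] == v["x_0"],
    lambda v: v["min_2"] == v["y_0"],
    lambda v: v["min_3"] == phi(v["min_1"], v["min_2"], v["var_1"]),
]

def solutions(x_0, y_0):
    """All consistent instantiations once the inputs are fixed (domain 0..3)."""
    sols = []
    for var_1 in (False, True):
        for min_1, min_2, min_3 in product(range(4), repeat=3):
            v = dict(x_0=x_0, y_0=y_0, var_1=var_1,
                     min_1=min_1, min_2=min_2, min_3=min_3)
            if all(c(v) for c in constraints):
                sols.append(v)
    return sols

sols = solutions(x_0=2, y_0=3)
print(len(sols), sols[0]["min_3"])   # 1 2: a unique solution computing min(2,3)
```

Fixing the inputs leaves exactly one consistent instantiation, illustrating Theorem 3.3: the solution of the CSP coincides with the execution of the SSA form.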
Such hyper-graphs can be directly represented as hyper-trees. Unfortunately, not all CSPs are acyclic. The good news is that cyclic CSPs can be converted into an equivalent acyclic CSP. What is needed is to join the right constraints. Joining constraints, however, is a drawback because it is time and space consuming. In the worst case all constraints have to be joined in order to finally obtain the acyclic equivalent CSP representation, which is of course intractable. As a consequence, one only gains computational advantages from the conversion of hyper-graphs into hyper-trees if the number of constraints to be joined is as small as possible. This number is referred to as the hyper-tree width. More information regarding hyper-graphs and hyper-tree decomposition, which allows converting hyper-graphs into hyper-trees, can be found in (14; 15). The hyper-graph and its corresponding hyper-tree for the CSP introduced above are depicted in Figure 6. The hyper-tree width for this example is 2. Having a CSP representation of a program has the advantage of being able to use various algorithms for debugging purposes. However, the performance of debugging depends on the structure of the CSP. Hence, we are interested in the structural properties, i.e., the hyper-tree width, of the CSPs for various example programs. If the hyper-tree width for such examples is low, then computing diagnoses can be done effectively.

Figure 6: Hyper-graph (left) and corresponding hyper-tree (right) for the SSA form given in Fig. 4

In the next section we therefore focus on the hyper-tree width of programs. 4 Experimental results on the hyper-tree width As explained in the previous section, the hyper-tree width has an impact on the complexity of debugging when using constraints as an intermediate representation formalism.
Gaining knowledge about the hyper-tree width of programs is therefore of importance. In this respect this section reports on the hyper-tree width of several programs. For this purpose, we implemented the conversion procedure in Java and used a constraint system that was implemented at our institute. For the hyper-tree decomposition we used an implementation provided by TU Wien (see (16)). In particular, we made use of the Bucket Elimination based decomposition algorithm, which is explained in (29). All experiments were carried out on a PC (Pentium 4, 3 GHz, 1 GB RAM). The experiments are based on small programs that vary from 40 to 400 lines of code. The lines of code of the corresponding SSA forms and the sizes of the constraint systems, in terms of the number of constraints and variables, are higher due to the while statements and their transformation into nested conditional statements. In particular, we wanted to give an empirical answer to the following research questions.

- Thorup (28) stated that there is a limit of 6 for the hyper-tree width of structured programs. The theorem is based on using control dependence information and does not consider data dependences. Because the latter are of importance for debugging and our compilation respects those dependences, we wanted to know whether the given limit still applies.

- The compilation of while statements and recursive procedure calls leads to an increase in the number of statements and to a nested if-then-else structure. The question is how the nesting depth, i.e., the number of iterations for unrolling the while statements or recursive procedure calls, influences the hyper-tree width. Moreover, one might be interested in whether there is a maximum bound on the hyper-tree width in such cases.

The environment for carrying out the empirical study is not optimal for answering the above research questions, but it is the best one can expect today.
The reason is that the Bucket Elimination based decomposition algorithm is only an approximation algorithm, and thus the solutions need not be minimal ones. However, because of the size of the corresponding constraint systems, other algorithms are hardly of use, because they would take too much time and space. The finally obtained results for the programs are depicted in Figure 7. There the programs are given a number (ID). The lines of code (LOC) of the original program, the number of while statements (#W), the number of if-statements (#I), the number of iterations used to unroll the while-statements (I), the hyper-tree width (HW) obtained, and the time (T) required to compute the hyper-tree width are given.

ID  LOC  #W  #I    I   HW        T
 1   70   1   6    3    5      1 s
 1   70   1   6   50    5    364 s
 2  110   0  11    -    5      1 s
 3   70   4   5    3    5     11 s
 3   70   4   5   20    6   2040 s
 4   80   0   0    -    4      1 s
 5   70   0   0    -    4      1 s
 6   70   0   0    -    2      1 s
 7  400   0   0    -    9      7 s
 8  400   2   0    3   50   1000 s
 8b 400   2   0    3   12    122 s
 9  400       0    5   16    120 s
 9  400       0   10   27    621 s
 9  400       0   20   54   3959 s
 9  400       0   35   51  16450 s
 9  400       0   50   55  19245 s
10  400       0   10   20     80 s
10  400       0   20   23    274 s
10  400       0   50   25   2400 s
10  400       0   60   29   4120 s
10  400       0   70   25   4770 s
11  400       0    -   10      8 s
12  400       0    3   15    120 s
12  400       0    6   27   2580 s
12  400       0   10   43   3415 s
13  400       0   15   53   4010 s
14   60       0    -    2      1 s
15   50   4        3    6      1 s
15   50   4       10    6      5 s
15   50   4      100    6    456 s
16   40   7   0    1    2      1 s
16   40   7   0   10    3    600 s

Figure 7: Evolution of the hyper-tree width for programs of different complexity

In the case of programs 1 to 6 and 14 to 16, the hyper-tree width tends to be less influenced by the number of iterations of the while-structure. Moreover, for these programs the hyper-tree width reaches its maximal value after 2 to 3 iterations. This cannot be said for programs 7 to 13, where the hyper-tree width ranges from 9 to 55.

1. x = 10;
2. y = 20;
3. while (x<100){
4.   x = x + y;
5.   y = y + 2;
6. }

Figure 8: Small example program test
Even in the case where there is no unrolling of while statements (programs 7 and 11) the hyper-tree width ranges from 9 to 10. All of these programs represent digital circuits, including some variants (like 8 and 8b) with a complex data flow. Based on the obtained empirical results, we have to conclude that Thorup's bound (28) does not hold in our setting, because there are many programs that result in a constraint system with a hyper-tree width larger than 6. Note that it is not very likely that a hyper-tree decomposition with a smaller width exists for the given programs. Moreover, the approximation algorithm seems to produce approximations which are close to the optimum. The second research question is harder to answer. In all experiments we observed that after a certain number of iterations the hyper-tree width remains almost the same. For example, take a look at programs 9 and 10. In both cases it seems that the hyper-tree width reaches an upper bound. Of course, because of the approximation algorithm used, there is some variance in the obtained width, but it always seems to stay around a certain value. More experiments have to be carried out in order to justify these findings. For the purpose of motivating why the hyper-tree width stays constant after a certain number of considered iterations of the while statement, we use a small program test, which is given in Figure 8. For test we know that the maximum hyper-tree width is 3. This upper bound is reached after unrolling the while statement 2 times, i.e., after 2 replacements of the while statement with nested if-statements. In our example, we now compute the SSA form and the corresponding constraint system for program test using 3 nested if-statements. The resulting SSA form and the constraint system are depicted in Figure 9 and Figure 10 respectively. The hyper-graph corresponding to the constraint representation of test is given in Figure 11.
Note that for the sake of clarity the graphical representation only comprises the constraints from the while structure in which the variable x is involved. From the hyper-graph it can easily be seen that the edges follow a certain pattern, which is repeated in every unrolling of the while statement. Hence, there is a possibility that the hyper-graph decomposition can be applied to these subparts of the hyper-graph separately and the results combined afterwards, which might lead to a constant hyper-tree width after a certain number of unrolling steps. In summary, we obtained the following results from the experimental study:

- The hyper-tree width of programs might become very large. Usually problems with a hyper-tree width of 5 to 10, depending on the application domain, are considered hard problems.

- In the case of unrolling while-statements or recursive calls, it seems that the hyper-tree width reaches an upper bound. This would be an indication that debugging does not necessarily become more difficult when the number of iterations increases. Note that the number of unrolling steps of while statements depends on the considered test case, which is an independent criterion.

- The time for computing the hyper-tree width can be very large, which might be unacceptable in some circumstances. This can be the case, for example, when interactive debugging is a requirement. For automated debugging or off-line debugging the time for conversions is not a limiting factor. However, decreasing the overall analysis time is an open challenge.

What remains of interest is the question why a certain program has a larger hyper-tree width than another, similar program. Obviously the data and control dependences influence the overall hyper-tree width. But in the case of similar programs the differences regarding the dependences might not be large enough to justify large differences in the obtained hyper-tree width. This holds especially when a program comprises while statements.
In the next section, we focus on this aspect and discuss one cause that leads to such an observation. 5 Discussion In the performed experiments we observed that programs which have a high hyper-tree width, and where the number of iterations necessary to reach the maximum hyper-tree width is large, have fewer data dependences in the sub-block of the while statement. We illustrate these findings by means of two example programs, sum1 and sum2, which are depicted in Figure 12 and Figure 13 respectively. Both programs have about the same structure. In both programs a variable i is increased in every iteration until it reaches 100. Moreover, the hyper-tree width of both programs is about the same (2 and 1 for sum1 and sum2 respectively) when the number of considered unrolling steps is 1. However, the situation changes when considering more nested if-statements as a replacement for the while statement. For sum1 the maximum hyper-tree width of 4 is reached after 3 iterations. For sum2 the maximum hyper-tree width of 8 is reached after 12 unrolling steps. If we consider the difference between the minimum and the maximum hyper-tree width, we obtain yet another outcome. For sum1 the difference is only 1, whereas for sum2 the difference is 7.

1. x1_0 = 10;
2. y2_0 = 20;
3. cond_0 = x1_0 < 100;
4. x1_1 = x1_0 + y2_0;
5. y2_1 = y2_0 + 2;
6. cond_1 = cond_0 && x1_1 < 100;
7. x1_2 = x1_1 + y2_1;
8. y2_2 = y2_1 + 2;
9. cond_2 = cond_1 && x1_2 < 100;
10. x1_3 = x1_2 + y2_2;
11. y2_3 = y2_2 + 2;
12. x1_4 = phi(x1_2,x1_3,cond_2);
13. y2_4 = phi(y2_2,y2_3,cond_2);
14. x1_5 = phi(x1_1,x1_4,cond_1);
15. y2_5 = phi(y2_1,y2_4,cond_1);
16. x1_6 = phi(x1_0,x1_5,cond_0);
17. y2_6 = phi(y2_0,y2_5,cond_0);

Figure 9: The SSA form of program test (Fig. 8)

(x1_0), (y2_0),
(X_cond_0, x1_0),
(x1_1, x1_0, y2_0),
(y2_1, y2_0),
(X_cond_1, X_cond_0, x1_1),
(x1_2, x1_1, y2_1),
(y2_2, y2_1),
(X_cond_2, X_cond_1, x1_2),
(x1_3, x1_2, y2_2),
(y2_3, y2_2),
(x1_4, x1_2, x1_3, X_cond_2),
(y2_4, y2_2, y2_3, X_cond_2),
(x1_5, x1_1, x1_4, X_cond_1),
(y2_5, y2_1, y2_4, X_cond_1),
(x1_6, x1_0, x1_5, X_cond_0),
(y2_6, y2_0, y2_5, X_cond_0).

Figure 10: The constraints for test (Fig. 8)

Figure 11: The hyper-graph of the constraint system from Fig. 10

1. a = f + i;
2. b = g + h;
3. while (i < 100) {
4.   x = x + a + b;
5.   i = x + i + 1;
6. }

Figure 12: Program sum1

1. a = f + i;
2. b = g + h;
3. while (i < 100) {
4.   x = a + b;
5.   i = i + 1;
6.   y = c + d;
7. }

Figure 13: Program sum2

By having a closer look at the programs we detect a major difference in their structure, which might explain this large difference in the hyper-tree width. For the variable x that is used in the block of program sum1, a new value is computed in each iteration of the while statement. The outcome of variable x depends on the number of iterations and therefore on variable i (and the condition of the while statement). This is not the case for variables x and y in program sum2. Both variables are assigned the same value in each iteration step. For performance reasons such assignment statements should always be placed outside a loop. In the case of program sum2 the values of variables in an iteration do not necessarily depend on the previous iteration but on variables that have been computed before entering the while statement. It seems that this difference makes the structure of the hyper-graph more complex, which finally leads to a higher hyper-tree width. The given example shows that hyper-tree width may indicate an unwanted program structure.
In program sum2 either there are variables missing on the right side of statements 4 and 6, or both statements should be placed outside the while statement for performance reasons. Obviously, such problems can be detected using rules like the following: If a variable v is defined in the sub-block of a while statement, then there should be at least one (not necessarily different) statement in the same block that uses v. Hyper-tree width offers another way of detecting such problematic cases occurring in programs. 6 Related research Various authors, e.g., (13; 24; 20; 21), have described models to be used for fault localization using model-based diagnosis. Almost all of this work makes assumptions regarding the program's structure, uses abstractions which lead to the computation of too many diagnosis candidates, or does not handle all possible behaviors at once. The latter models, for example, consider one execution trace, which prevents the diagnosis engine from removing some diagnosis candidates. In our representation we consider all possible behaviors up to a given boundary. This should lead to a reduction of the number of diagnosis candidates. In (19) Köb and Wotawa discussed the use of Hoare logic for model-based debugging, which requires a Hoare logic calculus for computing diagnoses. Although we share the same ideas on automated debugging, the described approach, which comprises the conversion to CSPs and their direct use, is new. From our point of view the described approach generalizes previous research directions. Other work on debugging includes (4; 5; 6). All of it is mainly based on program slicing (10). (6) integrates slicing and algorithmic program debugging (2), and (5) does the same for slicing and delta debugging (1). Critical slicing (4), which is an extension of dynamic slicing that avoids some of its pitfalls, can also be used in debugging. Because of the nature of slicing and the other techniques, these approaches require more or less user interaction and cannot be used for truly automated debugging. More information regarding other approaches to debugging and a good classification of debugging systems is provided in (3). Collavizza and Rueher (8) discussed the conversion of programs into constraint systems and their use in software verification. Their work is very close to ours. The conversion steps are about the same, and they also use the SSA form as an intermediate representation from which constraints can be obtained. However, there are some important differences. In their work the focus is on verification and not on debugging. They do not explain how to handle arrays and procedure calls as we do in this paper. Moreover, the structural analysis of programs using the concept of tree-decomposition methods, and in particular hyper-tree decomposition, is new. This also holds for the findings derived from the empirical analysis, which are of importance for automated debugging. The authors of (31) proposed to diagnose errors in programs using constraint programming. Their approach requires that the programmer provides contracts, i.e., pre- and post-conditions, for every function. However, the authors do not investigate the complexity of solving the resulting problem and the scalability to larger programs. In particular, they do not consider structural decomposition or other methods which could make the approach feasible. Moreover, the practical applicability of their approach is also limited because it requires that contracts are specified for every function, which is very often not the case in real-world programs. Furthermore, their work does not properly handle recursive function calls. 7 Conclusion In this paper we introduced a methodology for compiling programs into an equivalent CSP representation.
The methodology comprises the conversion of while statements into their equivalent nested if-statement representation, from which the SSA form is generated. From the SSA form itself we finally obtain the CSP representation. In the paper we argued that the CSP representation can be effectively used for debugging and allows for computing a complexity metric for debugging, i.e., the hyper-tree width. This is due to the fact that the complexity of debugging based on CSPs depends on the hyper-tree width. For the purpose of debugging we assume the existence of the source code and of test cases which reveal the fault. Another advantage of the CSP representation is that the available algorithms for constraint solving, and in particular for diagnosis, can be used directly. The presented empirical analysis showed that the hyper-tree width of programs varies a lot and can be more than 50. For the purpose of debugging this finding is not good, because usually constraint systems with a hyper-tree width of 5 to 10 are already considered complex. However, this fact shows that debugging itself is a complex task. From the empirical analysis we also obtained that previous work on an upper bound of 6 for the hyper-tree width of programs, which was mainly based on control dependences, cannot be justified in the case of debugging in general. For a specific program there might be an upper bound even when compiling while statements into their equivalent nested if-statement representation. Future research includes solving the upper bound problem in the general case and applying the debugging approach to small and medium size programs. The integration of the CSP representation with given program assertions like pre- and post-conditions is also of interest.
Acknowledgement

This work has been supported by the FIT-IT research project Self Properties in Autonomous Systems (SEPIAS), which is funded by the Austrian Federal Ministry of Transport, Innovation and Technology and the FFG, and by the Austrian Science Fund (FWF) within the project "Model-based Diagnosis and Reconfiguration in Mobile Autonomous Systems (MoDReMAS)" under grant P20199-N15. Moreover, we want to thank Nysret Musliu for providing us with an implementation of the hyper-graph decomposition.

References

[1] Andreas Zeller and Ralf Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering, 28(2), February 2002.
[2] Ehud Shapiro. Algorithmic Program Debugging. MIT Press, 1983.
[3] Nahid Shahmehri, Mariam Kamkar, and Peter Fritzson. Usability criteria for automated debugging systems. J. Systems Software, 31:55-70, 1995.
[4] Richard A. DeMillo, Hsin Pan, and Eugene H. Spafford. Critical slicing for software fault localization. In International Symposium on Software Testing and Analysis (ISSTA), pages 121-134, 1996.
[5] Neelam Gupta, Haifeng He, Xiangyu Zhang, and Rajiv Gupta. Locating faulty code using failure-inducing chops. In Automated Software Engineering (ASE), pages 263-272, November 2005.
[6] Mariam Kamkar. Application of program slicing in algorithmic debugging. Information and Software Technology, 40:637-645, 1998.
[7] Marc M. Brandis and H. Mössenböck. Single-pass generation of static single assignment form for structured languages. ACM TOPLAS, 16(6):1684-1698, 1994.
[8] H. Collavizza and M. Rueher. Exploration of the capabilities of constraint programming for software verification. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 182-196, Vienna, Austria, 2006.
[9] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, 13(4):451-490, 1991.
[10] Mark Weiser. Programmers use slices when debugging. Communications of the ACM, 25(7):446-452, July 1982.
[11] Rina Dechter. Constraint Processing. Morgan Kaufmann, 2003.
[12] Yousri El Fattah and Rina Dechter. Diagnosing tree-decomposable circuits. In Proceedings 14th International Joint Conf. on Artificial Intelligence, pages 1742-1748, 1995.
[13] Gerhard Friedrich, Markus Stumptner, and Franz Wotawa. Model-based diagnosis of hardware designs. Artificial Intelligence, 111(2):3-39, July 1999.
[14] Georg Gottlob, Nicola Leone, and Francesco Scarcello. On tractable queries and constraints. In Proc. International Conference on Database and Expert Systems Applications (DEXA), Florence, Italy, 1999.
[15] G. Gottlob, N. Leone, and F. Scarcello. A comparison of structural CSP decomposition methods. Artificial Intelligence, 124(2):243-282, 2000.
[16] http://www.dbai.tuwien.ac.at/proj/hypertree/index.html
[17] A. Griesmayer, R. Bloem, and B. Cook. Repair of boolean programs with an application to C. In Proc. 18th Conference on Computer Aided Verification (CAV'06), pages 358-371, Seattle, Washington, USA, August 2006.
[18] Daniel Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, 2006.
[19] Daniel Köb and Franz Wotawa. Fundamentals of debugging using a resolution calculus. In Luciano Baresi and Reiko Heckel, editors, Fundamental Approaches to Software Engineering (FASE'06), volume 3922 of Lecture Notes in Computer Science, pages 278-292, Vienna, Austria, March 2006. Springer.
[20] Cristinel Mateis, Markus Stumptner, and Franz Wotawa. Modeling Java programs for diagnosis. In Proceedings of the European Conference on Artificial Intelligence (ECAI), Berlin, Germany, August 2000.
[21] Wolfgang Mayer, Markus Stumptner, Dominik Wieland, and Franz Wotawa. Can AI help to improve debugging substantially? Debugging experiences with value-based models.
In Proceedings of the European Conference on Artificial Intelligence, pages 417-421, Lyon, France, 2002.
[22] Raymond Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32(1):57-95, 1987.
[23] S. Staber, B. Jobstmann, and R. Bloem. Finding and fixing faults. In Proc. 13th Conference on Correct Hardware Design and Verification Methods, volume 3725 of Lecture Notes in Computer Science, pages 35-49. Springer-Verlag, 2005.
[24] Markus Stumptner and Franz Wotawa. Debugging functional programs. In Proceedings 16th International Joint Conf. on Artificial Intelligence, pages 1074-1079, Stockholm, Sweden, August 1999.
[25] Markus Stumptner and Franz Wotawa. Diagnosing tree-structured systems. Artificial Intelligence, 127(1):1-29, 2001.
[26] M. Stumptner and F. Wotawa. Coupling CSP decomposition methods and diagnosis algorithms for tree-structured systems. In Proc. 18th International Joint Conf. on Artificial Intelligence, pages 388-393, Acapulco, Mexico, 2003.
[27] M.N. Wegman and F.K. Zadeck. Constant propagation with conditional branches. ACM Transactions on Programming Languages and Systems, 13(2), April 1991.
[28] Mikkel Thorup. All structured programs have small tree-width and good register allocation. Information and Computation, 142(2):159-181, 1998.
[29] Artan Dermaku, Tobias Ganzow, Georg Gottlob, Ben McMahan, Nysret Musliu, and Marko Samer. Heuristic methods for hypertree decompositions. Technical Report DBAI-TR-2005-53, Technische Universität Wien, 2005.
[30] M. Yannakakis. Algorithms for acyclic database schemes. In C. Zaniolo and C. Delobel, editors, Proceedings of the International Conference on Very Large Data Bases (VLDB-81), pages 82-94, Cannes, France, 1981.
[31] R. Ceballos, R. M. Gasca, C. Del Valle, and D. Borrego. Diagnosing errors in DbC programs using constraint programming. Lecture Notes in Computer Science, vol. 4177, pages 200-210, 2006.
Automatic Streaming Processing of XSLT Transformations Based on Tree Transducers

Jana Dvořáková
Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
E-mail: dvorakova@dcs.fmph.uniba.sk

Keywords: XML transformation, XSLT, streaming processing, tree transducer

Received: March 15, 2008

Streaming processing of XML transformations is needed in practice especially when large XML documents or XML data streams are to be transformed. In this paper, the design of an automatic streaming processor for XSLT transformations is presented. Unlike other similar systems, our processor guarantees bounds on the resource usage for the processing of a particular type of transformation. This feature is achieved by employing tree transducers as the underlying formal base. The processor includes a set of streaming algorithms, each of which is associated with a tree transducer with specific resource usage (memory, number of passes) and thus captures a different transformation subclass. The input XSLT stylesheet is analyzed in order to identify the transformation subclass to which it belongs. Then the lowest resource-consuming streaming algorithm capturing this subclass is applied.

Povzetek: Obravnavano je avtomatično pretakanje XSLT transformacij.

1 Introduction

XML (28) is a meta-language defined by the W3 Consortium in order to store structured data. The initial W3C recommendation for XML was published in 1998 and since then XML has become a popular format. It is commonly used for data exchange among applications since it enables them to add semantics to data explicitly. Furthermore, XML is a suitable tool in every field where it is necessary to create document standards. XML usage is still growing fast and new technologies for processing XML documents are emerging. Transformations of XML documents are needed in many situations.
For instance, let us consider two applications exchanging data in XML format, each of them requiring a different structure for the same content. Then a transformation must be performed while the data are being transferred between these applications. A typical XML transformation processor reads the whole input document into memory and then performs particular transformation steps according to the specification. References to any part of the input document are processed in a straightforward way by traversing the in-memory representation, and the extracted parts are combined to form the required output fragment. This approach is called tree-based processing of XML transformations. In the early days of XML, this kind of processing was sufficient, since the existing XML documents were small and stored in files. Nowadays, however, it is quite common to encounter extensive XML data (e.g., database exports) or XML data streams in practice. In both cases, tree-based processing is not suitable - in the former case it is not acceptable or even possible to store the whole input document in memory, while in the latter the XML data become available stepwise and need to be processed "on the fly". In our case, the research on efficient processing of XML transformations was in part motivated by processing large XML data in the semantic repository Trisolda (9). The repository contains semantic annotations for various web resources. A standard format for specifying such semantics is the Resource Description Framework (RDF), which is an instance of XML. Since the number of annotated web resources tends to grow very fast, the transformation of RDF data into other representations/views and vice versa cannot be performed in the tree-based manner. At the same time, it is not suitable to write the transformations by hand using a SAX parser.
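The contrast between the two processing modes can be illustrated with Python's standard library (an illustrative analogy only, unrelated to Trisolda): a tree-based parse materializes the whole document before any work starts, while an event-driven parse delivers elements sequentially and lets the consumer discard them as soon as they are processed.

```python
# Illustrative contrast between tree-based and streaming processing,
# using Python's standard library; the tiny document is an assumption.
import io
import xml.etree.ElementTree as ET

doc = b"<items><item>1</item><item>2</item><item>3</item></items>"

# Tree-based: the complete document tree sits in memory before processing.
root = ET.fromstring(doc)
total_tree = sum(int(e.text) for e in root.iter("item"))

# Streaming: handle each element when its end event arrives, then free it.
total_stream = 0
for _event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    if elem.tag == "item":
        total_stream += int(elem.text)
        elem.clear()   # bound memory by discarding processed content
```

Both computations produce the same result, but the streaming variant keeps only the current element (plus its ancestors) in memory, which is what makes it applicable to large documents and data streams.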
The transformations needed are not trivial, especially due to the complexity of the RDF format, and as new functions are added to the Trisolda repository, new transformations may become necessary. Therefore, a more flexible approach is employed and a processor is proposed such that the most efficient strategy for performing a given transformation is chosen automatically. A natural alternative to the classical tree-based processing of XML transformations is streaming (event-driven) processing. Here the input document is read sequentially, possibly in several passes, and the output document is generated sequentially in one pass. Only a part of the input document is available at a time, and thus advanced techniques must be used to process references to the input document and to connect the extracted parts to the proper position within the output document. Currently, the most frequently used XML transformation languages are XSLT (27) and XQuery (29), both general (Turing-complete) languages intended for tree-based processing. There is great interest in identifying XSLT and XQuery transformations which allow efficient streaming processing. When designing an XSLT/XQuery streaming processor, the key task is to find a way of handling the non-streaming constructs of the languages. The streaming algorithms for XSLT and XQuery transformations are, however, still under development, and complexity issues such as memory requirements and the number of passes needed for specific, clearly defined transformation classes have not yet been analyzed. The main contribution of this work is the design of a system for automatic streaming processing of XSLT transformations with the following properties: - The transformation classes captured are clearly characterized. Each such class contains transformations with common properties - it represents an XSLT subset obtained by restricting the constructs used in the XSLT stylesheet.
- A streaming algorithm is designed for each transformation class. The main design goal is to minimize the upper bound of memory usage, i.e., to use an optimal (or close to optimal) amount of memory. This upper bound is explicitly stated for each algorithm. These features are achieved by employing tree transducers as the underlying formal base. Specifically, the design of the processor is based on the formal framework for XML transformations introduced in (10). In this paper, the framework is simplified and customized in order to facilitate the implementation. It contains a general model - an abstract model of general, tree-based transformation languages - and a set of streaming models that differ in the kind of memory used and the number of passes over the input allowed. Each streaming model can simulate some restricted general model. The framework contains a simulation algorithm for each such pair (streaming model, restricted general model). The framework is abstract, and thus can be used to develop automatic streaming processors for other general transformation languages as well (e.g., XQuery). The implementation level of the framework for the XSLT language includes the implementation of the streaming models and two modules: (1) an analyzer that associates the input XSLT stylesheet with the lowest resource-consuming streaming model that is able to process it, and (2) a translator that automatically converts the XSLT stylesheet into the chosen streaming model according to the associated simulation algorithm. The processor based on the framework is easily extensible, since new transducers and algorithms may be specified and implemented, as well as optimizable, since the current algorithms may be replaced by more efficient ones.
Although there are some XML transformations whose streaming processing is always highly resource-consuming (e.g., complete reordering of element children), most practical transformations can be processed with reasonable bounds on the resource usage and thus more efficiently than in the tree-based manner. The rest of this paper is organized as follows: Section 2 contains a description of both approaches to processing XML transformations and the complexity measures related to streaming processing. In Section 3, the customized formal framework for XML transformations is introduced and the underlying tree transducers are described. An example algorithm designed within the framework is presented in Section 4. In Section 5, the design of our automatic streaming processor for XSLT transformations is introduced. In Section 6, the relation to other work is discussed. Section 7 briefly introduces the implementation of the example streaming algorithm and Section 8 concludes with a summary and comments on future work.

2 Complexity of streaming processing

In this section, the relevant complexity measures for streaming algorithms for XML transformations are specified. An XML document contains the following basic constructs: elements, element attributes, and text values. The document may be represented as a tree that is obtained by a natural one-to-one mapping between elements and internal nodes of the tree. The text values appear in the leaves of such a tree. Reading a document in document order then exactly corresponds to the preorder traversal of the constructed tree.

Figure 1: Two types of XML transformation processing: (a) tree-based processing, (b) streaming processing.

The tree-based processing of XML transformations (Fig. 1a) is flexible in the sense that the input document is stored in memory as a tree and can be traversed in any direction. On the contrary, during streaming processing (Fig.
1b) the elements of the input document become available stepwise in document order, and similarly the output elements are generated in document order. The current context is restricted to a single input node. Clearly, a one-pass streaming processor without additional memory is able to perform only simple transformations, such as renaming elements and attributes, changing attribute values, and filtering. It must be extended to perform more complex restructuring. The common extensions are (1) allowing more passes over the input document, and (2) adding an additional memory for storing temporary data. The extensions can be combined¹. We obtain the corresponding complexity measures for streaming processing of XML transformations: 1. the number of passes over the input tree, 2. the memory size. It is reasonable to consider the complexity of streaming processing in relation to tree-based processing. As mentioned in Section 1, all XML transformations can be expressed in both XSLT and XQuery and processed by their tree-based processors. Various transformation subclasses can then be characterized by putting restrictions on these general transformation languages, typically by excluding certain constructs. When designing streaming algorithms, we have a choice regarding three settings - the type of memory used (none, stack, buffers for storing XML fragments) and the values of the two complexity measures mentioned. Streaming algorithms with different settings may capture different transformation subclasses. Since the transformation subclasses are characterized as subsets of the general transformation language considered, the key issue in the algorithms is to realize a streaming simulation of the non-streaming constructs included in the restricted language (see Fig. 2).
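The simple one-pass transformations mentioned above (renaming, filtering) can be realized directly over a sequential parse. A minimal SAX-based sketch, with illustrative element names and rules; it keeps only a constant amount of state (a rename table and a skip-depth counter for filtered subtrees):

```python
# Hedged sketch of a one-pass streaming transformation with constant
# memory: rename element "b" to "bold" and drop "comment" subtrees.
# The element names and rules are illustrative assumptions.
import io
import xml.sax
from xml.sax.handler import ContentHandler

RENAME = {"b": "bold"}
DROP = {"comment"}

class OnePassTransform(ContentHandler):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip = 0                      # depth inside a dropped subtree

    def startElement(self, name, attrs):
        if self.skip or name in DROP:
            self.skip += 1
            return
        self.out.append(f"<{RENAME.get(name, name)}>")

    def endElement(self, name):
        if self.skip:
            self.skip -= 1
            return
        self.out.append(f"</{RENAME.get(name, name)}>")

    def characters(self, content):
        if not self.skip:
            self.out.append(content)

handler = OnePassTransform()
xml.sax.parse(io.BytesIO(b"<p><b>hi</b><comment>x</comment></p>"), handler)
result = "".join(handler.out)
```

Anything beyond such local edits, e.g. reordering siblings, forces one of the extensions above: extra passes or extra memory.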
Figure 2: The streaming simulation of subsets of a general transformation language.

We use tree transducers to design the streaming algorithms formally and to model transformation subclasses. They are included in the formal framework for streaming XML transformations which is described in the next section.

3 Formal framework

The framework is intended as a formal base for automatic streaming processors of general transformation languages. It does not cover all XML transformations. In order to keep the models employed simple and comprehensible, it is restricted to model primarily the transformations that capture the relevant problems of streaming processing. In Section 5, it is described how some of the restrictions on the transformation set can be overcome in the implementation. The framework consists of the following formal models:

Figure 3: A schema of the formal framework.

1. a basic general model for tree-based processing of XML transformations and its restrictions, 2. a basic streaming model for streaming processing of XML transformations and its extensions. The design of both models results from an analysis of various tree transformation models, XML transformation models, as well as real-world XML transformation languages and systems. They are based on tree transducers, models for tree transformations (25) that originated in formal language theory. We introduce two novel models - a general XML transducer (GXT) used as the general model, and a streaming XML transducer (SXT) used as the streaming model. They are defined in common terms in order to facilitate the development of the simulation algorithms. The overall schema of the framework is shown in Fig. 3.
The basic SXT represents a simple one-pass streaming model without additional memory. Following the ideas from Section 2, it can be extended by a memory for storing temporary data and by allowing more passes over the input document. The basic GXT represents the most powerful general model. As already mentioned, it does not capture all XML transformations, but only a subset significant in the case of streaming processing. For each extended SXT, the transformation subclass captured is identified by imposing various restrictions on the basic GXT. The inclusion is proved by providing an algorithm for simulating this restricted GXT by the given extended SXT.

3.1 Notions and Notations

XML Document Abstraction. In what follows, element attributes and data values are not considered². Let Σ be an alphabet of element names. The set of XML trees over Σ is denoted by TΣ; the empty XML tree is denoted by ε. An indexed XML tree may in addition have some leaves labeled by symbols from a given set X. The set of XML trees over Σ indexed by X is denoted by TΣ(X). In a rightmost indexed XML tree, an element of the indexing set occurs only as the rightmost leaf. The set of rightmost indexed XML trees is denoted by TΣ(X)r. A particular XML tree t ∈ TΣ(X) is uniquely specified as a triple (Vt, Et, λt), where Vt is a set of nodes, Et ⊆ Vt × Vt is a set of edges, and λt : Vt → Σ ∪ X is a labeling function.

¹ More passes over the input tree are not possible for XML data streams that must be processed "on the fly".
² We refer the reader to (10) for the definition of the extended framework including both element attributes and data values.

Figure 4: An example of an XML tree.

Example 3.1. An XML tree t = (Vt, Et, λt) over the alphabet Σ = {α, β, γ} and the empty indexing set X = ∅ is shown in Fig. 4. The nodes of t are uniquely identified by dynamic level numbering.
The sets Vt, Et and the labeling function λt are defined as follows: Vt = {1, 1.1, 1.2, 1.1.1, 1.2.1, 1.2.2}, λt(1) = α, λt(1.1) = β, λt(1.1.1) = β, λt(1.2) = γ, λt(1.2.1) = γ, λt(1.2.2) = γ.

Selecting Expressions. Simple selecting expressions, derived from XPath expressions (26), are used to locate nodes within the XML tree. A selecting expression is a path consisting of a sequence of steps. It can be either absolute (starting with /) or relative. A step consists of two components - an axis specifier axis and a predicate pred. They are specified as outlined below. Compared to the XPath language, the set of expressions is restricted and the syntax of some constructs is simplified - the meaning is explained in parentheses. The semantics of the selecting expressions follows the semantics of the equivalent XPath expressions.

step ::= axis[pred]
axis ::= · (self) | ↓ (child) | ↑ (parent) | ← (left sibling) | → (right sibling) | ↓* (descendant) | ↑* (ancestor) | ⇐ (preceding) | ⇒ (following)
pred ::= * (select all elements) | name (select the elements named name) | i (select the element on the i-th position within siblings) | step (select the elements having the context specified by step)

3.2 XML Transducers

General XML Transducer (Fig. 5a). The input heads of the GXT traverse the input tree in any direction and the output is generated from the root to the leaves. At the beginning of a transformation, the transducer has only one input head, which points at the root of the input tree, and one output head, which points at the root position of the empty output tree. During a single transformation step, the whole input tree is available as context. One or more new computation branches can be spawned and the corresponding input control is moved to the input nodes specified by selecting expressions. At the same time, the output heads may generate a new part of the output.
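The dynamic level numbering used in Example 3.1 can be computed mechanically: a node's identifier is its parent's identifier extended with the node's 1-based position among its siblings. A small sketch (the nested-tuple tree encoding and the spelled-out labels alpha/beta/gamma are illustrative assumptions, not the paper's representation):

```python
# Hedged sketch: assign dynamic level numbers (1, 1.1, 1.2, ...) to the
# tree of Example 3.1.  Trees are encoded as (label, [children]) tuples.

def number_nodes(tree, prefix="1"):
    """Return {identifier: label} for a (label, children) tree."""
    label, children = tree
    ids = {prefix: label}
    for i, child in enumerate(children, start=1):
        ids.update(number_nodes(child, f"{prefix}.{i}"))
    return ids

# The tree of Example 3.1: alpha at the root, a beta branch, a gamma branch.
t = ("alpha", [("beta", [("beta", [])]),
               ("gamma", [("gamma", []), ("gamma", [])])])
ids = number_nodes(t)
```

The resulting mapping reproduces Vt and λt from the example: node 1 carries alpha, nodes 1.1 and 1.1.1 carry beta, and nodes 1.2, 1.2.1, 1.2.2 carry gamma.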
Formally, the GXT is a 5-tuple T = (Q, Σ, Δ, q0, R), where

- Q is a finite set of states,
- Σ is an input alphabet,
- Δ is an output alphabet,
- q0 ∈ Q is an initial state, and
- R is a set of rules of the form Q × Σ → TΔ(Q × SΣ).

For each q ∈ Q and σ ∈ Σ, there is exactly one rhs such that (q, σ) → rhs ∈ R. The right-hand side of a rule contains an XML tree over the output alphabet indexed by rule calls - pairs of the form (q, exp), where q is a state and exp is a selecting expression that returns a sequence of input nodes to be processed recursively. The names of the elements are taken from the alphabet Σ; the set of selecting expressions over Σ is denoted by SΣ. The evaluation of a selecting expression in the context of some XML tree t and one of its nodes u ∈ Vt returns the same set of nodes of t as the evaluation of the corresponding XPath expression. Note that the context set contains a single node only. A simple example of a GXT transformation follows.

Example 3.2. Let T = (Q, Σ, Σ, q0, R) be a GXT where Q = {q0}, Σ = {α, β, γ}, and R consists of the rules

(q0, α) → ε , (3.1)
(q0, β) → α((q0, ↓[*])) , (3.2)
(q0, γ) → γ((q0, ↓[2]), (q0, ↓[1])) . (3.3)

The transducer processes input trees over the alphabet Σ. The subtrees at nodes named α are completely removed (rule 3.1), the nodes named β are renamed and get the new name α (rule 3.2), and finally, when encountering a node named γ, the first two children are processed in reversed order (rule 3.3). The GXT is inspired mainly by the tree-walking tree transducer (TWR) (4) and the data tree transducer (DTT) (23). It works on unranked trees, but does not handle data values. Similarly to the TWR, the computation is high-level and based on rule calls. However, the XPath language is used for pattern matching on the paths of the input tree, as it is in the DTT. This choice is natural since XPath is used in the common general transformation languages (XSLT, XQuery).
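The three rules of Example 3.2 admit a direct recursive reading: applying state q0 to a node yields a list of output trees. The sketch below interprets exactly this fixed rule set; the tuple encoding and the spelled-out labels are illustrative assumptions, and a real GXT would of course dispatch on an arbitrary rule table rather than hard-coded branches.

```python
# Hedged sketch: the GXT of Example 3.2 on (label, children) trees.
# Rule 3.1: subtrees rooted at "alpha" are removed (empty output).
# Rule 3.2: "beta" is renamed to "alpha"; children processed in order.
# Rule 3.3: at "gamma", the first two children are emitted in reversed
#           order (assumes such nodes have exactly two children here).

def apply_q0(node):
    """Apply state q0 to one node; return a list of output trees."""
    label, children = node
    if label == "alpha":                       # (q0, alpha) -> empty
        return []
    if label == "beta":                        # (q0, beta) -> alpha(...)
        out = []
        for c in children:
            out.extend(apply_q0(c))
        return [("alpha", out)]
    if label == "gamma":                       # (q0, gamma) -> gamma(2nd, 1st)
        out = []
        for c in (children[1], children[0]):
            out.extend(apply_q0(c))
        return [("gamma", out)]
    return []

t = ("gamma", [("beta", []), ("gamma", [("alpha", []), ("beta", [])])])
result = apply_q0(t)
```

On this input the outer gamma swaps its two children, the alpha subtree vanishes, and both beta nodes reappear as alpha, mirroring the informal description of the rules.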
The GXT is tree-walking, i.e., the input tree may be traversed in any direction. It allows several computation branches, but it is still a sequential model in the sense that during a transformation step only a single rule call is processed.

Figure 5: The processing model of the transducers: (a) the GXT; (b) the SXT.

Streaming XML Transducer (Fig. 5b). The SXT has a single input head that traverses the input tree in preorder, and a single output head that generates the output tree in preorder. Each node is visited twice during a single pass - once when moving top-down, and once when moving bottom-up. Thus, two types of SXT states are distinguished: (1) states indicating the first visit of a node and (2) states indicating the second visit of a node. During a single transformation step, the input head either moves one step in preorder or stays at the current position. At the same time, an output action is performed, depending on the type of rule applied. When a generating rule is applied, a new part of the output is connected at the current position of the output head, and then the output head moves to the position under the rightmost leaf of the new part. When a closing rule is applied, no output is generated; the output head only moves one step upwards in preorder within the output tree. Formally, the streaming XML transducer (SXT) is a 5-tuple T = (Q, Σ, Δ, q0, R), where

- Q = Q1 ∪ Q2, Q1 ∩ Q2 = ∅, is a finite set of states,
- Σ, Δ are the same as in the case of the GXT,
- q0 ∈ Q1 is the initial state, and
- R = Rg ∪ Rc, Rg ∩ Rc = ∅, is a finite set of rules of the form

Rg : Q × Σ × Pos → TΔ(Q × SΣ)r ,
Rc : Q × Σ × Pos → Q × SΣ ,

where Pos = {leaf, no-leaf} × {last, no-last}³. For each q ∈ Q and σ ∈ Σ there is at most one rhs such that for each pos ∈ Pos there is a rule (q, σ, pos) → rhs ∈ R⁴. Furthermore, for each (q, σ, pos) → rhs ∈ R with rec(rhs) = (q′, exp)⁵, one of the following preorder conditions holds: 1.
moving downwards: q ∈ Q1, and
- pos[1] = no-leaf, q′ ∈ Q1, exp = ↓[1], or
- pos[1] = leaf, q′ ∈ Q2, exp = ·[*],
2. moving upwards: q ∈ Q2, and
- pos[2] = no-last, q′ ∈ Q1, exp = →[1], or
- pos[2] = last, q′ ∈ Q2, exp = ↑[*],
3. no input move: q, q′ are of the same kind, exp = ·.

The left-hand side of a rule consists of a state, an element name, and a node position. The position is used to determine the preorder move within the input tree and consists of two predicates - the first indicating a leaf node, and the second indicating a last node among its siblings. The right-hand side is an XML tree rightmost-indexed by a rule call.

4 An example algorithm

In this section, a particular streaming simulation designed within our framework is presented. In particular, a top-down GXT is simulated by an SXT extended with a stack whose size is proportional to the height of the input tree. The stack-based simulation is efficient - in order to evaluate simple top-down selecting expressions in the branches of the input XML tree, memory proportional to the length of the branches, which equals the height of the input tree, is needed. However, it is shown that a restriction to a stack is sufficient. First, the models considered are described formally.

Restricted GXT. The restricted GXT, called the top-down GXT (TGXT), differs from the GXT in the rule definition - R is a set of rules of the form Q × Σ → TΔ(Q × top-SΣ), where top-SΣ is the set of simple top-down selecting expressions. It is the subset of selecting expressions in which only top-down axes (child and descendant) and name predicates ([name]) are allowed. The simulated TGXT must in addition satisfy two input-dependent conditions: 1. The TGXT is order-preserving if and only if, for each of its rules, the input nodes returned by the selecting expressions in the rhs are in preorder for an arbitrary input tree t and u ∈ Vt. 2.
The TGXT is branch-disjoint if and only if, for each of its rules, the input nodes returned by the selecting expressions in the rhs are disjoint for an arbitrary input tree t and u ∈ Vt.

³ If pos ∈ Pos is a node position, its first component is referred to by pos[1] and its second component by pos[2].
⁴ This condition is necessary to keep the model deterministic.
⁵ If rhs is a particular right-hand side, its rule call is referred to by rec(rhs).

Intuitively, if either of the conditions is not satisfied, it may happen that a part of the input tree disproportionate to the height of the input tree must be stored in memory, and thus the stack-based simulation is not applicable.

Extended SXT. The extended SXT, called the stack-based SXT (SSXT), is a 7-tuple T = (Q, Σ, Δ, Γ, q0, z0, R), where

- Q, Σ, Δ, q0 are the same as in the case of the SXT,
- Γ is a finite set of stack symbols,
- z0 ∈ Γ is the initial stack symbol, and
- R is a finite set of rules of the form

Rg : Q × Σ × Pos × Γ → TΔ(Q × SΣ × Γ*)r ,
Rc : Q × Σ × Pos × Γ → Q × SΣ × Γ* .

The lhs now contains, in addition, the current top stack symbol, and the rhs contains a sequence of stack symbols to be put on top of the stack. All other symbols have the same meaning as in the SXT.

4.1 Construction of the Simulating SSXT

The formal proposition follows. It says that, for each order-preserving and branch-disjoint TGXT, it is possible to construct an SSXT inducing equivalent translations.

Proposition 4.1. Let T = (Q, Σ, Δ, q0, R) be an order-preserving and branch-disjoint TGXT. Then there exists an SSXT T′ such that, for each tin ∈ TΣ and tout ∈ TΔ, if T translates tin to tout then T′ translates tin to tout.

The simulation proceeds in cycles. During a cycle, a single transformation step of T, called the current transformation step, is simulated. Such a simulation consists of several transformation steps of T′. A cycle is driven by the cycle configuration, which consists of three items: 1.
current context node - the current input node of T during the current transformation step, 2. current rule - the rule of T applied during the current transformation step, 3. matched rule call - a rule call of the current rule. During the whole simulation, the matched rule call represents the leftmost6 rule call, for which a match has been already found. At the beginning of the simulation, the current context node is the root node of the input tree, the current rule is the rule of T of the form (q0, a) ^ rhs where q0 is the initial state of T and a is the name of the root of the input tree. The matched rule call is the left sentinel rule call which is a virtual rule call positioned to the left from all other rule calls. This special rule call is used to initialize a new cycle. In case no error is encountered, a cycle includes two phases - an evaluation phase and a generation phase. Phase Alternation. During the evaluation phase the input head of T' traverses the subtree at the current context node in preorder, and at the same time it evaluates all selecting expressions in the rule calls of the current rule. The evaluation is accomplished in a standard way by means of finite automata7. Three cases are distinguished depending on the result of the evaluation phase: 1. A matching node is found for exactly one rule call, and this rule call (newly-matched rule call) is either positioned to the right of the matched rule call or it equals the matched rule call. This type of cycle is called an entering cycle since it takes place when the input head of T' is moving downwards, and a new rule call of the current rule is matched and "recursively" processed. The generation phase follows: The output head generates a specific part of the output fragment of the current rule. The part is a set of nodes that appear between the matched rule call and the newly-matched rule call. After the generation, the current cycle configuration is stored in the stack. 
The matched node becomes the new current context node. The rule of the form (q', a') → rhs, where q' is the state in the newly-matched rule call and a' is the name of the matched node, becomes the new current rule, and the left sentinel rule call becomes the new matched rule call. A new cycle starts, driven by the new cycle configuration.

2. A matching node is found for two or more rule calls, or a matching node is found for a rule call that is positioned to the left of the matched rule call. This situation occurs in case T is not order-preserving, and an error is reported.

3. No matching node is found and the whole subtree at the current context node has been traversed. This type of cycle is called a returning cycle since it takes place when the input head of T' is moving upwards, the processing of some rule call is finished, and the control moves back to the processing of the rule containing this rule call. The current rule is denoted by r. The generation phase follows directly: the last part of the output fragment of r is generated. The top stack configuration becomes the new cycle configuration, and the new cycle starts.

5 Design of XSLT streaming processor

An automatic streaming processor for XSLT transformations is described which is based on the framework introduced. The models within the framework are abstract, and thus the framework provides means to develop efficient streaming algorithms for XML transformation subclasses at an abstract level, and to adapt them to an arbitrary general transformation language. First, the general issues regarding the framework implementation are described, and then an adaptation for the XSLT transformation language is discussed in more detail.

6The positions of rule calls are always considered with respect to the preorder of the rhs of the rule. 7This method was, for example, presented in the Y-Filter algorithm (5; 8).
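As a rough illustration of the stack-based cycle mechanism of Section 4, consider the following toy sketch (all names are ours, not from the formal model): a preorder event stream stands in for the input head, entering cycles push a cycle configuration, returning cycles pop it, so the stack never grows beyond the depth of the input tree.

```python
from dataclasses import dataclass

@dataclass
class CycleConfig:
    """Simplified cycle configuration: context node name, current rule,
    and the index of the matched rule call (0 = left sentinel)."""
    context: str
    rule: str
    matched_call: int = 0

def simulate(events):
    """Toy stack-based run over a preorder event stream (identity transform).
    'start' events play the role of entering cycles (push a configuration),
    'end' events the role of returning cycles (pop it)."""
    stack, out = [], []
    for kind, name in events:
        if kind == 'start':
            stack.append(CycleConfig(name, f"(q,{name}) -> rhs"))
            out.append(f"<{name}>")   # generation phase of the entering cycle
        else:
            cfg = stack.pop()         # returning cycle: restore configuration
            out.append(f"</{cfg.context}>")
    return ''.join(out)

stream = [('start', 'a'), ('start', 'b'), ('end', 'b'),
          ('start', 'c'), ('end', 'c'), ('end', 'a')]
print(simulate(stream))  # <a><b></b><c></c></a>
```

The point of the sketch is only the memory behaviour: the stack depth at any moment equals the current depth of the input head, matching the depth-proportional bound claimed for the SSXT simulation.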
5.1 Framework Restrictions

As mentioned in the previous section, the formal framework is restricted in several ways. Some of the restrictions can be easily overcome in the implementation, while others require more complex handling.

1. Restrictions on the XML document. Attributes and data values are associated with elements. They can be easily added to the implementation - if such a construct needs to be processed, it is accessed using the same path as the parent element. On the other hand, if the construct needs to be generated in the output, the action is performed together with the generation of the parent element.

2. Restrictions on the selecting expressions. The simple selecting expressions used capture the typical problems that arise during the streaming location of nodes in an XML document (context references in predicates, backward axes). Other constructs must be handled separately - however, the techniques used for the constructs included in our restricted set may often be exploited. Moreover, research on the streaming processing of large subsets of the XPath language has already been carried out (see Section 6 for an overview).

3. Restrictions on the general transformation language. A part of the restrictions in the GXT results from the restrictions on selecting expressions, and others are caused by excluding certain general transformation constructs, such as loops, variables, and functions. However, the GXT models transformations that reorder the nodes within an XML tree with respect to the document order, which is probably the most important issue in the streaming processing of XML transformations if the specific issues concerning selecting expression evaluation are not considered.

5.2 Adaptation for XSLT

Let us now describe the design of the prototype XSLT streaming processor. The GXT represents an abstract model for general transformation languages. Since our intention is to adapt the framework for the XSLT language, the GXT does not need to be implemented directly.
Instead, we are looking for a correspondence between restricted GXTs and XSLT subsets. The GXT models the XSLT transformations driven by the structure of the input document. Thus, each XSLT stylesheet consisting of a list of simple templates activated by structure and mode can be directly converted to a GXT. The matching element of such a simple template is referenced by a name only, and the body of the template may contain several output elements (possibly nested) and calls for applying other templates. A template is called by a selecting expression and a mode. Specifically, an XSLT stylesheet xsl convertible to a GXT consists of (1) an initializing template and (2) several rule templates. The initializing template sets the current mode to the initial state of the GXT. Each rule template can be directly translated to a single rule of the GXT. It is of the following form:

<xsl:template match="name" mode="q">
  ... template body ...
</xsl:template>

The resulting GXT rule r is of the form

(q, name) → rhs

Thus, the left-hand side consists of the element name in the match attribute and the state in the mode attribute. The rhs is created by translation of the template body as described below. The template body contains a sequence of (possibly nested) output elements and apply-templates constructs. An output element named name is specified directly as a pair of tags (alternatively, the element construct might be used):

<name> ... element content ... </name>

The apply-templates construct has a select attribute that contains a selecting expression, and a mode attribute that represents a state of the resulting GXT:

<xsl:apply-templates select="selexp" mode="q'"/>

Each apply-templates construct can be translated to a single rule call. For the case above, a rule call of the form (q', selexp) is obtained. The rhs of the rule r is created from the template body so that each output element corresponds to a single node of rhs and each apply-templates construct corresponds to a single rule call of rhs.
The structure of rhs is determined by the nesting of the output elements and apply-templates constructs in the template body. The resulting GXT is of the form T = (Q, Σ, Δ, q_0, R) where

- Q contains the modes appearing in xsl,
- R contains the rules created by translation from the particular rule templates as described above.

Moreover, R contains a rule of the form

(q, a) → (q, child[*])

for each mode q and name a ∈ Σ such that xsl does not contain a template matching a in the mode q. Such additional rules correspond to the XSLT implicit built-in rule templates8.

Figure 6: An implementation of the framework for the XSLT language.

In a similar way, XSLT subsets corresponding to restricted GXTs can be identified. According to the principle of the formal framework, a restricted GXT (GXTr) can be simulated by some extended SXT (SXTe) such that the simulation algorithm is known. Then XSLT stylesheets from the XSLT subset associated with GXTr can be converted to SXTe using the simulation algorithm. The conversion can be performed automatically since the simulation algorithm exactly determines how to convert constructs of the given XSLT subset into the rules of SXTe. The resulting SXTe is constructed explicitly as an object, and its method transform() performs streaming processing of the transformation specified by the stylesheet. The relation between the formal framework and the implementation for XSLT is shown in Fig. 6.

5.3 Modules of Streaming Processor

To sum up, the streaming processor works in three steps (see also Fig. 7):

1. Analysis. The analyzer examines the constructs in the input XSLT stylesheet (both XPath constructs and XSLT constructs themselves).
It checks whether there is a specified XSLT subset that allows all the constructs encountered. If there are several such subsets, the smallest one is chosen.

2. Translation. The translator creates an object for the extended SXT associated with the chosen XSLT subset. The creation is automatic, following the simulation algorithm provided for the XSLT subset.

3. Processing. The method transform() of the new SXT object is run on the input XML document. The streaming transformation performed is equivalent to the one specified by the input XSLT stylesheet.

8The built-in XSLT rules actually ensure that the resulting GXT is complete. Note that it is also deterministic, since xsl cannot contain two templates matching the same name in the same mode by the definition of a valid XSLT stylesheet.

Figure 7: Modules of the automatic streaming processor.

6 Related work

Most of the earlier work was devoted to analyzing the streaming processing of the query language XPath (1; 2; 6; 7; 14; 16; 22; 24). Recently, several streaming processors for the transformation languages XQuery and XSLT have appeared. XML Streaming Machine (XSM) (19) processes a subset of XQuery on XML streams without attributes and recursive structures. It is based on a model called the XML streaming transducer. The processor has been tested on XML documents of various sizes against a simple query. Using XSM, the processing time grows linearly with the document size, while in the case of standard XQuery processors the time grows superlinearly. However, more complex queries have not been tested. BEA/XQRL (12) is a streaming processor that implements full XQuery. The processor was compared with the Xalan-J XSLT processor on a set of 25 transformations, and another test was carried out on the XMark benchmark. The BEA processor was fast on small input documents; however, the processing of large documents was slower, since the optimizations specially designed for XML streams are limited in this engine.
FluXQuery (17) is a streaming XQuery processor based on a new internal query language, FluX, which extends XQuery with constructs for streaming processing. An XQuery query is converted into FluX, and the memory size is optimized by examining the query as well as the input DTD. FluXQuery supports a subset of XQuery. The engine was benchmarked against the XQuery processors Galax and AnonX on selected queries of the XMark benchmark. The results show that FluXQuery consumes less memory and less running time. SPM (Streaming Processing Model) (15) is a simple one-pass streaming XSLT processor without additional memory. The authors present a procedure that tries to convert a given XSLT stylesheet into an SPM. However, no algorithm for testing the streamability of XSLT is introduced, and thus the class of XSLT transformations captured by SPM is not clearly characterized. The effectiveness of the processors mentioned was examined only through empirical tests. The test results show that streaming processors do indeed tend to be less time- and space-consuming than tree-based processors. However, since no formal characterizations of the captured transformation classes were given, the results hold only for the few (typically one or two) XML transformations chosen for the experiments. In other approaches (3; 13; 21), a new specification language is developed which supports streaming processing, and a streaming processor for this new language is designed. In all cases, the connection to the commonly used transformation languages is not clearly stated, and the computational complexity of the streaming processing is not addressed.

7 Implementation

The formal framework introduced has been implemented on the .NET platform. The pilot implementation includes the stack-based algorithm described in Section 4. The evaluation of the algorithm implementation shows that it is highly efficient in practice - it requires memory proportional to the depth of the input XML document.
Since this depth generally does not depend on the document size and common XML documents are relatively shallow (99% of XML documents have fewer than 8 levels, and the average depth is 4 according to (20)), the memory requirements for most XML documents are constant, independent of the document size. In contrast, standard XSLT processors are tree-based and thus require memory proportional to the document size. We refer the reader to (11) for a more detailed description of the stack-based algorithm implementation and its evaluation.

8 Conclusion

A design of an automatic streaming processor for XSLT transformations has been presented. Compared to other similar processors, the contribution of our approach is that the resource usage for the streaming processing of particular types of XSLT transformations is known. Our processor includes several streaming algorithms, and it automatically chooses the most efficient one for a given XSLT stylesheet. The process of choice has a solid formal basis - a framework consisting of tree transducers that serve as models both for the streaming algorithms and for the transformation types. In future work, we plan to include algorithms for local and non-order-preserving transformations to obtain a processor for a large subset of practically needed XML transformations. We intend to demonstrate the usage of such a processor by integrating it into the Trisolda semantic repository, and subsequently to carry out performance tests and comparisons with other implementations.

Acknowledgement

This work was supported in part by the grant VEGA 1/3106/06.

References

[1] Z. Bar-Yossef, M. Fontoura, and V. Josifovski. On the Memory Requirements of XPath Evaluation over XML Streams. In PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 177-188, New York, NY, USA, 2004. ACM.

[2] C. Barton, P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski.
An Algorithm for Streaming XPath Processing With Forward and Backward Axis. In Proceedings of ICDE 2003, 2003.

[3] O. Becker. Transforming XML on the Fly. In Proceedings of XML Europe 2003, 2003.

[4] G. J. Bex, S. Maneth, and F. Neven. A Formal Model for an Expressive Fragment of XSLT. Inf. Syst., 27(1):21-39, 2002.

[5] N. Bruno, L. Gravano, N. Koudas, and D. Srivastava. XML & Data Streams. In Stream Data Management, pages 59-82. Springer Verlag, 2005.

[6] F. Bry, F. Coskun, S. Durmaz, T. Furche, D. Olteanu, and M. Spannagel. The XML Stream Query Processor SPEX. In Proceedings of ICDE 2005, pages 1120-1121, 2005.

[7] Y. Chen, S. B. Davidson, and Y. Zheng. An Efficient XPath Query Processor for XML Streams. In Proceedings of ICDE 2006, pages 79-79, 2006.

[8] Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer. Path Sharing and Predicate Evaluation for High-performance XML Filtering. ACM Trans. Database Syst., 28(4):467-516, 2003.

[9] J. Dokulil, J. Tykal, J. Yaghob, and F. Zavoral. Semantic Web Repository And Interfaces. In International Conference on Advances in Semantic Processing SEMAPRO 2007. IEEE Computer Society, 2007.

[10] J. Dvořáková and B. Rovan. A Transducer-Based Framework for Streaming XML Transformations. In Proceedings of SOFSEM (2) 2007, pages 50-60, 2007.

[11] J. Dvořáková and F. Zavoral. An Implementation Framework for Efficient XSLT Processing. In Proceedings of IDC 2008, Studies in Computational Intelligence, Springer-Verlag, 2008.

[12] D. Florescu, C. Hillery, D. Kossmann, P. Lucas, F. Riccardi, T. Westmann, M. J. Carey, A. Sundararajan, and G. Agrawal. The BEA/XQRL Streaming XQuery Processor. In Proceedings of VLDB 2003, pages 997-1008, 2003.

[13] A. Frisch and K. Nakano. Streaming XML Transformations Using Term Rewriting. In Proceedings of PLAN-X 2007, 2007.

[14] P. Genevès and K. Rose. Compiling XPath for Streaming Access Policy. In Proceedings of ACM DOCENG 2005, pages 52-54, 2005.

[15] Z. Guo, M. Li, X. Wang, and A. Zhou.
Scalable XSLT Evaluation. In Advanced Web Technologies and Applications, LNCS 3007/2004. Springer Berlin / Heidelberg, 2004.

[16] K. Jittrawong and R. K. Wong. Optimizing XPath Queries on Streaming XML Data. In ADC '07: Proceedings of the eighteenth conference on Australasian database, pages 73-82, Darlinghurst, Australia, 2007. Australian Computer Society, Inc.

[17] C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. FluXQuery: An Optimizing XQuery Processor for Streaming XML Data. In VLDB '2004: Proceedings of the Thirtieth International Conference on Very Large Databases, pages 1309-1312, 2004.

[18] X. Li and G. Agrawal. Efficient Evaluation of XQuery over Streaming Data. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 265-276. VLDB Endowment, 2005.

[19] B. Ludäscher, P. Mukhopadhyay, and Y. Papakonstantinou. A Transducer-Based XML Query Processor. In Proceedings of VLDB 2002, pages 227-238, 2002.

[20] I. Mlýnková, K. Toman, and J. Pokorný. Statistical Analysis of Real XML Data Collections. In COMAD '06: Proc. of the 13th Int. Conf. on Management of Data, pages 20-31, 2006.

[21] K. Nakano. An Implementation Scheme for XML Transformation Languages through Derivation of Stream Processors. In Proceedings of the Second ASIAN Symposium on Programming Languages and Systems (APLAS '04), 2004.

[22] D. Olteanu, H. Meuss, T. Furche, and F. Bry. XPath: Looking Forward. In Proceedings of the XMLDM Workshop, pages 109-127, 2002.

[23] T. Pankowski. Transformation of XML Data Using an Unranked Tree Transducer. In EC-Web, pages 259-269, 2003.

[24] F. Peng and S. S. Chawathe. XPath Queries on Streaming Data. In Proceedings of ACM SIGMOD 2003, 2003.

[25] J. Thatcher. Tree automata: An informal survey. In A. V. Aho, editor, Currents in the Theory of Computing, chapter 4, pages 143-172. Prentice-Hall, 1973.

[26] W3C. XML Path Language (XPath), version 1.0, W3C Recommendation, 1999. http://www.w3.org/TR/xpath.

[27] W3C.
XSL Transformations (XSLT) Version 1.0, W3C Recommendation, 1999. http://www.w3.org/TR/xslt.

[28] W3C. Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation, 2006. http://www.w3.org/TR/REC-xml.

[29] W3C. XQuery 1.0: An XML Query Language, W3C Recommendation, 2007. http://www.w3.org/TR/xquery.

On Interchange between Drools and Jess

Oana Nicolae, Adrian Giurca and Gerd Wagner
Brandenburg University of Technology, Germany
E-mail: {nicolae, giurca, G.Wagner}@tu-cottbus.de

Keywords: Drools (aka JBossRules), Jess, RuleML, R2ML, RIF, Rete, ReteOO, business rules, interchange, standardisation

Received: March 15, 2008

There is a growing demand for research in order to provide insights into challenges and solutions based on business rules, related to target PSMs (Platform Specific Model in OMG's MDA terms - Implementation Model). As an answer to these needs, the paper argues for the relevance of business rules target platforms in the current IT and business context, by emphasising the important role of business rules interchange initiatives. Therefore, rule-system developers can do their work without any concern about a vendor-specific format, and in particular without any concern about the compatibility between the technologies. The paper provides a description of the translation of business rules from a particular object-oriented rule system such as Drools to another rule system such as Jess, coming from the AI area, using R2ML as the interchange language. The transformation preserves the semantic equivalence for a given rule set, taking also into account the rules vocabulary.

Povzetek: Prispevek opisuje prenos pravil iz objektnega sistema Drools v AI sistem Jess.

1 Introduction

There is a growing request for business rules technology standardisation from both the UML and ontology architect communities. For these reasons, business rules initiatives aim to express rules in a platform-independent syntax. A number of initiatives on rules interchange have been started.
They include RuleML (2), the OMG Production Rules Representation (PRR) (8), RIF (1), and the REWERSE I1 Rule Markup Language (R2ML1) (10). We mention here the efforts to establish some standards for expressing business rules and their vocabularies in natural language, such as OMG's SBVR (9) and Attempto Controlled English (ACE) (4). SBVR, a human-readable format for business rules, comes under OMG's Model Driven Architecture (MDA2) standards and is defined as a Computation-Independent Model (CIM3). The CIM is most frequently used in the context of the Model Driven Architecture (MDA) approach, which corresponds to the Object Management Group (OMG) vision of Model Driven Engineering (MDE). The Meta-Object Facility (MOF) is the OMG standard for Model Driven Engineering. The second layer in OMG's MDA is the Platform-Independent Model (PIM)4, where rule interchange formats (i.e. RuleML, RIF, R2ML) try to accomplish their general purpose: a PSM-to-PSM business rules migration through the PIM level. The third MDA level is the Platform-Specific Model (PSM5), containing rule-specific languages together with their specific engines/platforms like F-Logic (5), JRules (ILOG6), Jess7 or Drools8. The main purpose of an interchange approach is to provide means for the reuse, publication and interchange of rules between different systems and tools. Actually, it also plays an important role in facilitating business-to-customer (B2C) and business-to-business (B2B) interactions over the Internet. Moreover, an interchange approach always supposes fewer transformations than PSM-to-PSM translations.

1R2ML - http://oxygen.informatik.tu-cottbus.de/rewerse-i1/?q=node/6 2MDA - Model Driven Architecture is a framework for distinguishing different abstraction levels defined by the Object Management Group. 3CIM - Computational Independent Model 4PIM - Platform Independent Model
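One way to make the last claim concrete (our own back-of-the-envelope reading, not a calculation from the paper): with n rule platforms, direct pairwise interchange needs a translator for every ordered pair of platforms, while routing through a single PIM-level format such as R2ML needs only one importer and one exporter per platform.

```python
def translators_needed(n: int) -> tuple:
    """Directed translators needed for n rule platforms:
    pairwise PSM-to-PSM vs. via one shared interchange format."""
    pairwise = n * (n - 1)  # one translator per ordered platform pair
    via_pim = 2 * n         # one importer and one exporter per platform
    return pairwise, via_pim

print(translators_needed(5))  # (20, 10)
```

Under this reading, the interchange route needs fewer translators as soon as more than three platforms are involved (n(n-1) > 2n for n > 3).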
Our rule interchange work addresses Drools as the source platform and Jess as the target platform, using the approach suggested by OMG's MDA, because these languages are currently of interest to the business market as popular business-logic frameworks, used by Java developers to create complex rule-based applications by combining the Java platform and business rule technology. Another reason for choosing these two rule systems is their efficiency in "pattern" matching, especially in handling updates to the working set of facts, as both Drools and Jess use an algorithm known as the Rete (i.e. Latin for "net") algorithm. The computational complexity per iteration of this algorithm is linear in the size of the fact base. The main standardisation communities, OMG9 and W3C10, focus their work on providing business rules specification languages for all MDA layers of models in order to obtain rules interchange. Their standards are not supported by most business rules management system tools, as these implement proprietary rule languages. One consequence of this situation is that only a few interchange works exist in academia; e.g. the RIF (1) language still has no well-defined guidelines on how to implement the transformations, and it also does not specify how to test the correctness of the translation. In this context, the EU Network of Excellence REWERSE11 developed R2ML as an interchange language for deploying and sharing rules between different rule systems and tools (e.g. object-oriented rule languages, Semantic Web rule languages, Artificial Intelligence rule languages).

5PSM - Platform Specific Model 6ILOG, http://www.ilog.com 7Jess, http://herzberg.ca.sandia.gov/jess/ 8JBossRules, http://labs.jboss.com/jbossrules 9OMG - http://www.omg.org/ 10W3C - http://www.w3.org/

Actually, R2ML (now at version 0.5) is a mature and well-tried enough rule interchange language to provide a concrete interchange format for different rule systems and languages (i.e. http://oxygen.informatik.
tu-cottbus.de/rewerse-i1/?q=node/15). R2ML has a rich syntax, so it can represent business rules from both the Drools and Jess languages, providing in this way the possibility of interchange. As an interchange language, R2ML addresses the PIM level. The main idea is to use a model transformation language (MTL), or an application transformation language (ATL), to transform a PIM model into a PSM, as in Figure 1. Business rules are built following a business model representation. In many cases, a business model is first represented in a natural-language description based on core ontological concepts like classes and variables (OMG's MDA - CIM level). At this stage, we can identify all objects referenced in the rules, and for each object we identify all referenced properties. For each property, we identify all its constraints.

2 Drools to R2ML mapping

In this section we describe the general transformation of JBoss business rules into the R2ML interchange language. The Drools engine project (now at version 4.0.x) is an open-source and standards-based business rule engine, and it uses an enhanced Rete implementation named ReteOO12. Drools is classified as an object-oriented production rules engine written entirely in the Java language; more specifically, it is a forward-chaining rule engine. A Production Rules System (i.e. PRS) relies on an Inference Engine that is able to scale to a large number of rules and facts. The Inference Engine matches facts and data against PRs, also called Productions or just Rules, to infer conclusions which result in actions. The Rules are stored in the Production Memory. The facts that the Inference Engine matches against the rules are stored in the Working Memory. R2ML is a visual rule markup, XML-based language whose purpose is to capture rules formalised in different languages and to interchange them between rule systems and tools.

11REWERSE - http://rewerse.net/ 12A Rete adaptation for an object-oriented language, a descendant of the well-known Rete algorithm.
It provides support for all kinds of rules:

- Integrity Rules
- Derivation Rules
- Production Rules
- Reaction Rules

An R2ML production rule has conditions and post-conditions. The conditions and post-conditions of an R2ML production rule are usually interpreted as logical formulae which correspond to a general first-order formula: a quantified formula, existentially quantified or universally quantified (i.e. R2ML uses the concept of r2ml:QuantifiedFormula and, by default, all R2ML formulae are universally quantified). Usually, a PRS does not explicitly refer to events, but events can be simulated in a production rule system by externally asserting corresponding facts into the Working Memory. The R2ML production rules metamodel is depicted in Figure 2. The mapping from Drools to R2ML is possible as R2ML supports the representation of PRs by relying on OMG's PRR (8) specification. The following sections describe the general principles of mapping JBoss rules into R2ML PRs.

2.1 Mapping rules vocabularies

Object-oriented rule systems such as Drools and ILOG JRules are built on top of Java vocabularies. Drools is designed to use Java beans as facts. These facts represent the domain of the rules, meaning the rules vocabulary. Java bean objects are defined by users in their applications. These objects, inserted into the Working Memory, represent the valid facts that rules can access. Facts are the application data, while the rules represent the logic layer of the application. This vocabulary is used by rules through the import declarations, which are specified inside the rules file (.drl or .xml files). For example, a rule from Drools may use one or more Java bean classes in order to describe its own vocabulary. The Java bean classes represent a description of the facts used by the Drools rule engine. An R2ML rule always refers to a vocabulary which can be R2ML's own vocabulary or an imported one (i.e. UML13, RDF(S)14 and OWL15 - see lines 3.-4. from Section 2.2, i.e.
an example of importing an OWL external vocabulary for an entire R2ML production rule set). The R2ML vocabulary is a serialisation of a UML fragment of class diagrams. Below, we describe the corresponding translation from a usual Java bean into R2ML elements of the vocabulary namespace, with the help of the optional element r2mlv:Vocabulary. Since almost all names from R2ML rule bases are qualified names (xs:QName), they must have the corresponding namespaces declared (i.e. see above lines 2.-5.). In the same manner, any Java qualified class name will be translated into a qualified name (xs:QName) together with the corresponding namespace declarations. For example, if we assume the Drools import declaration (i.e. a Java qualified name) org.drools.usecase.Cheese, this will translate into the namespace declaration xmlns:ex="http://www.drools.org/usecase" used in the qualified name (i.e. ex:Cheese), in order to reference the class name.

13UML - http://www.uml.org 14RDF(S) - http://www.w3.org/TR/rdf-schema 15OWL - http://www.w3.org/2004/OWL

Figure 1: Interchanging between Drools and Jess.
Figure 2: R2ML Production Rules Representation Metamodel.

2.2 Rule Sets Mapping

All imported Java beans in Drools rule packages form the rules vocabulary. The set of Drools rules is individualized by its package namespace, declared at the beginning of the rules file - a namespace that can be equal to or can differ from the Drools import declarations, i.e. the Drools package of rules

package org.drools.rules;
import org.drools.usecase.Cheese;
/* set of Drools rules */

finds its correspondent in the r2ml:ProductionRuleSet element. It contains three optional attributes:

- r2ml:ruleSetID - the name of the rule set. The name of the Java package of classes uniquely identifies the name of an R2ML ProductionRuleSet (i.e. see line 2. from the R2ML code example below).
- r2ml:externalVocabulary - represents a URI of an external vocabulary.
We used OWL to represent the vocabulary of the rule.
- r2ml:externalVocabularyLanguage - refers to the language of the external vocabulary.

Apart from the rules and their import declarations, a Drools package may contain other specific constructs like globals, user-defined functions and queries, but they do not represent the subject of our translation.

2.3 Rule Mapping

In Drools, a rule consists of the rule identifier, the conditions part called the LHS (i.e. Left Hand Side), and the actions part called the RHS (i.e. Right Hand Side):

rule "rule name"
when
    // conditions (LHS)
then
    // actions (RHS)
end

The general principle of mapping a Drools rule into an R2ML production rule is the following. Every JBoss production rule is translated into an r2ml:ProductionRule element. An optional element r2ml:Documentation can contain elements which comprise the rule text and also the representation of the rule in a specific rules language. The R2ML r2ml:ruleID production rule attribute is generated using the JBoss rule name value. The r2ml:ruleID uniquely identifies a rule inside a rule set. A JBoss rule has a conditions part (i.e. the when part) and an actions part (i.e. the then part). The condition part of a JBoss rule is mapped into the content of the r2ml:conditions role element. The RHS part of a JBoss rule, which contains multiple actions, maps into the content of the r2ml:producedActionExpr role element. The Drools language syntax also contains comments expressed in Java-like syntax, such as:

// java-like, single line comment
# single line comment
/* ... java-like, multi lines comment ... */

When translated into R2ML syntax, they map into the XML <![CDATA[...]]> construct. For example:

3. <![CDATA[
4. JBoss rule expressed in natural language...
5. ]]>

In the following lines we describe the mapping of Drools conditions into the appropriate R2ML ones. The LHS (i.e. the when part) of a JBoss rule consists of patterns (i.e. columns) and eval as Conditional Elements (i.e. CE), in order to facilitate the encoding of propositional logic and First Order Logic, i.e.
FOL. The entire LHS of a Drools rule is in fact a tuple of facts (i.e. a tuple of patterns). Each pattern may have zero or more field constraints, i.e. the pattern terms (see Figure 4). The and (i.e. &&) CE is implicit when the JBoss rule condition contains multiple patterns. Field constraints compare and assess the field values from the fact object instances. Drools facts from the Working Memory are Java bean object instances; therefore these field constraints can be accessed through the "no arguments" methods, also called the accessors (i.e. getters).

2.3.1 Mapping Drools patterns without Field Constraints

A Drools pattern without field constraints will map into the r2ml:ObjectClassificationAtom. For example, the following Drools pattern, which corresponds to the universally quantified formula from classical logic

∀?c Cheese(?c)

is expressed in Drools as follows:

$c: Cheese()

This Drools pattern finds its R2ML translation into the code below. As an explanation, we mention that all R2ML formulae are implicitly universally quantified:

1. <r2ml:ObjectClassificationAtom>
2.   <r2ml:ObjectVariable
3.     r2ml:name="c"
4.     r2ml:class="Cheese"/>
5. </r2ml:ObjectClassificationAtom>

Following RuleML, the R2ML framework defines the generic concept of a variable. However, R2ML makes a clear distinction between object terms and data terms. Typed terms are either object terms standing for objects, or data terms standing for data values. The concrete syntax of first-order non-Boolean OCL (7) expressions can be directly mapped to the R2ML abstract concepts of ObjectTerm and DataTerm, which can be viewed as a predicate-logic-based reconstruction of the standard OCL abstract syntax. The bound variable c represents the value of the r2ml:name attribute of the corresponding term (r2ml:ObjectName and/or r2ml:ObjectVariable), and the name of the Java bean class (i.e. Cheese) is the value of the r2ml:class attribute.
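The pattern-to-atom mapping just described can also be sketched programmatically. The following Python fragment is our own illustration - the namespace URI and exact element layout are assumptions, not the official R2ML serialisation - and it shows the two essentials stated in the text: the $ prefix is dropped from the variable name, and the Java bean class name becomes the value of the r2ml:class attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the r2ml prefix (assumption, for illustration only)
R2ML = "http://oxygen.informatik.tu-cottbus.de/R2ML"
ET.register_namespace("r2ml", R2ML)

def pattern_to_atom(var: str, cls: str) -> ET.Element:
    """Map a Drools pattern `$var: Cls()` (no field constraints) to an
    r2ml:ObjectClassificationAtom holding one r2ml:ObjectVariable."""
    atom = ET.Element(f"{{{R2ML}}}ObjectClassificationAtom")
    ET.SubElement(atom, f"{{{R2ML}}}ObjectVariable",
                  {f"{{{R2ML}}}name": var.lstrip("$"),   # drop the $ convention
                   f"{{{R2ML}}}class": cls})             # type of the variable
    return atom

xml = ET.tostring(pattern_to_atom("$c", "Cheese"), encoding="unicode")
print(xml)
```

Running the sketch yields an element carrying r2ml:name="c" and r2ml:class="Cheese", mirroring the listing above.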
The above Drools pattern can also be declared inside rule conditions without the c variable, i.e. Cheese(), but in order to refer to the matched facts, rule conditions usually use a pattern binding variable such as c (in Drools terminology we refer to it as a fact variable or declaration). Any JBoss variable translates into an R2ML variable. Notice that the translation of Drools variables into R2ML eliminates the $ symbol (used in Drools only as a notation convention) from the names of the variables. The JBoss fact variable used in the previous pattern example (i.e. $c:Cheese()) is mapped into an r2ml:ObjectVariable using the value of the r2ml:name="c" property to describe the variable name, which represents an instance of the Cheese class (see lines 3.-4.). The usage of this instance gives us the possibility to call properties and functions of the Cheese class in the actions part of a JBoss rule. The optional r2ml:class property (see line 4. from the above example) specifies the type of the object variable (i.e. Cheese). An r2ml:ObjectVariable is a variable that can be instantiated only by objects.

2.4 Mapping Drools patterns with Field Constraints

In many cases, a JBoss pattern (see Figure 3) may contain many field constraints, all of them referring to the same context variable. The Drools field constraints may be of the following types: string, numeric, boolean and date. When separated by the following operators (enumerated here in their priority order, see also Figure 5): &&, || and , (i.e. comma), they form a Drools pattern formula. A Drools pattern formula translates into an R2ML formula, using the simple/nested R2ML concepts of r2ml:qf.Disjunction and r2ml:qf.Conjunction (qf stands for "quantifier free") applied on R2ML atoms, in order to serialize the Drools CEs || and &&, respectively.
In the example below, we have two Drools patterns that in classical logic have the following representation, taking into account the operator priority order from Drools:

∀?c ∀?p ∃?youngCheese (Person(?p) ∧ like(?p, ?youngCheese) ∧
    (Cheese(?c) ∧ ((type(?c, ?youngCheese) ∧ price(?c) < 10) ∨
    bestBefore(?c) < "27-Oct-2010")))

1. $p:Person($youngCheese:like)
2. $c:Cheese(type == $youngCheese &&
3.           price < 10 ||
4.           bestBefore < "27-Oct-2010")

The Cheese pattern (see lines 2.-4.) has three field constraints combined with a conjunctive connector (i.e. &&) and a disjunctive connector (i.e. ||). We mention that the comma represents by default the conjunctive logic operator. The pattern refers to literal constraints used to match the facts (i.e. instances of the Cheese class): type (a string constraint), price (a numeric constraint) and bestBefore (a date constraint). The valid operators that apply to numeric and date operands are: <, <=, >, >=, == and !=.

The above Drools pattern translates into the following R2ML formula (the example below describes only the nesting of the operators && and || inside the Drools pattern):

<r2ml:qf.Disjunction>
  <r2ml:qf.Conjunction>
    <!-- atoms for: type == $youngCheese, price < 10 -->
  </r2ml:qf.Conjunction>
  <!-- atom for: bestBefore < "27-Oct-2010" -->
</r2ml:qf.Disjunction>

In the absence of qf.Disjunction or qf.Conjunction, all atoms from the R2ML rule body are implicitly connected by conjunction. The first pattern from the Drools example above (see line 1.) contains as field constraint a bound variable, called a declaration. The JBoss variable $youngCheese is bound to the like property, so it can later constrain the type property of the Cheese class. Since the like property is a data type property (i.e. String), its R2ML mapping is an r2ml:AttributionAtom, while an object type property would find its mapping in the r2ml:ReferencePropertyAtom. The Drools literal date type field constraints are represented in R2ML using the XML qualified name xs:dateTime.
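A plausible serialization of the $youngCheese declaration from line 1. can be sketched as follows. The r2ml:subject and r2ml:dataValue role elements, the r2ml:DataVariable term and the ex: namespace are our assumptions; r2ml:AttributionAtom itself is the mapping named above:

```xml
<!-- sketch for line 1.: $p:Person($youngCheese:like) -->
<r2ml:AttributionAtom r2ml:attribute="ex:Person.like">
  <r2ml:subject>
    <r2ml:ObjectVariable r2ml:name="p" r2ml:class="ex:Person"/>
  </r2ml:subject>
  <r2ml:dataValue>
    <!-- like is a String property, hence a data variable -->
    <r2ml:DataVariable r2ml:name="youngCheese"/>
  </r2ml:dataValue>
</r2ml:AttributionAtom>
```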
The Drools numeric operators work analogously for this type of field constraint, so the serialization into R2ML is also the r2ml:DatatypePredicateAtom, e.g. for the pattern:

$c:Cheese(bestBefore < "27-Oct-2007")

Another meaningful example of Drools field constraints is testing the equality or inequality of a property against the Java null value, i.e.

$c:Cheese(buyer == null)
$c:Cheese(buyer != null)

The corresponding formulae from classical logic would be:

Cheese(?c) ∧ ¬∃?t buyer(?c, ?t)
∃?t (Cheese(?c) ∧ buyer(?c, ?t))

Taking into account the first classical logic formula, the R2ML serialization results in an r2ml:EqualityAtom, having its meaning negated using the r2ml:isNegated="true" attribute. The child elements of the r2ml:EqualityAtom are object terms. Our example involves an r2ml:ReferencePropertyTerm with the attribute r2ml:referenceProperty="ex:Cheese.buyer" and a generated object term expressed using an r2ml:ObjectVariable with the generated attribute value r2ml:name="t_24535899" and r2ml:class="ex:Person" as the type of the buyer property:

<r2ml:EqualityAtom r2ml:isNegated="true">
  <r2ml:ReferencePropertyTerm
      r2ml:referenceProperty="ex:Cheese.buyer">
    <r2ml:ObjectVariable r2ml:name="c" r2ml:class="ex:Cheese"/>
  </r2ml:ReferencePropertyTerm>
  <r2ml:ObjectVariable r2ml:name="t_24535899"
      r2ml:class="ex:Person"/>
</r2ml:EqualityAtom>

The second logic formula involves the existence of a Cheese fact in Working Memory whose buyer property is not null. The first step in the R2ML serialization is the generation of an object term (i.e. the r2ml:ObjectVariable t_57685642) of ex:Person type, which is bound to the buyer property. We use the r2ml:ReferencePropertyAtom element, i.e.

<r2ml:ReferencePropertyAtom
    r2ml:referenceProperty="ex:Cheese.buyer">
  <r2ml:subject>
    <r2ml:ObjectVariable r2ml:name="c" r2ml:class="ex:Cheese"/>
  </r2ml:subject>
  <r2ml:object>
    <r2ml:ObjectVariable r2ml:name="t_57685642"
        r2ml:class="ex:Person"/>
  </r2ml:object>
</r2ml:ReferencePropertyAtom>

We have mentioned before that, implicitly, the and operator binds the Drools patterns inside the rule condition:

// $p:Person()
1. $c:Cheese(buyer == $p, inStock == true)

A Drools pattern containing a formula that implies only conjunctions of field constraints can be split into as many Drools patterns as field constraints it contains, i.e.
// $p:Person()
1. $c:Cheese(buyer == $p)
2. $c:Cheese(inStock == true)

There are two possibilities to mark up the above Cheese patterns (see lines 1.-2.): to use a conjunction of the appropriate R2ML atoms, or to use the r2ml:ObjectDescriptionAtom construct. The first option implies the use of the r2ml:ReferencePropertyAtom in order to mark up the first pattern (see line 1. from the Drools example) and an r2ml:AttributionAtom for the boolean data type field constraint (see line 2. from the Drools example), i.e.
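A sketch of this first option follows. The r2ml:subject, r2ml:object and r2ml:dataValue role elements, the r2ml:TypedLiteral term and the ex: namespace are our assumptions; the two atom names are the ones stated above:

```xml
<!-- line 1.: $c:Cheese(buyer == $p), an object-valued property -->
<r2ml:ReferencePropertyAtom
    r2ml:referenceProperty="ex:Cheese.buyer">
  <r2ml:subject>
    <r2ml:ObjectVariable r2ml:name="c" r2ml:class="ex:Cheese"/>
  </r2ml:subject>
  <r2ml:object>
    <r2ml:ObjectVariable r2ml:name="p" r2ml:class="ex:Person"/>
  </r2ml:object>
</r2ml:ReferencePropertyAtom>

<!-- line 2.: $c:Cheese(inStock == true), a boolean data property -->
<r2ml:AttributionAtom r2ml:attribute="ex:Cheese.inStock">
  <r2ml:subject>
    <r2ml:ObjectVariable r2ml:name="c" r2ml:class="ex:Cheese"/>
  </r2ml:subject>
  <r2ml:dataValue>
    <r2ml:TypedLiteral r2ml:lexicalValue="true"
        r2ml:datatype="xs:boolean"/>
  </r2ml:dataValue>
</r2ml:AttributionAtom>
```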