Informatica: An International Journal of Computing and Informatics
Special Issue: Perception and Emotion Based Reasoning
Guest Editor: Aladdin Ayesh
The Slovene Society Informatika, Ljubljana, Slovenia

EDITORIAL BOARDS, PUBLISHING COUNCIL

Informatica is a journal primarily covering the European computer science and informatics community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.

Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors. The necessary coordination is made through the Executive Editors, who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Science and Technology. Each author is guaranteed to receive the reviews of his article. When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article.

Executive Editor - Editor in Chief
Anton P. Železnikar
Volaričeva 8, Ljubljana, Slovenia
s51em@lea.hamradio.si
http://lea.hamradio.si/~s51em/

Executive Associate Editor (Contact Person)
Matjaž Gams, Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 219 385
matjaz.gams@ijs.si
http://ai.ijs.si/mezi/matjaz.html

Executive Associate Editor (Technical Editor)
Drago Torkar, Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 219 385
drago.torkar@ijs.si

Rudi Murn, Jožef Stefan Institute

Publishing Council: Tomaž Banovec, Ciril Baškovič, Andrej Jerman-Blažič, Jožko Čuk, Vladislav Rajkovič

Board of Advisors: Ivan Bratko, Marko Jagodic, Tomaž Pisanski, Stanko Strmčnik

Editorial Board
Suad Alagić (Bosnia and Herzegovina), Vladimir Bajić (Republic of South Africa), Vladimir Batagelj (Slovenia), Francesco Bergadano (Italy), Leon Birnbaum (Romania), Marco Botta (Italy), Pavel Brazdil (Portugal), Andrej Brodnik (Slovenia), Ivan Bruha (Canada), Se Woo Cheon (Korea), Hubert L. Dreyfus (USA), Jozo Dujmović (USA), Johann Eder (Austria), Vladimir Fomichov (Russia), Georg Gottlob (Austria), Janez Grad (Slovenia), Francis Heylighen (Belgium), Hiroaki Kitano (Japan), Igor Kononenko (Slovenia), Miroslav Kubat (USA), Ante Lauc (Croatia), Jadran Lenarčič (Slovenia), Huan Liu (Singapore),
Ramon L. de Mantaras (Spain), Magoroh Maruyama (Japan), Nikos Mastorakis (Greece), Angelo Montanari (Italy), Igor Mozetič (Austria), Stephen Muggleton (UK), Pavol Návrat (Slovakia), Jerzy R. Nawrocki (Poland), Roumen Nikolov (Bulgaria), Franc Novak (Slovenia), Marcin Paprzycki (USA), Oliver Popov (Macedonia), Karl H. Pribram (USA), Luc De Raedt (Belgium), Dejan Raković (Yugoslavia), Jean Ramaekers (Belgium), Wilhelm Rossak (USA), Ivan Rozman (Slovenia), Claude Sammut (Australia), Sugata Sanyal (India), Walter Schempp (Germany), Johannes Schwinn (Germany), Zhongzhi Shi (China), Branko Souček (Italy), Oliviero Stock (Italy), Petra Stoerig (Germany), Jiří Šlechta (UK), Gheorghe Tecuci (USA), Robert Trappl (Austria), Terry Winograd (USA), Stefan Wrobel (Germany), Xindong Wu (Australia)

Guest Editorial
Special Issue on Perception and Emotion Based Reasoning

1 Introduction

It is a great pleasure to present a special issue on perception and emotion based reasoning in Informatica: An International Journal of Computing and Informatics. This special issue emerged from a successful special session on the subject of perception and emotions at the IASTED AIA2002 conference. It became evident during the discussions within the session that there is increasing research in computer modelling of the cognitive aspects of human reasoning; hence this special issue.

This special issue contains 10 papers, selected by peer review from 15 high-quality submissions. The review process was long and strenuous. The papers cover the modelling of perception and emotions for reasoning about knowledge, environment, actions, or a combination of these elements. Some of the papers are application-oriented whilst others are concerned with theory and algorithmic development. The techniques used are equally varied: we see connectionist, symbolic and hybrid techniques used.

In the first paper (the papers are ordered alphabetically), Ayesh uses cognitive maps, a technique increasingly used by agents and robotics researchers, to represent the relationship between emotions, perceptions and objects. The formulas presented enable the mapping of environment objects to perceptual and emotional models. Using this mapping, given perceptions are used to infer the agent's emotions towards a detected object, and thus its reaction towards that object. An algorithmic translation of these formulas is provided.

In the second paper, Byl studies the influence of emotions on perception. The result is the Emotionally Motivated Artificial Intelligence (EMAI) architecture. Within this architecture, Byl provides an interesting study of different emotions and related cognitive aspects, such as pleasantness and responsibility, that influence the agent's decisions. A well-formulated study of emotions and their effects on perception concludes the paper.

In the third paper, Lucas et al. deploy emotions as a technique to enhance neuro-fuzzy predictors. This paper is concerned with the use of neural nets for prediction. As neural nets require training, this training is achieved through an emotionally motivated learning algorithm. The results are demonstrated with comparisons to other, non-emotion-based, neural nets and neuro-fuzzy models.

In the fourth paper, Damas and Custódio merge neural networks and statistical methods to develop emotion-based decision and learning algorithms. Their interest is in developing an adaptive control system for robots and agents comprising memory resources and action management subsystems.
In the fifth paper, Davis and Lewis present a computational model of emotions. The paper's background comes from psychology, with a focus on the relationships between emotions, goals and autonomy. These relationships are established through a four-layered architecture. The technique used to represent emotions and provide the inferencing mechanism is symbolic.

In the sixth paper, Fatourechi et al. present an emotion-based learning algorithm that uses emotional critics to direct the learning process. In contrast with the Damas and Custódio paper, the Fatourechi et al. technique uses fuzzy controllers as the basis for developing a control mechanism within, and using, multi-agent systems. Examples, simulation results and their analysis are provided.

In the seventh paper, Gadanho and Custódio provide a contrasting view of the use of emotions in learning and robot control. In their paper, reinforcement learning is revisited in the light of emotions. Gadanho and Custódio present an interesting architecture that consists of an adaptive system, a perceptual system, a behaviour system and a goal system. The objective is to deal with multi-goal tasks.

In the eighth paper, Magas and Custódio extend the DARE architecture for multiple emotion-based agents. The architecture consists of symbolic analysis, cognitive analysis and perceptual analysis layers as its main components. The architecture is developed into a multi-agent environment, and results of experiments are presented.

In the ninth paper, Neal and Timmis pose a question about the usefulness of timidity as an emotional mechanism to control robots. This mechanism is biologically motivated and uses neural networks for modelling. Comparisons with traditional neural nets are made and experimental results are presented.

In the last paper, Rzepka et al. use emotions in information retrieval agents, presenting an interesting application that contrasts with the other papers. The emotional agents are used over the Internet to assist in searching the World Wide Web, attempting to provide human-like interfacing, using techniques such as conversation, for user profiling. Experiments and results are presented.

A variety of applications and techniques are presented in the papers of this special issue. Some techniques are developed especially for cognitive modelling whilst others are re-workings of established and proven techniques. Robots and agents, understandably, seem to dominate the experimental side of most of the presented papers. This special issue is targeted at researchers working in cognitive modelling and human-like machines, e.g. cognitive and humanoid robots and intelligent embedded agents. It should also be of interest to software engineers who are developing alternative solutions using artificial intelligence techniques.

Acknowledgment

The guest editor would like to express his appreciation to the participating authors for their high-quality work. Many thanks also go to the referees: Franz Kurfess, Zhongzhi Shi, George Tecuci, Marco Botta, Se Woo Cheon, Ralf Birkenhead, Peter Innocent, Jenny Carter, and John Cowell, for their careful consideration and reviewing of papers in support of this special issue. Finally, the guest editor is indebted to Marcin Paprzycki and Matjaž Gams for their support and guidance during the preparation of this special issue.
Aladdin Ayesh

Perception and Emotion Based Reasoning: A Connectionist Approach

Aladdin Ayesh
Centre for Computational Intelligence, De Montfort University, The Gateway, Leicester LE1 9BH, UK
Email: aayesh@dmu.ac.uk

Keywords: Connectionism, Cognitive Maps, Perception, Emotions, Automated Reasoning, Reasoning about Actions.

Received: October 18, 2002

Our reasoning process uses and is influenced by our perception model of the environment's stimuli and by our memorization of the related experiences, beliefs, and emotions associated with each stimulus, whilst taking into consideration other factors such as time and space. These two processes of modelling and memorization happen in real time while interspersing with each other in a manner that makes them almost seem to be one process. This is often referred to as cognition. In this paper we provide a simplified model of this complicated relationship between emotions, perceptions and our behaviour, to produce a model that can be used in software agents and humanized robots.

1 Introduction

The human reasoning process is influenced by the perception model of the environment's stimuli that humans develop based on the memorization of the related experiences, beliefs, and emotions associated with each stimulus, whilst taking into consideration other factors such as time and space. These two processes, i.e. modelling and memorization, happen in real time while interspersing with each other in a manner that makes them almost seem to be one process. This is often referred to as cognition. In this paper, we provide a simplified model of this complicated relationship between emotions, perceptions and our behaviour, to produce a model that can be used in software agents and humanized robots. To do so, we deploy some aspects of psychology [1-4] and cognitive maps [5-7].

Effective computational modelling of our cognitive faculties is very difficult. This has not prevented several attempts to provide reasoning systems that enable us to reason about actions and effects [8-11], about objects and time [9, 12-15], about decisions and concepts [16-21], and so on. Many of these attempts suffered either from limitations in modelling [22] or limitations in practical inferencing [23, 24]. We can divide these attempts into two main types: connectionist approaches and logicians' approaches. In this classification, we exclude attempts that may have been successful in their respective domains (e.g. [25]) but lack a formal theory or formalized explanation that might lead to some generalization.

The main body of the paper examines inferencing under a set of perceptions and associated emotions that eventually triggers a reaction in response to the environment's objects or stimuli. Consequently, a connectionist approach is presented here to develop a reasoning mechanism that models and uses perception and emotion factors. It is a connectionist approach in the sense that it is based on Fuzzy Cognitive Maps [5, 6] to represent the environment and the relations between the different components that define this environment, such as objects, concepts, and features. The meaning of these relations between the different components is tied to perceptions and emotions. The result is presented as the Triangular Object Modelling (TOM) technique. The inferencing process is then studied and analysed. We conclude with a critical analysis of the technique, highlighting future developments.

2 Modelling Techniques

Studying the human mind and cognitive faculties, e.g.
recognition and learning, may take two routes [26]. The first route is to look at the mind as a symbol manipulation processor (e.g. [10, 27]). The second route is to view it as a signal processor formed of a web of connections [18, 26].

2.1 Symbol Manipulation Techniques

The symbolic route, which is computationally represented by symbolic artificial intelligence, is dominated by the logic approach. In this approach, logic theories [28-31] are used to explain and imitate our thinking and learning abilities by means of knowledge formation and inferencing [4, 8, 26, 32, 33]. Knowledge formation is often found in the form of logical theories of belief and knowledge [32, 34, 35]. It takes a philosophical approach to formally represent and reason about the notions of belief and knowledge. However, several computational intelligence researchers are currently eschewing logical models in favour of connectionist or hybrid approaches such as fuzzy logic, belief networks, and cognitive maps [5, 18], because the computational complexity of logical models limits their applicability.

Inferencing may be presented in the form of problem solvers [36] or in the form of AI planning [11]; these two are the more traditional symbolic processing fields. Situation calculus is a well-known example of a formal AI planning language [10]. In addition, situation calculus is a good example of the difficulty of using formal logic as a representational and reasoning tool [11, 24].

2.2 Signal Processing Techniques

The signal-processing route [18, 26], often referred to as the connectionist approach, is based on the neuro-psychology and neuro-biology fields [2, 37, 38]. The study of the brain shows that it is formed of billions of neuron cells connected to each other. Signals pass through these neurons producing different results. The connectionist approach aims to produce models of that web of neural connections to model our knowledge and inference processes. Connectionist models often require high computational resources to produce useful results [39]. However, the current surge in computational power, provided by the development of fast processors and cheap fast memory, enables such models to be deployed. This may also explain the increasing interest in network-based models such as neural nets and cognitive maps.

2.3 Cognitive Maps

Cognitive maps are a graphical and formal representation of crisp cause-effect relationships among the elements of a given environment [5, 40]. They originated in economics and political science. However, they are becoming increasingly popular with computational intelligence researchers, especially in a version blended with fuzzy logic [5-7, 40]. They are similar to neural nets in the sense that they consist of nodes that are linked together. However, they differ from neural nets, and for that matter from other graph-based approaches, in that they represent semantically defined relationships. The Fuzzy Cognitive Maps (FCM) version is represented in the form of fuzzy signed directed graphs with feedback. They model the world as a collection of concepts and causal relations between these concepts [5, 6]. This provides, in our opinion, the middle ground between the pure connectionist and the symbolic AI approaches. However, there are only a few studies in formal representation and inferencing using cognitive maps [5, 7, 41].
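To make the FCM idea concrete, the following is a minimal sketch of FCM inference in the spirit of Kosko [6]: concepts hold activation levels, signed edge weights encode causal influence, and inference iterates a thresholded update until the map settles. The concept names and weights here are invented purely for illustration; the paper itself does not prescribe them.

```python
import numpy as np

concepts = ["rain", "traffic", "accidents"]
# W[i][j] = causal influence of concept i on concept j, in [-1, 1].
W = np.array([
    [0.0, 0.7, 0.4],   # rain increases traffic and accidents
    [0.0, 0.0, 0.6],   # traffic increases accidents
    [0.0, 0.3, 0.0],   # accidents feed back into traffic
])

def step(state, W):
    """One FCM update: weighted causal sums squashed into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(state @ W)))  # logistic threshold

state = np.array([1.0, 0.0, 0.0])              # clamp "rain" on
for _ in range(20):                            # iterate towards a fixed point
    state = step(state, W)
    state[0] = 1.0                             # keep the input concept clamped
print(dict(zip(concepts, state.round(2))))
```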
In this paper, we adapt a version of cognitive maps to enable our proposed Triangular Object Modelling (TOM), in which perceptions and emotions are interconnected with objects and factored into the reasoning mechanism explained in section 4.

3 Triangular Object Modelling (TOM)

Our proposed technique represents each object that may reside in the memory of an agent as a triangular multi-layered cognitive map, forming a Triangular Object Modelling (TOM). This representation is multi-layered in the sense that each node of the cognitive map may itself consist of a cognitive map. The representation is used within a memory architecture we named Observer-Memory-Questioner (OMQ).

3.1 OMQ Memory

The TOM representation is used within the memory component of the OMQ (Observer-Memory-Questioner) model, which was developed and presented in previous papers [42, 43]. The memory subsystem consists of short, surface, long and archive memory components. The design of these components was inspired by psychology research on human memory and its workings [1, 44-46]. Human memory [45, 46] is often represented as consisting of two components: short (or working) memory and long memory. In our design, we felt the need to introduce two additional components, namely surface memory and archive memory. The task of these two new memories is to support short memory and long memory respectively. Archive memory represents the 'long-long' memory, which is a step prior to 'forgetfulness'. On the other hand, surface memory is the background part of working memory, of which short memory is the frontier. In other words, surface memory is where all relevant information and experiences related to the objects in short memory are stored, whilst short memory contains only the information and experiences directly related to the objects of current interest to the agent or robot. A detailed description of these components and their workings was covered in [47], from which we borrow figure 1.

Figure 1 - Memory architecture (short, surface, long and archive memory components).

The following is a summary of definitions.

Definition 1 - Short memory is a mental organization of mind in which current objects of interest are maintained, with limitation of time and/or space. ♦

Definition 2 - Long memory is a mental organization of mind in which objects' information is maintained in relation to concepts and emotions. ♦

Definition 3 - Surface memory is a mental organization of mind in which relevant information to current objects of the short memory is maintained. The organization of surface memory is prioritised according to time, emotions and/or relevance. ♦

Definition 4 - Archive memory is a mental organization of objects that is the result of a re-organization of the long memory, in which objects' information is re-categorized and either maintained in relation to a concept or deleted. Consequently, archive memory information has a low priority in the retrieval process. ♦

We discussed these components in further detail in previous papers [47, 48]; however, the Observer and Questioner components do not yet have complete implementations.

3.2 TOM Architecture

The TOM architecture is based on a technique that represents each object, which may reside in the memory of an agent, as a triangular multi-layered cognitive map. The three main nodes in that map are: object, perceptions and emotions. Each node may consist of one or more cognitive maps. The nodes in each of these maps are the elements that belong to one of the primary nodes. As an example, emotions are formulated from cognitive maps that link different types of emotions, such as safety-fear, like-desire, and so on.
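As a rough illustration of this triangular, multi-layered structure, the sketch below shows one possible encoding: an object node linked to perception and emotion nodes by signed strengths, where any node may carry a nested map of its own. The class and field names (e.g. TOMap) are our own invention, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class TOMap:
    """One triangular, multi-layered map: an object linked to
    perception and emotion nodes by signed strengths in [-1, 1]."""
    obj: str
    perceptions: dict = field(default_factory=dict)  # node name -> strength
    emotions: dict = field(default_factory=dict)     # node name -> strength
    submaps: dict = field(default_factory=dict)      # node name -> nested TOMap

# An emotions node may itself be a map pairing emotion types,
# e.g. safety-fear, as described above.
emotion_layer = TOMap("emotions", emotions={"safety": 0.9, "fear": -0.9})
predator = TOMap("predator",
                 perceptions={"large-size": 0.8, "uneven-shape": 0.7},
                 emotions={"low-safe": 0.9},
                 submaps={"emotions": emotion_layer})
```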
We formalize this representation in definition 5.

Definition 5 - An object may be defined using the TOM model as a tuple (P, E), where P is a set of perceptions and E is a set of emotions, in which:
P = {p1/µ1, p2/µ2, ..., pn/µn} and p1 ∩ p2 ∩ ... ∩ pn = ∅;
E = {e1/µ1, e2/µ2, ..., en/µn} and e1 ∩ e2 ∩ ... ∩ en = ∅;
and E ∩ P ≠ ∅. ♦

Definition 6 - Given two objects Obj1 and Obj2, if Obj1 ∩ Obj2 ≠ ∅ we say Obj1 and Obj2 are related to each other with strength equal to (PObj1 ∩ PObj2) ∧ (EObj1 ∩ EObj2). ♦

The strength of a relationship is identified by a fuzzy value. The fuzzy value is drawn from an extended interval [-1, 1], unlike the standard fuzzy interval [0, 1]. A negative strength indicates that the relationship is of type 'opposite', whilst a strength of 1 indicates that the two nodes within the map are effectively the same. These facts are used to optimize the map by eliminating irrelevant or duplicated links. Next, we present two assumptions: the 'weakest link' assumption and the 'face off' assumption.

Definition 7 - The weakest link assumption: we identify the weakest link to be a link with strength less than 0, in which case the link is eliminated. ♦

Definition 8 - The face off assumption applies only to objects that have a relationship strength of one between them. In this case, these objects are the same and they are merged by a link of strength one. The links of one of these objects to its emotions and perceptions are dropped, as they are accessible through its equivalent. ♦

The link strength is currently determined by a threshold preset by the user. Work is being carried out to enhance the system with fuzzy-genetic algorithms to enable automated updates and optimization.

As figure 2 shows, the model receives as input a set of perceptions, which triggers a set of objects and emotions. However, objects are also associated with emotions and may therefore trigger emotions that were not triggered by the input perceptions. These emotions may activate some perceptions that can be used to guide the reaction of the model. At the current stage, little use is made of this fact; to exploit it, an imagination model needs to be developed. Such a model would enable TOM-based agents to build models of consequences and to construct future perceptions and emotions. Thereafter, these models could be used to drive action selection and behaviour. This leads to the question of inferencing, which we cover next.

Figure 2 - TOM's basic model.

3.3 Pedagogic Domain

A pedagogic domain has been devised here to help demonstrate the TOM system and its workings. To simplify matters, we define a fixed, preset collection of emotions, perceptions and objects, limited to the minimum.

Firstly, emotions are preset to 'safe', 'like', and 'desired'. 'Safe' reflects stress levels, 'like' reflects attraction levels, and 'desired' reflects goal-oriented attention levels. Each of these emotions is defined as a set of two values, 'low' and 'high', which can be defined in terms of exact or fuzzy sets.

Perceptions are preset to three feature types: 'size', 'colour', and 'shape'. Each of these features takes one value from a preset range, and the collection of these values describes the object to which they relate. The preset values are as follows:

Size (Si) = {large, small, medium}
Colour (Co) = {bright, dark, grey}
Shape (Sh) = {4-edged, 3-edged, many-edged, uneven}

The values of these sets can be defined as fuzzy sets. This is discussed further in the implementation section on extending TOM using fuzzy sets.
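The two map optimizations above can be sketched directly in code. The following is a minimal illustration under the paper's definitions: links with negative strength are pruned (Definition 7), and two objects joined with strength 1 are merged so that one keeps only a single equivalence link (Definition 8). The dict-based encoding and function names are our own assumptions.

```python
def prune_weakest_links(links):
    """Definition 7: drop every link whose strength falls below 0."""
    return {pair: w for pair, w in links.items() if w >= 0}

def face_off(obj_links, a, b):
    """Definition 8: merge objects a and b related with strength 1.
    b keeps only an equivalence link to a; its perception/emotion
    links are dropped, being reachable through a."""
    obj_links[b] = {a: 1.0}
    return obj_links

links = {("large-size", "low-safe"): 0.8, ("grey", "like"): -0.2}
print(prune_weakest_links(links))   # the negative link is eliminated
```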
The artificial world domain constructed here contains three types of objects: 'predator', 'box' and 'food'. 'Predator' is perceived as dangerous to our robot or agent; it is identified by being large with an uneven shape. 'Box' is a desirable object, which is used by the robot to build a refuge from the predator. 'Food' is essential for the robot to sustain itself. Figure 3 shows an example of the map that may be constructed to describe the object predator.

Figure 3 - Predator's map example.

The perception of size refers to the perception of the predator being large from the agent's viewpoint, rather than its actual physical size. Similarly, the emotion of safety refers to the feeling of being safe or less safe, indicated by the level of emotional stress the agent may feel during the course of interacting with the environment's object, in this case the predator. Figure 3 can be summarized as follows:

P = {Si, Co, Sh}
Si = {La, Sm, Me}
Co = {Br, Da, Gr}
Sh = {F4e, T3e, M0e, U0e}
E = {Sa, Li, De}
Sa = {Lo, Hi}
Li = {Lo, Hi}
De = {Lo, Hi}
O = {Bo, Fo, Pr}
Bo = {(U, Br ∨ Gr, F4e ∨ T3e ∨ M0e), (U, Hi, Hi)}
Fo = {(Sm ∨ Me, U, U), (Hi, Hi, U)}
Pr = {(La, Da, M0e ∨ U0e)}

Note that each object is defined by a set of perceptions and a set of emotions. Perceptions are represented as a tuple of values (Si, Co, Sh). Each preset perception value connects to one or more preset emotions; for example, 'large-size' links to 'low-safe'. A fuzzy representation and membership function will be used to determine these links and their strengths in the adaptive version of TOM.

There is a globally defined value in the system, the undefined (U) value. This value is used, and consequently assigned, in cases where none of the values given in the system definition applies. In other words, it represents the system's ignorance of the presented information [49], or indicates that the information is unpredicted, as is the case for the box size in the given example, where none of the possible size values takes precedence over the others. The semantics of this value is not our concern at this stage; therefore, we will assume it follows the Bochvar logic semantics as presented in [49].

4 Inferencing in TOM

Inferencing in the TOM architecture relies on the perception input to determine two sets of intermediate outputs: a set of objects and a set of related emotions. The result is a set of pairs in which one member is an object and the other is a related emotion; each object may have more than one associated emotion. In this section, we discuss the inferencing process within TOM and its implementation.

4.1 Inferencing Engine

In definition 9, we provide the basic definition of inferencing within TOM. We use the entailment operator ⊢ to notate the inferencing operation. Two forms of this entailment operator are used: the first is a free form ⊢, in which inferencing is done over factors free from association; the second is a bound form ⊢_P, in which inferencing is done under the binding element, in this case P.

Definition 9 - Given a set of input perceptions P, if TOM ⊢ P then TOM ⊢_P (O, E), where O is a set of objects and E is a set of emotions. ♦

Definition 9 states that no inferencing can be done unless the input set of perceptions P is derivable from the model TOM. If so, then under the set of perceptions P, TOM can entail a tuple of objects and emotions (O, E). We needed this condition because we limit our system to the preset perceptions; in future development we hope to waive this restriction.
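To make the preset domain concrete, here is one possible transcription of the above definitions into Python literals. This is a sketch only: the paper does not prescribe data structures, and the emotion tuple for Pr is left unspecified in the text, so it is marked as such.

```python
U = "U"  # the globally defined undefined value (system ignorance)

SIZE   = {"La", "Sm", "Me"}          # large, small, medium
COLOUR = {"Br", "Da", "Gr"}          # bright, dark, grey
SHAPE  = {"F4e", "T3e", "M0e", "U0e"}  # 4-edged, 3-edged, many-edged, uneven

# object -> (perception tuple (Si, Co, Sh), emotion tuple (Sa, Li, De));
# Python sets stand in for the disjunctions (v) of admissible values.
OBJECTS = {
    "Bo": ((U, {"Br", "Gr"}, {"F4e", "T3e", "M0e"}), (U, "Hi", "Hi")),
    "Fo": (({"Sm", "Me"}, U, U), ("Hi", "Hi", U)),
    "Pr": (({"La"}, {"Da"}, {"M0e", "U0e"}), None),  # no emotion tuple given
}
```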
The following corollary clarifies the meaning of TOM ⊢_P (O, E) further.

Corollary 1 - Given a set of input perceptions P, if TOM ⊢_P (O, E) then:
∀p ∈ P. ((∃o ∈ O ∨ ∃e ∈ E) ∨ (∃o ∈ O ∧ ∃e ∈ E)) whereby p → o and p → e,
where → identifies a connection between two map elements. ♦

It is clear that to implement this process some rules and restrictions need to be established. These rules aim to counter the possibility of the derived set of objects or the derived set of emotions being empty. The most likely case is that the combination of the existing set of perceptions and their emotions does not lead to any object, which means the system is encountering a new type of object. Rule 1 counters this case by initiating a new object.

Rule 1 (Initiating a new object) - Given P ≠ ∅, if ⊢_P O = ∅ and ⊢_P E ≠ ∅, then a new O is asserted and associated with P and E. ♦

The system may encounter two kinds of conflict: object conflict and emotion conflict. Definition 10 and Rule 2 define and resolve the object conflict; Definition 11 and Rule 3 define and resolve the emotion conflict.

Definition 10 (Object conflict) - Given P ≠ ∅, if ⊢_P O ≠ ∅ with cardinality greater than 1, and ∃p1 ∈ P and ∃p2 ∈ P whereby ∃o1 ∈ O and ∃o2 ∈ O, we say there is an object conflict if the following conditions are true: p1 → o1; p2 → o2; o1 = ¬o2. ♦

Rule 2 (Object conflict resolution) - If there is an object conflict, determine the strength of each connection and choose the highest strength in determining the derived object. ♦

Determining the strength of a connection depends on various factors. At the current stage, we use a type of fuzzy membership grade to determine the strength of individual connections and then choose the highest degree of membership.

Definition 11 (Emotion conflict) - Given P ≠ ∅, if ⊢_P E ≠ ∅ with cardinality greater than 1, and ∃p1 ∈ P and ∃p2 ∈ P whereby ∃e1 ∈ E and ∃e2 ∈ E, we say there is an emotion conflict if the following conditions are true: p1 → e1; p2 → e2; e1 = ¬e2. ♦

Rule 3 (Emotion conflict resolution) - If there is an emotion conflict, determine the strength of each connection and choose the highest strength in determining the derived emotion. ♦

In TOM, we cannot have a perception that does not have an emotion associated with it. However, some perceptions may not trigger any particular emotion on their own. These perceptions will have a neutral emotion, which we view as the zero value of the perception-emotion based system.

Definition 12 (Neutral emotion) - Neutral emotion may be defined as the zero value of emotions. In other words, if an object O or a perception P is not connected to an emotion, then we infer that the emotion is neutral. ♦

Rule 4 (Neutral emotion) - Given P ≠ ∅, if ∃p ∈ P. ∀e ∈ E. ¬(p → e), then p → N. ♦

Theorem 1 - Given P ≠ ∅, ⊢_P E = ∅ iff TOM ⊬ P (i.e., P is not derivable from TOM).
Proof - The proof of this theorem is intuitive and can be derived directly from Rule 4. ♦

Corollary 2 - Given P ≠ ∅, if ⊢_P O = ∅ it is not necessarily the case that ⊢_P E = ∅. ♦

4.2 Implementation

In implementing TOM we define four classes. The TOM agent acts as a controller that initiates the other parts of the TOM architecture. The other three classes are Perceptions, Emotions and Objects; these are effectively cognitive maps with identifiers.
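Rules 2 and 3 both resolve a conflict the same way: keep the candidate whose connection strength (here a fuzzy membership grade) is highest. A minimal sketch, with data layout and function name of our own choosing:

```python
def resolve_conflict(candidates):
    """Rules 2/3: return the candidate with the strongest connection.
    candidates maps a name to its fuzzy connection strength."""
    return max(candidates, key=candidates.get)

# Perceptions jointly suggest two mutually exclusive objects;
# the connection strengths decide which one is derived.
derived_objects = {"Bo": 0.4, "Pr": 0.75}
print(resolve_conflict(derived_objects))   # -> 'Pr'
```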
Figure 4 - TOM class diagram (a TOM Agent composed of TOM Object, TOM Perceptions and TOM Emotions).

Notice that TOM Perceptions and TOM Emotions are composed of one or many perception and emotion objects respectively. The TOM agent has five main operations: initiate (I), initiate object (IO), establish (E), assign (A) and derive (D). Initiate (I) is the agent constructor; its job is to initiate the agent with the provided perceptions and emotions and to invoke the object initiation and assignment operators. The initiate object (IO), establish (E) and assign (A) operations are used in handling new objects that are to be introduced into the system (Rule 1) or to a TOM agent. It may be worth mentioning here that even though agents are used, which potentially allows multi-agent systems to be developed, the current implementation uses only one agent. A multi-agent implementation would raise several questions regarding how these agents communicate, how they would be used in a multi-agent controller for robots, and so on. The derive (D) operator is the inferencing operator by which the system retrieves the connected perceptions, objects and emotions given any one of them, i.e. an object, perceptions or emotions. The following algorithms show how these operations may be used together to initiate an agent and a new object within the system, and to associate that object with the relevant perceptions and emotions.

Algorithm 1 (Initiate Agent) - Given an agent TOM-A, the initiate (I) operator initiates the agent as follows:
TOM-A.IM = E(E, P);
TOM-A.OM = IO(). ♦

Algorithm 2 (Establish emotions and perceptions, E) -
E(E, P):
  for every p ∈ P
    for every e ∈ E
      request w ∈ W;
      M(e, p) ⊢ L ⊗ w;
  return M. ♦

Algorithm 3 (Initiate Object, IO) - Given P, if P.D(O) = ∅ and P.D(E) ≠ ∅ then:
  I(new_O);
  Mo: A(new_O, E); A(new_O, P);
  return Mo. ♦

Algorithm 4 (Assign Object, A) - Given an object O and a set of features F, which can be either a set of emotions or perceptions, A(new_O, F) requests a set of relevance weights W and links O and F as follows:
  for every f ∈ F
    request (w ∈ W);
    assert (O, f) ⊢ L;   // L is a link tag identifying the link between O and f
    assert (L, w). ♦

Algorithm 5 (Inferencing operation, D) - Given a set of perceptions P:
  for every p ∈ P
    do until p.connection(O) is empty {   // O is the set of known objects
      select (O, p) ⊢ p.objects;
    }
    do until p.connection(E) is empty {   // E is the set of known emotions
      select (E, p) ⊢ p.emotions;
    }
    if p.objects contains more than one member
      then Resolve-Object(p.objects, p.emotions, p);
    if p.emotions contains more than one member
      then Resolve-Emotion(p.objects, p.emotions, p). ♦
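For concreteness, here is a Python rendering of Algorithm 5 (the derive operator D). It assumes a data layout of our own choosing: each perception carries dicts of object and emotion connections keyed by name with fuzzy strengths, and the resolve steps follow Rules 2-4 by picking the strongest connection and falling back to a neutral emotion.

```python
def derive(perceptions):
    """Algorithm 5 sketch: map each perception to an (object, emotion) pair."""
    results = []
    for p in perceptions:
        objs = dict(p["objects"])        # connected objects with strengths
        emos = dict(p["emotions"])       # connected emotions with strengths
        if len(objs) > 1:                # object conflict (Definition 10)
            best = max(objs, key=objs.get)
            objs = {best: objs[best]}    # Rule 2: keep the strongest
        if len(emos) > 1:                # emotion conflict (Definition 11)
            best = max(emos, key=emos.get)
            emos = {best: emos[best]}    # Rule 3: keep the strongest
        if not emos:
            emos = {"neutral": 0.0}      # Rule 4: neutral emotion
        results.append((next(iter(objs), None), next(iter(emos))))
    return results

p = {"objects": {"Pr": 0.75, "Bo": 0.4}, "emotions": {"low-safe": 0.9}}
print(derive([p]))                       # -> [('Pr', 'low-safe')]
```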
5 Future Work

The system is by no means complete. Its weaknesses lie in the restrictions we imposed on it. First, the system, at its current stage, cannot learn new emotions or perceptions. Secondly, it does not account for concepts or the more complicated knowledge structures proposed by the OMQ model and our memory architecture [42, 43, 47]. However, the TOM architecture answers the question of inferencing to some degree, which was not discussed in any of the previous work.

5.1 Extending TOM Representation

In terms of representation, there are several extensions to be made. Firstly, a fuzzy representation using a modified version of fuzzy cognitive maps will be implemented. This will later be extended to an adaptive version in which links are constructed, deconstructed and updated according to the robot's perception of the environment. Problems to be addressed include the addition of perceptual experiences, including the definition of perceptions, objects and their associated emotions.

5.2 Extending TOM Inferencing

In this paper, we focused on the inferencing operator in terms of identifying objects from perceptions and their associated emotions. As an extension of this work, a 'perception and emotions' based planner is to be constructed and subsequently tested on robots. This requires extending the TOM inferencing mechanism to enable the determination of the actions to be executed given a set of perceptions and the inferred objects and emotions. Other practical problems also need to be addressed, such as the limitations of the sensors on the robots used.

5.3 Completing the OMQ System

TOM and its inferencing mechanism will feed back into the completion of the OMQ architecture design and implementation. Subsequently, the TOM model should be able to deal with more complicated knowledge structures. Sensor fusion is currently one of the problems to be addressed in completing the OMQ system. In addition, TOM needs to be extended to allow learning of perceptions and emotions in order to widen its use [43].

6 Conclusion

In this paper, we attempted to answer the inferencing question that emerged from previous work [42, 43]. Consequently, we presented Triangular Object Modelling (TOM) as a way of modelling objects in relation to perception and emotion. As a result, a connectionist approach using a modified version of cognitive maps has been developed to provide an inferencing method that utilises perceptions and emotions. The implementation of TOM and future developments were discussed.

7 References

[1] A. Baddeley, Working Memory. Oxford: Clarendon Press, 1986.
[2] R. D. Gross, Psychology: The Science of Mind and Behaviour. London, UK: Hodder & Stoughton, 1992.
[3] H. C. Lindgren, Psychology: An Introduction to a Behavioral Science. New York, London: Wiley, 1971.
[4] A. L. Wilkes, Knowledge in Minds: Individual and Collective Processes in Cognition. UK: Psychology Press (of Erlbaum (UK) Taylor & Francis), 1997.
[5] C. Carlsson and R. Fuller, "Adaptive Fuzzy Cognitive Maps for Hyperknowledge Representation in Strategy Formation Process," presented at the International Panel Conference on Soft and Intelligent Computing, 1996.
[6] B. Kosko, "Fuzzy Cognitive Maps," International Journal of Man-Machine Studies, pp. 65-75, 1986.
[7] M. P. Wellman, "Inference in Cognitive Maps," SIAM Journal on Computing, vol. 36, pp. 1-12, 1994.
[8] R. C. Moore, "A Formal Theory of Knowledge and Action," in Formal Theories of the Commonsense World, J. R. Hobbs and R. C. Moore, Eds. Norwood, New Jersey: Ablex Publishing Corporation, 1985.
[9] J. F. Allen, "Towards a General Theory of Actions and Time," in Readings in Planning, J. Allen, J. Hendler, and A. Tate, Eds. San Mateo, CA, USA: Morgan Kaufmann Publishers, Inc., 1990/1984, pp. 464-479.
[10] J. McCarthy and P. Hayes, "Some Philosophical Problems from the Standpoint of Artificial Intelligence," in Readings in Planning, J. Allen, J. Hendler, and A. Tate, Eds. San Mateo, CA, USA: Morgan Kaufmann Publishers, Inc., 1990, pp. 393-435.
[11] A. Ayesh, "An Investigation Into Formal Models Of Change In Artificial Intelligence," School of Computing and Mathematical Sciences, Liverpool John Moores University, Liverpool, UK, 1999.
Kelleher, "Aspects of Temporal Usability Via Constraint-Based Reasoning: The Use of Constraints as a Debugging Mechanism Within General Descriptions of The User's Possible Actions," presented at The IASTED International Conference on Artificial Intelligence and Soft Computing., Cancun, Mexico, 1998. [13] Y. Zhang and H. Barringer, "A Reified Temporal Logic For Nonlinear Planning," Manchester University, Manchester, Technical Report UMCS-94-7-1, July 1994. [14] A. Cesta and A. Oddi, "A Formal Domain Description Language for a Temporal Planner," presented at 4th Congress of the Italian Association For AI, Florence, Italy, 1995/1996. [15] I. Meiri and J. Pearl, "Temporal Constraint Network," AI 49, pp. 61-95, 1991. [16] C. Carlsson and R. Fuller, "Fuzzy Multiple Criteria Decision Making: Recent Developments," Fuzzy Sets and Systems, vol. 78, pp. 139 - 153, 1996. [17] N. Guarino, "Concepts, Attributes, and Arbitrary Relations," in Some Linguistic and Ontological Criteria for Structuring Knowledge Bases: citeseer.nj.nec.com/172862.html. [18] P. R. Van Loocke, The Dynamics of Concepts: A Connectionist Model. Berlin: Springer-Verlag, 1991. [19] A. Doan, "Modeling Probabilistic Actions for Practical Decision-Theoretic Planning," presented at The Third International Conference on Artificial Intelligence Planning Systems (AIPS 96), Edinburgh, Scotland, 1996. [20] J. Pearl, "From Conditional Oughts to Qualitative Decision Theory," presented at Uncertainly in AI 9th Conference, USA, 1993. [21] P. Haddawy and S. Hanks, "Utility Models for Goal-Directed Decision-Theoretic Planners," University of Washington, Seattle, USA, Technical Report 93-06-04, June 15, 1993 1993. [22] V. Lifschitz, "On the Semantics of STRIPS," in Readings in Planning, J. Allen, J. Hendler, and A. Tate, Eds. San Mateo, CA, USA: Morgan Kaufmann Publishers, INC., 1990, pp. 523-530. [23] R. Reiter, "The Frame Problem in the Situation Calculus: A Simple Solution (Sometimes) and a Completeness Result for Goal Regression," in Artificial Intelligence and Mathematical Theory of Computation: Papers in the Honor of John McCarthy, V. Lifschitz, Ed. San Diego: Academic Press, INC.; Harcourt Brace Jovanovich, Publishers, 1991, pp. 359-380. [24] R. Reiter, "Proving Properties of States in The Situation Calculus," AI 64, pp. 337-351, 1993. [25] P. Morris and R. Feldman, "Automatically Derived Heuristics For Planning Search," presented at AICS '89, Dublin City University, 1989. [26] A. C. Grayling, "Philosophy 2: Further Through the Subject," . New York, USA: Oxford University Press, 1998. [27] B. Smith and D. W. Smith, "The Cambridge Companion to Husserl," . Cambridge, UK: Cambridge University Press, 1995. [28] J. Dix, J. E. Posegga, and P. H. Schmitt, "Modal Logics For AI Planning," presented at First International Conference On Expert Planning Systems, Brighton, UK, 1990. [29] S. Haack, Philosophy of Logics. Cambridge: Cambridge University Press, 1978. [30] N. Rescher, Many-Valued Logic. New York: McGraw-Hill, 1969. [31] Y. Murakami, Logic and Scoiai Choice. London, New York: Routledge & Kegan Paul Ltd. Dover Publications Inc., 1968. [32] R. Turner, Truth and Modality for Knowledge Representation. London: Pitman Publishing, 1990. [33] M. Ayers, Locke: Epistemology & Ontology. London & New York: Routledge, 1993. [34] J. Hintikka, Knowledge and Belief: an Introduction to the Logic of the Two Notions: Cornell University Press, 1962. [35] D. Perlis, "Languages with Self-Reference II: Knowldge, Belief, and Modality," AI 34, pp. pp 179-212, 1988. 
[36] G. J. Klir, Architecture of Systems Problem Solving. New York: Plenum Press, 1985.
[37] J. Mira and F. Sandoval, "From Natural to Artificial Neural Computation," International Workshop on Artificial Neural Networks, Spain: Springer, 1995.
[38] D. S. Levine, Introduction to Neural & Cognitive Modeling. London: Lawrence Erlbaum Associates, Publishers, 1991.
[39] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. New Jersey: Prentice Hall, 1999.
[40] R. Axelrod, Structure of Decision: The Cognitive Maps of Political Elites. Princeton, New Jersey: Princeton University Press, 1976.
[41] A. Ayesh, "Neuro-Fuzzy Concepts Maps (NFCM)," De Montfort University, work in progress, 2002.
[42] A. Ayesh, "Argumentative Agents-based Structure for Thinking-Learning," presented at the IASTED International Conference on Artificial Intelligence and Applications (AIA 2001), Marbella, Spain, 2001.
[43] A. Ayesh, "Thinking-Learning by Argument," in Intelligent Agent Technology: Research and Development, N. Zhong, J. Liu, S. Ohsuga, and J. Bradshaw, Eds. New Jersey: World Scientific, 2001.
[44] J. L. McClelland and D. E. Rumelhart, "Distributed Memory and the Representation of General and Specific Information," in Human Memory: A Reader, D. R. Shanks, Ed. London: Arnold, 1997, pp. 273-314.
[45] D. R. Shanks, Human Memory: A Reader. London: Arnold, 1997.
[46] L. R. Squire, B. Knowlton, and G. Musen, "The Structure and Organization of Memory," in Human Memory: A Reader, D. R. Shanks, Ed. London: Arnold, 1997, pp. 152-200.
[47] A. Ayesh, "Memory Architecture for Argumentation-based Adaptive System," presented at the IASTED International Conference on Applied Informatics (AI 2002), Innsbruck, Austria, 2002.
[48] A. Ayesh, "Towards Memorizing by Adjectives," presented at the AAAI Fall Symposium on Anchoring Symbols to Sensor Data in Single and Multiple Robot Systems, 2001.
[49] A. Ayesh, "Self Reference in AI," Computer Science Dept., University of Essex, Colchester, 1995.

Emotional Influences on Perception in Artificial Agents

Penny Baillie-de Byl
Department of Mathematics and Computing, University of Southern Queensland
penny.baillie@usq.edu.au

Keywords: agents, emotion, perception, decision-making, affective computing.

Received: October 17, 2002

This paper proposes a model of emotionally influenced perception in an affective agent. Via a multidimensional representation of emotion, a mechanism called the affective space is used to emotionally filter sensed stimuli in the agent's environment. This filtering process allows different emotional states in the agent to create dissimilar emotional reactions when the agent is exposed to the same stimuli. In humans, emotion is not a mechanism that enhances intelligence while remaining isolated from other psychological and physiological functions. As this paper suggests, emotions are an integral part of the human as a biological being and therefore cannot be turned on and off on a whim. The artificial agent, however, is afforded this luxury. This approach builds on contemporary affective agent architectural concepts: it not only gives an agent the ability to use emotion to produce human-like intelligence, but also investigates how emotion should affect the agent's other abilities, such as perception.

1 Introduction

The word agent is used within the AI (Artificial Intelligence) domain to refer to a number of different applications.
The most popular use of the term pertains to an autonomous artificial being that has the ability to interact intelligently within a temporally dynamic environment. Just how the agent achieves its intelligent interaction has become a popular research topic. In the mid 1990s a small group of researchers became convinced that true human-like intelligence could not be modelled successfully in artificial beings without the inclusion of emotion-like mechanisms. Thus began the field of Affective Computing.

Humans sense their environment with five (possibly more) senses for detecting external stimuli, and with others for tracking their internal states (e.g. hunger). Artificial agents must also implement a number of mechanisms that track not only their external environment but also their internal states in order to interact intelligently with their environment and other agents. Agents, therefore, by their very nature perceive; however, is it to the same ends as human perception? Brunswik [1] wrote, "Perception (in humans), then, emerges as that relatively primitive, partly autonomous, institutionalized, ratiomorphic subsystem of cognition which achieves prompt and richly detailed orientation habitually concerning the vitally relevant, mostly distal aspects of the environment on the basis of mutually vicarious, relatively restricted and stereotyped, insufficient evidence in uncertainty-geared interaction and compromise, seemingly following the highest probability for smallness of error at the expense of the highest frequency of precision."

These characteristics of human perception are the exact abilities that Affective Computing researchers are attempting to achieve in artificial agents in order to increase information-processing efficiency. These facilities emulate the human thought processes of flexible and rational decision making, reasoning with limited memory, limited information and relatively slow processing speed, social interaction, and creativity. There are agents that can sense human emotional states [2], agents that can produce outward emotional behaviour [3, 4], and agents that are motivated by their emotions [5, 6]. The agents that internally represent emotional states for the purpose of goal setting and motivation [7] have mechanisms that perceive their environment and internal states for the purpose of calculating their emotions and producing appropriate behaviour. Most of us have heard the phrase "his mind is clouded by emotion", but to what extent is this type of emotion-perception influence occurring in artificial agents?

This paper presents the Emotionally Motivated Artificial Intelligence (EMAI) model, whose sensory input is emotionally filtered before processing. It begins by examining the effect of emotion on perception in humans. Next, an overview of the agent architecture is given, including details of the calculations used by an agent to determine emotional states. Following this, the factors of human perception are examined with respect to their emulation in the artificial agent. Next, an example of the influence of emotional states on the agent's perception of environmental stimuli is discussed, incorporating a brief overview of the agent's emotion-based decision-making technique. The paper concludes with a summary of the agent's evaluation and a thought for future research.
2 The Molecules of Emotion and Perception

Research has revealed that the neuropeptide receptors observed to be responsible for emotional states, and originally thought to exist only in the amygdala, hippocampus and hypothalamus, have now been detected in high concentrations throughout the body. This includes the dorsal (back) side of the spinal cord, the nervous system's first synapse, where bodily sensations and feelings are processed. Therefore, all sensory information passing between synapses via the emotion-producing neuropeptides undergoes an emotional filtering process [8]. This operation assists the brain in dealing with the deluge of sensory input it receives. The nervous system carries signals not only from the body to the brain, but also from the brain to the body. Emotional states or moods occur when emotion-carrying peptides are produced in the body's neurons. The presence of different emotional neuropeptides can create dissimilar reactions in an individual when exposed to the same stimuli.

Worthington, as reported by Malim [9], found that subjects consistently perceived dim spots of light containing consciously unreadable words with a higher emotional rating as dimmer than other words. In another experiment, by Lazarus and McCleary [10], subjects were presented with a series of nonsense syllables. Electric shocks were administered to the subjects when particular syllables were shown, and their anxiety levels were measured. Later, the subjects were exposed to the syllables at a rate faster than consciously perceivable. It was found that the syllables associated with the electric shocks raised the anxiety of the subjects. Leuba and Lucas [11] also conducted an experiment on perception and emotion involving the description of six pictures by three people in each of three different emotional states. Each emotion was induced by hypnosis and then the pictures were shown. Interpretations of the scenes in the pictures related to the emotions of the viewer. For example, a picture of several university students sitting on the grass listening to the radio was interpreted by the same person as relaxing when they were happy, irresponsible when they were feeling judgmental, and competitive when they were anxious.

These research examples confirm that human perception is filtered by emotions, as Pert [8] suggests occurs at a molecular level. If perception is affected by emotion in humans, then surely artificial agents that attempt to model the emotional intelligence of humans must also represent the emotional filtering process of perception. However, as the emotional filtering process occurs biologically in humans, it is not an inherent process within a machine or piece of software. In affective agent architectures, emotions have been modelled at the cognitive level, based on a number of appraisal models of emotion [12, 13]. Therefore, the agent presented in this paper implements the cognitive nature of perception.

3 The EMAI Architecture

The Emotionally Motivated Artificial Intelligence (EMAI) architecture is a complex set of mechanisms that process emotional concepts for use in affective decision-making and reasoning. A full elucidation of this architecture can be found in [14]. As the purpose of this paper is to examine the influence that emotions have on the perception of such an agent, the discussion will be limited to the parts of the architecture that achieve this. A condensed overview of the parts of the EMAI architecture we are concerned with is shown in Figure 1.
Figure 1 - A summary illustration of the EMAI architecture (environment, sensory processor with external and internal sensory data, motivational drive generator, affective space, and intention generator producing plans and goals).

There are two types of emotion mechanisms integrated in the EMAI architecture. The first emulates fast primary emotions [15], otherwise known as motivational drives. These drives can be classified according to their source, pleasure rating and strength. In an EMAI agent, these drives are used to initiate behaviour in the agent. They can include concepts such as hunger, fatigue or arousal. The strength of the drives is temporally dynamic, and at particular threshold levels the agent will set goals that, when successfully achieved, will pacify the drives. For example, over time the strength of the hunger drive will increase. At a certain point, the agent will become so hungry that it will set appropriate goals to ensure it obtains food. On consuming food, the strength of the agent's hunger drive will decrease.

The agent's goals are generated by a motivational drive generator, which consists of drive mechanisms and a set of internal state registers representing the primary emotions. Each register is represented by a single gauge and stores the value for a particular drive, for example hunger. The number of internal state registers implemented depends on the application for which the EMAI agent is being used.

The second type of emotion implemented in the EMAI architecture is secondary emotion. This category of emotion refers to the resultant mental (and in turn physical) states generated by attempts to satisfy the goals. These emotions include feelings such as happiness, anger, sorrow, guilt and boredom. Secondary emotions are represented in the EMAI architecture as values in the affective space. The affective space is a six-dimensional space defined by six appraisal dimensions. Based on the psychological model of Smith and Ellsworth [13], it defines 15 emotions (happiness, sadness, anger, boredom, challenge, hope, fear, interest, contempt, disgust, frustration, surprise, pride, shame and guilt) with respect to the dimensions of pleasantness, P, responsibility, R, effort, E, certainty, C, attention, A, and control, O. The values of the pure emotion points for each of the 15 emotional states in the model are shown in Table 1.

Table 1 - Mean locations of emotional points (in the range -1.5 to +1.5) as compiled in Smith and Ellsworth's study

Emotion       P      R      C      A      E      O
Happiness   -1.46   0.09  -0.46   0.15  -0.33  -0.21
Sadness      0.87  -0.36   0.00  -0.21  -0.14   1.51
Anger        0.85  -0.94  -0.29   0.12   0.53  -0.96
Boredom      0.34  -0.19  -0.35  -1.27  -1.19   0.12
Challenge   -0.37   0.44  -0.01   0.52   1.19  -0.20
Hope        -0.50   0.15   0.46   0.31  -0.18   0.35
Fear         0.44  -0.17   0.73   0.03   0.63   0.59
Interest    -1.05  -0.13  -0.07   0.70  -0.07   0.41
Contempt     0.89  -0.50  -0.12   0.08  -0.07  -0.63
Disgust      0.38  -0.50  -0.39  -0.96   0.06  -0.19
Frustration  0.88  -0.37  -0.08   0.60   0.48   0.22
Surprise    -1.35  -0.97   0.73   0.40  -0.66   0.15
Pride       -1.25   0.81  -0.32   0.02  -0.31  -0.46
Shame        0.73   1.13   0.21  -0.11   0.07  -0.07
Guilt        0.60   1.13  -0.15  -0.36   0.00  -0.29

Each appraisal dimension (explained in the next section) is used to produce a six-coordinate point that defines an agent's emotional state. Figure 2 illustrates the locations of the pure emotions with respect to the dimensions of pleasantness and control.

Figure 2 - Empirical location of emotional states with respect to the pleasantness and control dimensions.
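Transcribed into code, the affective space is simply a lookup from emotion labels to six-coordinate points. The sketch below (our own layout, not from the paper) reproduces a few rows of Table 1 in the column order P, R, C, A, E, O:

```python
# Pure emotion points from Table 1, as (P, R, C, A, E, O) tuples
# in the range -1.5..+1.5; only five of the 15 rows are shown.
AFFECTIVE_SPACE = {
    "happiness": (-1.46,  0.09, -0.46,  0.15, -0.33, -0.21),
    "sadness":   ( 0.87, -0.36,  0.00, -0.21, -0.14,  1.51),
    "anger":     ( 0.85, -0.94, -0.29,  0.12,  0.53, -0.96),
    "fear":      ( 0.44, -0.17,  0.73,  0.03,  0.63,  0.59),
    "boredom":   ( 0.34, -0.19, -0.35, -1.27, -1.19,  0.12),
}
```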
In addition to representing the agent's emotional state, the agent uses the affective space to associate emotions with all stimuli, both internal to the agent (as internal sensory data) and within its environment (as external sensory data). The stimuli are perceived by the agent as part of an event. An event is a behavioural episode executed by the agent. Stimuli can be any tangible element in the agent's environment, including the actions being performed, smells, objects, other agents, the time of day or even the weather. The sensory processor of the agent is where high-level observation takes place. This information is filtered through the affective space before it is used by the agent to generate outward behaviour (determined by the intention generator). It is at this point that the information has been perceived. Therefore, all information perceived by the agent is influenced by the agent's emotional state. Before a stimulus or event can be perceived by the agent, the agent must calculate an associated emotion for each. The emotion associated with a stimulus is determined by examining each of the appraisal dimensions with respect to the agent's last encounter with the stimulus.

3.1 Assessing the Appraisal Dimensions

The six appraisal dimensions are orthogonal, and no single emotion can be identified without taking each of them into account. Each of these dimensions will now be reviewed.

3.1.1 Pleasantness

This dimension relates to an individual's expression of liking or disliking towards a stimulus, be it an event, object or another agent. An EMAI agent assesses and updates the pleasantness dimension as an assessment of the effect of the stimulus on the agent's goals during an encounter. Pleasantness P is the average of the pleasantness ratings the agent has given to a stimulus each time the agent has come into contact with it:

P = (Σ i=1..m p_si) / m    (1)

where m is the number of times the agent has come into contact with the stimulus s, and p_s is the pleasantness rating of s. For example, assume the agent has driven the same car five times. The first time, the car performs as expected and the agent is pleased with it; in this first instance the agent may set the pleasantness rating to 8 on a scale from 1 to 10, where 1 is unpleasant and 10 is very pleasant. The second time the agent drives the car it breaks down, and the agent rates the pleasantness of the car as 2. For the next three contacts with the car, the agent rates the pleasantness as 2, 7 and 9. After these five contacts, the agent, using Eq. (1), assesses the overall pleasantness rating of the car to be (8+2+2+7+9)/5, which equates to 5.6.
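As a quick check of Eq. (1) in code, the car example works out as follows (a minimal sketch; the function name is ours, the numbers are from the example above):

```python
def pleasantness(ratings):
    """Eq. (1): P = (sum of per-encounter ratings) / m."""
    return sum(ratings) / len(ratings)

car_ratings = [8, 2, 2, 7, 9]      # five drives, rated 1 (unpleasant) to 10
print(pleasantness(car_ratings))   # -> 5.6
```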
More precisely, if the agent were in a team situation and within the team there were different ranks (for example, team leader, second in charge and third in charge), and the team was performing some task and the task failed, then the team leader would feel most responsible, the second in charge less responsible, and so on through the ranks. Responsibility R is calculated using the function r, which returns the level of responsibility related to a stimulus. The function works by determining the nature of the relationship between the agent and s. For example, if the agent were the owner of, or in charge of, s, then r(s) would return a high value. Responsibility is calculated as follows:

R = r(s)    (2)

3.1.3 Effort

The values along this dimension are gauged from an agent's exertion with respect to a stimulus that affects the agent either mentally or physically. In the EMAI architecture, effort is a function of the depletion of resources used when interacting with an object or performing an event, divided by the length of time spent interacting. For an EMAI agent, at the beginning of performing an event the agent's physical state is recorded. During the event, the agent's physical state may be affected by the event's stimuli. At the completion of an event, the change in the agent's physical state is used to calculate effort. For example, if the agent were to perform the task of digging a hole, during the execution of the event the agent's physical state may deteriorate because the agent may get tired and hungry. When the agent finishes digging the hole, the change in tiredness and hunger during the task performance will be directly related to the effort involved in the event. Effort E is the average of the effort associated with a stimulus over the times the agent has been in contact with it:

E = \frac{1}{m} \sum_{i=1}^{m} f_{s,i}    (3)

where m is the number of times the agent has come in contact with the stimulus s, and f_{s,i} is the amount of effort involved with s.

3.1.4 Attention

This dimension is the rating of an individual's regard for a stimulus with respect to the level of concentration exerted towards it during interaction. All EMAI agents are programmed with a maximum attention capacity, and each agent can perform one or more tasks that utilise this maximum capacity. In EMAI, attention is measured as the total amount of an agent's attention that is utilised in performing one or more events concurrently. Attention A is determined by averaging all the attention ratings the agent has assigned to a stimulus. Each time the agent is involved with a stimulus or event, the agent records how much concentration was exerted during the encounter and uses these values to calculate A:

A = \frac{1}{m} \sum_{i=1}^{m} a_{s,i}    (4)

where m is the number of times the agent has come in contact with the stimulus s, and a_{s,i} is the amount of attention required by the agent when involved with s.

3.1.5 Control

This dimension refers to an agent's authority to manipulate and direct a stimulus. It assesses the agent's ability to control the role of a stimulus during the satisfaction of the agent's goals. Each EMAI agent is initially programmed with a control value of 0 over every other stimulus. Over time, as an EMAI agent evolves, its control values toward stimuli change (either increase or decrease) according to the outcomes of performed behaviours. Almost always there will be one or more stimuli involved in an event.
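Equations (1), (3) and (4) are all running averages over encounters with a stimulus, so they can be maintained incrementally. The following minimal sketch (the class name is our own, not from the paper) reproduces the car example of Section 3.1.1, where the ratings 8, 2, 2, 7 and 9 average to 5.6.

```python
class AppraisalAverage:
    """Running average over encounters, as used for the pleasantness (1),
    effort (3) and attention (4) appraisal dimensions."""

    def __init__(self):
        self.total = 0.0   # sum of ratings over all encounters
        self.m = 0         # number of encounters with the stimulus

    def record(self, rating: float) -> None:
        self.total += rating
        self.m += 1

    def value(self) -> float:
        return self.total / self.m if self.m else 0.0

pleasantness = AppraisalAverage()
for rating in (8, 2, 2, 7, 9):   # the five drives of the car example
    pleasantness.record(rating)
print(pleasantness.value())      # -> 5.6
```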
So initially the EMAI agent's control over this event is 0, and the control over each of the individual stimuli involved in the event is also 0. Based on the outcome of the event (that is, success or failure), the control of the agent towards each of the stimuli involved in the event is either increased by 1, if the event is successful, or reduced by 1 otherwise. The overall control over the event is then the average of the control over each individual stimulus in the event. For example, assume that an agent is driving a car on the Princes Highway from Sydney to Melbourne. Here there are four stimuli involved in the event: the car, the Princes Highway, the source city Sydney, and the destination city Melbourne. Initially, the control the agent has over these four stimuli is 0. Now, assuming that this event is successful, the agent will increment the control value of each of these stimuli by 1, and calculate the overall control value towards the event as the average control over all the stimuli in the event; in this case, it will be 1 ((1+1+1+1)/4). Further, assume the agent performs the same event successfully at another time. Now its control over each of the stimuli will be increased by 1 again, and the overall control towards the event will be 2. But if on a third trip the agent drives the car along the Pacific Highway from Sydney to Brisbane, the initial control over Sydney and the car is 2 each, while the control over the Pacific Highway and Brisbane is 0 each, so the overall control over this event will be 1 ((2+2+0+0)/4). If this event is successful, the overall control will become 2 ((3+3+1+1)/4). Control O is calculated by averaging the amount of control the agent has had over a stimulus or event during every encounter the agent has had with it:

O = \frac{1}{m} \sum_{i=1}^{m} o_{s,i}    (5)

where m is the number of times the agent has come in contact with the stimulus s, and o_{s,i} is the amount of control the agent had over s.

3.1.6 Certainty

This dimension refers to an individual's assessment of a stimulus as to the reliability with which its effects or behaviours can be predicted. This is calculated in an EMAI agent by considering the degree of success or failure the agent has had in past encounters with the stimulus. For example, if the agent has carried out 50 attempts at driving from Sydney to Melbourne on the Princes Highway, and it succeeded in 30 of its attempts and failed in the other 20, then the certainty of success on the 51st attempt will be 0.6 and the certainty of failure will be 0.4. Certainty C is calculated by determining the probability of success of the agent's involvement with a stimulus. Given that S is a function that returns the number of times that s has been used by, or involved with, the agent in a successful event, C can be calculated as

C = S(s) / m    (6)

where m is the number of times the agent has come in contact with the stimulus s.

3.2 Expressing an Agent's Emotional State

An agent's emotional state, Ω, can be expressed as

\Omega = \{P, E, C, A, R, O\}    (7)

Given Ω, the emotional state of the agent can be deduced, for the purpose of expression in natural language, by determining the distance of Ω from each of the 15 pure emotion points. To do this a simple linear distance function is applied¹.
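The control bookkeeping of this example is a per-stimulus tally. The sketch below is our illustration of it (the function and variable names are hypothetical): it replays the Sydney-Melbourne and Sydney-Brisbane trips and, for certainty (6), computes the success ratio.

```python
from collections import defaultdict

control = defaultdict(int)   # per-stimulus control value, initially 0

def perform_event(stimuli, success: bool) -> float:
    """Update the control over each stimulus (+1 on success, -1 on failure)
    and return the overall control over the event (the average)."""
    for s in stimuli:
        control[s] += 1 if success else -1
    return sum(control[s] for s in stimuli) / len(stimuli)

trip1 = ("car", "Princes Highway", "Sydney", "Melbourne")
print(perform_event(trip1, success=True))   # -> 1.0  ((1+1+1+1)/4)
print(perform_event(trip1, success=True))   # -> 2.0
trip2 = ("car", "Pacific Highway", "Sydney", "Brisbane")
print(perform_event(trip2, success=True))   # -> 2.0  ((3+3+1+1)/4)

# Certainty, Eq. (6): the success ratio over past encounters,
# e.g. 30 successes in 50 attempts gives C = 0.6.
successes, m = 30, 50
C = successes / m
```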
To determine a word that best describes the agent's emotional state Ω, the distance between Ω and each of the pure emotions (Ω_1 ... Ω_15) is calculated using

\Delta\Omega_j = \sqrt{(P - P_j)^2 + (E - E_j)^2 + (C - C_j)^2 + (A - A_j)^2 + (R - R_j)^2 + (O - O_j)^2}    (8)

where the 15 values are calculated for j = 1, ..., 15. The pure emotion closest to the agent's emotional state, expressed as a written word, Em, is then determined using

Em = emotion\_name( \min_{j=1,...,15} \Delta\Omega_j )    (9)

where the function min returns the pure emotion point in closest proximity to Ω and the function emotion_name converts the pure emotion point into a string.

¹ While more complex distance functions could be implemented and examined, for simplicity this will not be examined further in this investigation of the EMAI architecture.

For example, assume an agent with an emotional state point of Ω = [0.15, 0.87, 0.35, -0.3, 0.1, -0.5]. To find the name of the emotion that best describes the agent's emotional state, the first step is to find the distance between this point and the 15 pure emotion points in the affective space using Eq. (8). The results are shown in Table 2.

Table 2: Distance between the agent's emotional state and the pure emotions in the affective space

Emotion      Distance (ΔΩ)
Happiness    0.0208
Sadness      0.0222
Anger        0.0218
Boredom      0.0215
Challenge    0.0159
Hope         0.0146
Fear         0.0170
Interest     0.0211
Contempt     0.0168
Disgust      0.0174
Frustration  0.0193
Surprise     0.0268
Pride        0.0164
Shame        0.0080
Guilt        0.0084

It can be seen from Table 2 that the agent's emotional state can best be described as shame.

3.3 Assigning Emotion to Stimuli

An EMAI agent primarily perceives a stimulus and associates an emotion with it based on how the agent assesses the stimulus with respect to the six appraisal dimensions. The emotion associated with a stimulus, s, is expressed in the same manner as was used for the agent's emotional state in Eq. (7), thus

\Omega_s = \{P_s, E_s, C_s, A_s, R_s, O_s\}    (10)

As events occurring in an EMAI agent's environment rarely consist of just one stimulus, stimuli are rarely processed individually. Based on the outcome of processing an event, E, the agent will assign a weighting, w_s, to the emotional state of each of the stimuli. As the weighting of a stimulus and the resulting emotional state with respect to an event are dynamic, the time, t, at which the emotional state is being calculated must also be taken into consideration. Therefore, the emotional state resulting from an event E, written as Ω_{E,t}, is calculated as

\Omega_{E,t} = \sum_{s=1}^{n} w_{s,t} \Omega_{s,t}    (11)

where n is the number of stimuli associated with the event E, and

\sum_{s=1}^{n} w_{s,t} = 1, \quad 0 \le w_{s,t}

After the event, each of the stimuli involved in the event has its emotional association updated with respect to the change in the emotional state of the agent evoked by the outcome of the event, Ω_{O,t+1}. Ω_{O,t+1} represents the emotional state of the agent after the event has occurred, where O stands for the outcome emotion and t+1 is the time at which the event ended. This value is not the same as the emotional state assigned to the event after it has been executed; that is calculated later in this section. A change in the emotional state of the agent occurs when the values of each of the appraisal dimensions (P, E, C, A, R, O) are updated during and after an event. While the six appraisal values of each individual stimulus involved in the event influence how these values change for the agent's emotional state, the final emotional state cannot be determined before the event occurs. The agent can only speculate.
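Equations (8) and (9) amount to a nearest-neighbour lookup among the 15 pure emotion points. A minimal sketch, building on the PURE_EMOTIONS mapping introduced after Table 1; note that the distances printed in Table 2 appear to be on a different scale, so the sketch illustrates the mechanism rather than reproducing the table's values.

```python
import math

def nearest_emotion(state):
    """Eqs. (8)-(9): return the name of the pure emotion whose point lies
    closest (Euclidean distance) to the given emotional state. The state
    must use the same coordinate order as PURE_EMOTIONS."""
    def dist(point):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(state, point)))
    return min(PURE_EMOTIONS, key=lambda name: dist(PURE_EMOTIONS[name]))

print(nearest_emotion(PURE_EMOTIONS["anger"]))  # a pure point maps to itself
```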
For example, an event the agent believes will make it happy may fail during execution, may take longer to complete than initially thought, or may require extra effort. These factors would change the values of the appraisal dimensions independently of any influence over these values by the stimuli of the event or by the event itself. The resulting emotional state in this example may be sadness rather than the expected happiness. Therefore, Ω_{O,t+1} cannot be calculated by combining the appraisal dimensions of the stimuli of an event, but can only be determined after the event has occurred. Only then can an analysis of the appraisal dimensions take place. This analysis includes values from the appraisal dimensions of the stimuli in the event and also takes into consideration changes in the agent's physical and mental states. The new emotional state of the agent is used to update the values of the appraisal dimensions for each of the event's stimuli. The agent attributes a change in its emotional state to the result of the event and therefore updates the emotional state of the event and its stimuli accordingly. Having said this, Ω_{O,t+1} can be predicted by combining the appraisal of an event with the agent's current emotional state, thus

\Omega_{O,t+1} = \Omega_t + w_{E,t} (\Omega_{E,t} - \Omega_t)    (12)

where w_{E,t} is the weighting the agent gives to the event E, with 0 \le w_{E,t}. The change in the emotional state evoked by an event is calculated using

\Delta\Omega = \Omega_{O,t+1} - \Omega_{E,t}    (13)

After this has been calculated, the emotional state of each stimulus in the event can be updated as

\Omega_{s,t+1} = \Omega_{s,t} + w_{s,t+1} \Delta\Omega    (14)

Instead of the stimulus taking on the final emotional state of the event, the previous emotional state of the stimulus is taken into account, along with the effect the stimulus had on the resulting emotional state of the event. If the event's resulting emotional state is the same as its initial state and w_{s,t} = w_{s,t+1}, then the emotional state of the stimulus will not change.

4 Results

The way in which the emotional state of an EMAI agent influences its perception, and in turn its decision-making process, will now be examined.

4.1 Emotions Influencing Perception

In psychology, perception is viewed as being influenced by motivation, emotion, and social and cultural factors [9]. Each of these factors has the following effects on an individual:

• readiness: a greater inclination to react to a stimulus
• precedence: ensuring priority stimuli are processed before others
• selection: the choice of one stimulus over another
• interpretation: the effect of a stimulus is predicted before it is experienced

An EMAI agent's perception is influenced by the factor of emotion, and this influences the agent's behaviour with respect to the four effects listed above. An agent's inclination to react to a particular stimulus is influenced by the agent's current emotional state, its internal states, and the emotion the agent associates with the stimulus. For example, if the agent is hungry it will be ready to react to a food stimulus. The precedence that an agent gives to the processing of a stimulus is dependent on the agent's current emotional state and its internal state. The agent's internal state determines which of the agent's goals have the higher priority. If there are a number of goals with the same priority, the agent determines which goal to attempt using emotion-based decision making. For example, if the agent would prefer to be happy, it will perform an action to satisfy the goal that would make it most happy. This also makes the agent prioritise the stimuli that are processed.
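Equations (11), (13) and (14) describe one appraisal cycle: combine the stimuli's emotional states into the event's emotional state, observe the agent's outcome emotion, and redistribute the change back to the stimuli. A minimal sketch, with our own names and six-dimensional tuples for points in the affective space:

```python
def event_emotion(stimuli, weights):
    # Eq. (11): weighted combination of the stimuli's emotional states.
    # The weights must be non-negative and sum to 1.
    return tuple(sum(w * s[d] for s, w in zip(stimuli, weights))
                 for d in range(6))

def update_stimuli(stimuli, weights, event_state, outcome_state):
    # Eqs. (13)-(14): shift each stimulus's emotional state by its
    # weighted share of the change evoked by the event's outcome.
    delta = tuple(o - e for o, e in zip(outcome_state, event_state))
    return [tuple(s[d] + w * delta[d] for d in range(6))
            for s, w in zip(stimuli, weights)]
```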
Stimuli involved in the fulfilment of the agent's goals are given higher priority. The same processing of agent goals and priorities also gives the agent the ability to select between the stimuli that are processed. Finally, the way in which stimuli are associated with emotions in the EMAI architecture gives agents the ability to interpret, or predict, the effect that a stimulus will have on the agent. It is this prediction process that allows the agent to select which goal and stimuli to process. Whenever an EMAI agent encounters a stimulus, that stimulus is perceived with respect to the agent's current emotional state and the last emotion associated with the stimulus. For example, if in a previous encounter the agent associated a stimulus, s, with the emotion surprise, Ω_s, the next time it encounters the stimulus it will evaluate how the emotion surprise would affect the agent's current emotional state. Figure 3 shows an agent's emotional state at two independent time intervals, where Ω_1 is happiness and Ω_2 is anger. Using Eq. (14) with a weighting of 0.5, the resulting perceived associated emotion is shown in Figure 3 as Ω_{s,1} (between surprise and happiness) when the agent is happy (Ω_1) and Ω_{s,2} (close to challenge) when the agent is angry (Ω_2).

[Figure 3: The agent's emotional state and its perception of a stimulus, plotted on the pleasantness and control dimensions.]

4.2 Emotion-based Reasoning

The way in which emotion affects the perception of stimuli in an EMAI agent also influences its decision-making process. This procedure is twofold. Firstly, the agent prioritises its behaviours by ordering them according to the strength of the associated internal state register (representing a primary emotion). Secondly, the agent further orders its intended behaviours by calculating the resulting emotional effect that performing each behaviour would have on the agent's emotional state. Given a number of behaviours that have the same priority, the agent will select the behavioural event that will most likely update the agent's emotional state to a more preferred emotional state. For example, if the agent had two events of equal urgency from which to select, the agent would further prioritise these events emotionally. The agent calculates the emotional point for each event and then interpolates how each event, when performed, would update the agent's emotional state. If the agent would prefer to have an emotional state closer to happiness, it would select the event that would, when combined with its current emotional state, make the agent happy. Assume the agent is experiencing hunger, indicated by a high level on the internal register representing this state. The prioritised goal would be to eat. The agent's environment may contain a number of stimuli that could help satisfy the eat goal (e.g. an apple and some chocolate). If eating the chocolate would make the agent happier than eating the apple, the agent would choose the chocolate.
As the agent's perception of the stimuli involved in a behavioural event changes with the agent's current emotional state, an event chosen to change the agent's mood from angry to happy will be different from an event that would change the agent's mood from guilty to happy. Let us consider two events, E1 and E2, where

\Omega_{E1} = [0.44, -0.5, -0.12, 0.08, -0.07, -0.63] and \Omega_{E2} = [0.44, -0.17, 0.73, 0.03, 0.63, 0.7]

Using Eq. (8), Ω_{E1} is best described as contempt and Ω_{E2} as fear. If the agent were in a happy mood, such that Ω = [-1.49, 0.09, -0.46, 0.15, -0.33, -0.21], the predicted outcome of each event could be calculated using Eq. (12) (assuming a weighting of 0.5), thus

\Omega_{O,E1} = \Omega + w_{E1,t} (\Omega_{E1} - \Omega)
             = [-1.49, 0.09, -0.46, 0.15, -0.33, -0.21] + 0.5 × ([0.44, -0.5, -0.12, 0.08, -0.07, -0.63] - [-1.49, 0.09, -0.46, 0.15, -0.33, -0.21])
             ≈ [-0.51, -0.205, -0.29, 0.115, -0.2, -0.42]

and

\Omega_{O,E2} = \Omega + w_{E2,t} (\Omega_{E2} - \Omega)
             = [-1.49, 0.09, -0.46, 0.15, -0.33, -0.21] + 0.5 × ([0.44, -0.17, 0.73, 0.03, 0.63, 0.7] - [-1.49, 0.09, -0.46, 0.15, -0.33, -0.21])
             ≈ [-0.51, -0.04, 0.135, 0.09, 0.15, 0.245]

The agent would predict, using Eq. (9), that E1 would make it feel happy and E2 would make it feel fear. These calculations are shown graphically in Figure 4.²

² Although, in this 2D figure, E2 appears to be closer to hope, linearly in six dimensions it is not.

[Figure 4: The predicted resulting emotions from events E1 and E2 on a happy agent, plotted on the pleasantness and control dimensions.]

If, however, the agent were in a guilty mood, that is, Ω = [0.6, 1.13, -0.15, -0.36, 0, -0.29], the outcomes of E1 and E2 would be perceived differently. Using Eq. (12) and in turn Eq. (9), Ω_{O,E1} and Ω_{O,E2} would both be described as shame, with the point for Ω_{O,E2} lying closer to pure shame than Ω_{O,E1} in the affective space. The emotions for E1, E2, the agent and the predicted outcomes are shown in Figure 5.

[Figure 5: The predicted resulting emotions from events E1 and E2 on a guilty agent, plotted on the pleasantness and control dimensions.]

It can be seen from this example that the agent's current emotional state influences how stimuli and their associated events are perceived by the agent. Once a prediction has been made as to the outcomes of a number of events, the agent can select from these the event that will become the agent's outward behaviour, based on a preferred emotional state. For example, if the agent would prefer to be in a happy emotional state, it would select the event predicted to move the agent's outcome emotion closest to happiness in the affective space.

5 Summary

Emotions are a difficult concept to define, let alone integrate into the domain of artificial intelligence. Emotions have been studied in fields such as philosophy, physiology, neurology and psychology, all of which have their own, often distinct, ideas and models explaining how emotions are generated and how they affect behaviour. One mechanism common to all agent architectures is perception, and if agents are to be designed that integrate emotion to enhance general intelligence, the influence that emotion has on the agent's senses should also be examined. The EMAI architecture was developed to examine the multidimensional appraisal of emotion and the use of such a construct in artificial emotion processing. As emotion and perception are intricately intertwined in humans, it was concluded that such a relationship should also exist in an affective agent. Perception in an EMAI agent does not occur at the sensing level; rather, all incoming sensory data is emotionally filtered through the affective space. The current emotional state of an EMAI agent thus affects how the agent perceives any incoming information.
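The E1/E2 example can be checked mechanically. A minimal sketch under the same assumptions (weighting 0.5, prediction by Eq. (12)), reusing the PURE_EMOTIONS mapping from the earlier sketch:

```python
def predict_outcome(current, event, w=0.5):
    # Eq. (12): predicted emotional state after performing an event.
    return tuple(c + w * (e - c) for c, e in zip(current, event))

happy = (-1.49, 0.09, -0.46, 0.15, -0.33, -0.21)
E1 = (0.44, -0.5, -0.12, 0.08, -0.07, -0.63)   # contempt-like event
E2 = (0.44, -0.17, 0.73, 0.03, 0.63, 0.7)      # fear-like event

# Select the event whose predicted outcome lies closest to happiness.
best = min((E1, E2),
           key=lambda ev: sum((p - h) ** 2 for p, h in
                              zip(predict_outcome(happy, ev),
                                  PURE_EMOTIONS["happiness"])))
# best == E1: its predicted outcome lies nearer to happiness.
```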
The evaluation of the use of the EMAI architecture to model a computerised character, presented in [14], determined whether the model was sufficiently capable of using its set of highly integrated mechanisms for generating motivation, goal setting, emotional intelligence, and event prioritisation and scheduling. The results gave positive feedback about the EMAI architecture's ability to produce reasonable emotional states and associated behaviours. The data also confirmed the agent's ability to set and execute goals using motivational mechanisms related to the agent's physical and mental states. What the future holds for the field of affective computing is unclear. As the field is very much in its infancy, researchers need to continue to examine and assess the elementary concepts of emotion generation and emotion influence. No one theory stands out from the rest as the ideal. The complexities of human emotions may be too great to include holistically within an artificial intelligence at this time; only those segments of emotional behaviour that are advantageous to the goals of an artificial being should be considered.

References

[1] Brunswik, E. (1956) Perception and the Representative Design of Psychological Experiments, 2nd ed., rev. & enl., University of California Press, Berkeley.
[2] Picard, R. W. (2000) Toward computers that recognize and respond to user emotion, IBM Systems Journal, vol. 39, no. 3 & 4.
[3] El-Nasr, M. S. (1998) Modeling Emotion Dynamics in Intelligent Agents, M.Sc. Dissertation, American University in Cairo.
[4] Reilly, W. S. N. (1996) Believable Social and Emotional Agents, Ph.D. Dissertation, Carnegie Mellon University.
[5] Canamero, D. (1997) Modeling motivations and emotions as a basis for intelligent behaviour, in Proceedings of the First International Conference on Autonomous Agents, ACM Press, New York, pp. 148-155.
[6] Padgham, L. & Taylor, G. (1997) A system for modeling agents having emotion and personality, Lecture Notes in Artificial Intelligence, Springer-Verlag, vol. 12, no. 9, pp. 59-71.
[7] Baillie, P., Lukose, D. & Toleman, M. (2002) Engineering emotionally intelligent agents, in Intelligent Agent Software Engineering, eds. V. Plekhanova & S. Wermter, Idea Publishing Group, Hershey.
[8] Pert, C. B. (1997) Molecules of Emotion, Simon and Schuster, New York.
[9] Malim, T. (1994) Cognitive Processes: Attention, Perception, Memory, Thinking and Language, MacMillan, London.
[10] Lazarus, R. S. & McCleary, R. (1951) Autonomic discrimination without awareness: a study of subception, Psychological Review, vol. 58, pp. 113-122.
[11] Leuba & Lucas (1945) The effects of attitudes on descriptions of pictures, Journal of Experimental Psychology, American Psychological Association, Washington, vol. 35, pp. 517-524.
[12] Ortony, A., Clore, G. L. & Collins, A. (1988) The Cognitive Structure of Emotions, Cambridge University Press, Cambridge.
[13] Smith, C. A. & Ellsworth, P. C. (1985) Attitudes and social cognition, Journal of Personality and Social Psychology, American Psychological Association, Washington, vol. 48, no. 4, pp. 813-838.
[14] Baillie, P. (2002) The Synthesis of Emotions in Artificial Intelligences, Ph.D. Dissertation, University of Southern Queensland.
[15] Koestler, A. (1967) The Ghost in the Machine, Penguin Books Ltd., London.

Enhancing the Performance of Neurofuzzy Predictors by Emotional Learning Algorithm

Caro Lucas
Control and Intelligent Processing Center of Excellence, Electrical and Computer Eng.
Department, University of Tehran, Tehran, Iran, and School of Intelligent Systems, Institute for Studies in Theoretical Physics and Mathematics, Tehran, Iran
lucas@ipm.ir

Ali Abbaspour, Ali Gholipour and Babak N. Araabi
Control and Intelligent Processing Center of Excellence, Electrical and Computer Eng. Department, University of Tehran, Tehran, Iran
aabbaspr@ut.ac.ir, gholipoor@ut.ac.ir, araabi@ut.ac.ir

Mehrdad Fatourechi
Electrical and Computer Eng. Department, University of British Columbia, BC, Canada
mehrdadf@ece.ubc.ca

Keywords: Emotional Learning, Prediction, Nonlinear Time Series, Neurofuzzy Model

Received: October 8, 2002

Neural networks and neurofuzzy models have been successfully used in the prediction of nonlinear time series. Several learning methods have been introduced to train neurofuzzy predictors, such as ANFIS, ASMOD and FUREGA. Many of these methods, constructed over the Takagi-Sugeno fuzzy inference system, are characterized by high generalization; however, they differ in computational complexity. Emotional learning, which has been successfully used in bounded rational decision making, is introduced as an appropriate method for achieving particular goals in the prediction of real world data. For example, predicting the peaks of the sunspot numbers (the maxima of solar activity) is especially important because of their major effects on the earth and on satellites. The emotional learning based fuzzy inference system (ELFIS) has the advantages of simplicity and low computational complexity in comparison with other multi-objective optimization methods. The efficiency of the proposed predictor is shown in two examples of highly nonlinear time series. An appropriate emotional signal is composed for the prediction of solar activity and of the price of securities. It is observed that ELFIS produces better predictions in the important solar maximum regions, and is also a fast and efficient algorithm for enhancing the performance of an ANFIS predictor in both examples.

1 Introduction

Predicting the future has long been an important problem for the human mind. Alongside great achievements in this endeavour, there remain many natural phenomena whose successful prediction has so far eluded researchers. Some have been proven unpredictable due to their stochastic nature. Others have been shown to be chaotic, with a continuous and bounded frequency spectrum resembling white noise and a sensitivity to initial conditions, attested by positive Lyapunov exponents, that results in the long-term unpredictability of the time series. Several methods have been developed to distinguish chaotic systems from others; however, model-free nonlinear predictors can be used in most cases without modification. Compared with the early days of classical methods like polynomial approximators, neural networks have shown better performance, and even better are their successors, the neurofuzzy models [1], [2], [3], [4]. Some remarkable algorithms have been proposed to train neurofuzzy models [4], [5], [6], [7]. The pioneers, Takagi and Sugeno, presented an adaptive algorithm for their fuzzy inference system [5]. Other methods, including adaptive B-spline modelling [6] and the adaptive network-based fuzzy inference system [7], fulfil the principle of network parsimony, which leads to high generalization performance. Generalization is the most desired property of a predictor. The principle of parsimony says that the best models are those with the simplest acceptable structures and the smallest number of adjustable parameters.
Following the direction of biologically motivated intelligent computing, the emotional learning methodology has been introduced on the basis of emotions, which are argued in contemporary psychology to be better predictors of future achievement than IQ [8], [9]. The approach is formulated around an emotional signal which expresses the emotions of a critic about the overall performance of the system. The emotional signal can be produced by any combination of objectives or goals which improve the estimation or prediction. The loss function is defined as a function of the emotional signal, and the training algorithm is simply designed to minimize this loss function. Thus the need for elaborate definitions of the loss function in multi-objective problems, which results in high computational complexity, is handled simply by defining an appropriate emotional signal. The cost to be paid is that the result will be merely satisficing rather than optimizing. As a result, the model is trained to provide the desired performance in a holistic manner. The emotional learning algorithm has three distinctive properties in comparison with other learning methodologies. First, one can use very complicated definitions of the emotional signal without increasing the computational complexity of the algorithm or worrying about differentiability or about rendering it into a recursive formulation. Second, the parameters can be adjusted in a simple, intuitive way to obtain the best performance. Third, the training is very fast and efficient. These properties make the method preferable in real-time applications like control and decision making, as presented in the literature [10], [11], [12], [13], [14], [15], [16], [17], [18]. In this research the emotional learning algorithm has been used in the purposeful prediction of some real world data: the sunspot numbers and the price of securities. In predicting the sunspot number time series, the peak points, corresponding to solar maximum regions, are more important to predict than the others due to their strong effects on space weather, communication systems and satellites. Additional achievements are fast training of the model and low computational complexity. The main contribution of this paper is to provide accurate predictions using emotional learning for a Takagi-Sugeno neurofuzzy model. The results are compared with other methods of training neural and neurofuzzy models, such as RBF and ANFIS. The paper consists of six sections: the main aspects of the Takagi-Sugeno fuzzy inference system, along with the associated learning methods, are described in the second section; the third section deals with the various forms of utilizing emotional learning in the prediction problem; the results of applying the proposed prediction method to benchmark time series are reported and analyzed in sections four and five; finally, the last section presents some remarkable properties of emotional learning and some concluding remarks.

2 Neurofuzzy Models

Two major approaches to trainable neurofuzzy models can be distinguished: the network-based Takagi-Sugeno fuzzy inference system and the locally linear neurofuzzy model. The locally linear model is equivalent to the Takagi-Sugeno fuzzy inference system (1) under certain conditions, and can also be interpreted as an extension of the normalized RBF network [2]. Therefore, the mathematical description of the Takagi-Sugeno neurofuzzy model, which is the most general formulation, is given in this section.
The Takagi-Sugeno fuzzy inference system is constructed from fuzzy rules of the following type:

Rule_i: If u_1 = A_{i1} And ... And u_p = A_{ip} then y = f_i(u_1, u_2, ..., u_p)    (1)

where i = 1...M and M is the number of fuzzy rules, u_1, ..., u_p are the inputs of the network, each A_{ij} denotes the fuzzy set for input u_j in rule i, and f_i(.) is a crisp function, defined in most applications as a linear combination of the inputs:

y_i = \omega_{i0} + \omega_{i1} u_1 + \omega_{i2} u_2 + \ldots + \omega_{ip} u_p    (2)

or, in matrix form, y = a^T(u) W. Thus the output of this model can be calculated by

\hat{y} = \sum_{i=1}^{M} f_i(u) \Phi_i(u), \qquad \mu_i(u) = \prod_{j=1}^{p} \mu_{ij}(u_j)    (3)

where \mu_{ij}(u_j) is the membership function of the jth input in the ith rule and \mu_i(u) is the degree of validity of the ith rule. This system can be formulated in the basis function realization, which clarifies the relation between the Takagi-Sugeno fuzzy inference system and the normalized RBF network. The basis function is

\Phi_i(u) = \mu_i(u) / \sum_{j=1}^{M} \mu_j(u)    (4)

and as a result

\sum_{i=1}^{M} \Phi_i(u) = 1    (5)

This neurofuzzy model has two sets of adjustable parameters: first, the antecedent parameters, which belong to the input membership functions, such as the centres and deviations of the Gaussians; second, the rule consequent parameters, such as the linear output weights in equation (2). It is more common to optimize only the rule consequent parameters. This can be done simply by linear techniques like least squares [2]. A linguistic interpretation to determine the antecedent parameters is usually adequate; however, one can opt to use a more powerful nonlinear method to optimize all the parameters together. Gradient-based learning algorithms can be used in the optimization of the consequent linear parameters. Supervised learning aims to minimize the following loss function (the mean square error of estimation):

J = \frac{1}{N} \sum_{i=1}^{N} (y(i) - \hat{y}(i))^2    (6)

where N is the number of data samples. According to the matrix form of (2), this loss function can be expanded in the quadratic form

J = W^T R W - 2 W^T P + Y^T Y / N    (7)

where R = (1/N) A^T A is the autocorrelation matrix, A is the N x p solution matrix whose ith row is a^T(u(i)), and P = (1/N) A^T Y is the p-dimensional cross-correlation vector. From

\partial J / \partial W = 2RW - 2P = 0    (8)

the following linear equations are obtained to minimize J:

RW = P    (9)

and W is simply found by pseudo-inverse calculation. One of the simplest local nonlinear optimization techniques is steepest descent. In this method the direction of the change in the parameters is opposite to the gradient of the cost function:

\Delta W(i) = -\partial J / \partial W(i) = 2P - 2RW(i)    (10)

and

W(i+1) = W(i) + \eta \Delta W(i)    (11)

where \eta is the learning rate. Other nonlinear local optimization techniques can be used for this purpose, e.g. conjugate gradient or Levenberg-Marquardt, which are faster than steepest descent. All these methods carry the possibility of getting stuck in local minima. Advanced learning algorithms that have been proposed for the optimization of the parameters of a Takagi-Sugeno fuzzy inference system include ASMOD (adaptive B-spline modelling of observation data) [6], ANFIS (adaptive network-based fuzzy inference system) [7] and FUREGA (fuzzy rule extraction by genetic algorithm) [2]. ANFIS is one of the most popular algorithms and has been used for different purposes, such as system identification, control, prediction and signal processing. It is a hybrid learning method based on gradient descent and least squares estimation. ASMOD is an additive constructive algorithm based on k-d tree partitioning.
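As a concrete illustration of Eqs. (1)-(5), the sketch below evaluates a small Takagi-Sugeno model with Gaussian antecedents and linear consequents. The function name and the two-rule configuration are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ts_output(u, centers, sigmas, weights):
    """Evaluate a Takagi-Sugeno model at input vector u.
    centers, sigmas: (M, p) antecedent Gaussian parameters.
    weights: (M, p+1) consequent parameters [bias, w_1..w_p] per rule."""
    # Rule validities: products of Gaussian memberships, Eq. (3).
    mu = np.prod(np.exp(-0.5 * ((u - centers) / sigmas) ** 2), axis=1)
    phi = mu / mu.sum()                      # normalized basis, Eqs. (4)-(5)
    f = weights[:, 0] + weights[:, 1:] @ u   # linear consequents, Eq. (2)
    return float(phi @ f)

# A toy two-rule, two-input model.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.ones((2, 2))
weights = np.array([[0.1, 0.5, -0.2],
                    [0.0, 1.0, 0.3]])
print(ts_output(np.array([0.5, 0.2]), centers, sigmas, weights))
```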
It reduces the problems of derivative computation because of the favourable properties of B-spline basis functions. Although ASMOD has a complicated procedure, it has advantages such as high generalization and accurate estimation. One of the most important problems in learning is the prevention of overfitting. This can be done by observing the error index of the test data over the learning iterations: the learning algorithm is terminated when the error index of the test data starts to increase, in an average sense. Prevention of overfitting is the most common way of providing high generalization.

3 Emotional Learning

Satisficing approaches to decision making have, in recent years, been widely adopted for dealing with complex engineering problems [18]. New learning algorithms like reinforcement learning, Q-learning, and the method of temporal differences [19], [20], [21], [22], [23] are characterized by their fast computation and, in some cases, lower error in comparison with classical learning methods. They can be interpreted as approximations to dynamic programming, which, although it furnishes a well-known computational algorithm via recursive solution of the Hamilton-Jacobi-Bellman equation and is perhaps the best example of a fully rational approach to decision making, is notorious for its computational complexity, sometimes referred to as the "curse of dimensionality" [24], [25]. Fast training is a notable consideration in control applications. Prediction applications also belong to the class of decision-making problems in which the two desired characteristics are accuracy and low computational complexity. The emotional learning method is a psychologically motivated algorithm developed to reduce the complexity of computation in prediction problems with particular goals. In this method the reinforcement signal is replaced by an emotional cue, which can be interpreted as a cognitive assessment of the present state in light of goals and intentions. The main reason for using emotion in a prediction problem is to lower the prediction error in some regions or according to some features. For example, predicting the sunspot number is more important at the peak points of the eleven-year cycle of solar activity, and accurate prediction of the peaks and valleys in the price of securities may likewise be desired. The method is based on an emotional signal which shows the emotions of a critic about the overall performance of the prediction. The emotional signal can be produced by any combination of objectives or goals which improve the estimation or prediction. The loss function is defined simply as a function of the emotional signal, and the training algorithm is designed to decrease this loss function, so the predictor is trained to provide the desired performance in a holistic manner. If the critic places emphasis on some regions or some properties, this is reflected in its emotions and directly shapes the characteristics of the predictor. Thus the definition of the emotional signal is entirely problem dependent. It can be a function of the error, the rate of change of the error and many other features. Finding an appropriate closed-form formulation for emotion is not usually possible; in contrast, a linguistic fuzzy definition of it is intuitive and plausible. A loss function is defined on the basis of the emotional signal. A simple form is

J = \frac{1}{2} \sum_{i=1}^{N} K \, es(i)^2    (12)

where es(i) is the value of the emotional signal for the ith sample of training data, and K is a weighting matrix, which can simply be replaced by unity.
Learning consists of adjusting the weights of the model by means of a nonlinear optimization method, e.g. steepest descent or conjugate gradient. With steepest descent, the weights are adjusted by the following variations:

\Delta\omega = -\eta \, \partial J / \partial \omega    (13)

where \eta is the learning rate of the corresponding neurofuzzy model, and the right-hand side can be calculated by the chain rule:

\partial J / \partial \omega = (\partial J / \partial es) \cdot (\partial es / \partial \hat{y}) \cdot (\partial \hat{y} / \partial \omega)    (14)

According to (12), \partial J / \partial es = K \cdot es, and \partial \hat{y} / \partial \omega is accessible from (3), where f_i(.) is a linear function of the weights. Calculating the remaining part, \partial es / \partial \hat{y}, is not straightforward in most cases. This is the price to be paid for the freedom to choose any desired emotional cue, and for not having to presuppose any predefined model. However, it can be approximated via simplifying assumptions. If, for example, the error is defined by

e = y_r - \hat{y}    (15)

where y_r is the output to be estimated, then

\partial es / \partial \hat{y} = (\partial es / \partial e) \cdot (\partial e / \partial \hat{y})    (16)

in which \partial e / \partial \hat{y} can be replaced by its sign (-1) in (14). The algorithm is, after all, supposed to be satisficing rather than optimizing. Finally, the weights are updated by the following formula:

\Delta\omega = -K \cdot \eta \cdot es \cdot \partial \hat{y} / \partial \omega    (17)

where, from (3), the sensitivity \partial \hat{y} / \partial \omega for the weights of rule i is proportional to the normalized degree of validity \mu_i(u) / \sum_{j=1}^{M} \mu_j(u). The definition of the emotional signal and the gradient-based optimization of the emotional learning algorithm in neurofuzzy predictors are illustrated with two examples in the next sections.

4 Predicting the Sunspot Numbers

Solar activity has major effects not only on satellites and space missions but also on communications and the weather on earth. The activity level changes with a period of eleven years, called the solar cycle. The solar cycle consists of an active part, the solar maximum, and a quiet part, the solar minimum. During the solar maximum there are many sunspots, solar flares and coronal mass ejections. A useful measure of solar activity is the observed sunspot number. Sunspots are dark spots on the surface of the sun which last for several days. The SESC sunspot number is computed according to Wolf's sunspot number, R = k(10g + s), where g is the number of sunspot groups, s is the total number of spots in all the groups, and k is a variable scaling factor that indicates the conditions of observation. A variety of techniques have been used in the prediction of solar activity, most of which are based on the sunspot number time series. The sunspot number, recorded since 1700, shows low-dimensional chaotic behaviour, and its prediction has been a challenging problem for researchers. However, good results have been obtained by the methods proposed in several articles [26], [27], [28], [29], [30]. In this research, both the monthly and the yearly averaged sunspot numbers are predicted. Figure 1 shows the history of the solar cycles on the basis of the yearly sunspot numbers. The error index for predicting sunspot numbers, as in most of the previous studies, is the normalized mean square error (NMSE):

NMSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 / \sum_{i=1}^{N} (y_i - \bar{y})^2    (19)

in which y_i, \hat{y}_i and \bar{y} are the observed data, the predicted data and the average of the observed data, respectively.

[Figure 1: The yearly averaged sunspot number.]

As a first observation, the emotional learning algorithm has been used to enhance the performance of a neurofuzzy predictor initially trained by ANFIS. The emotional signal is computed by a linguistic fuzzy inference system with the error and the rate of change of error as inputs.
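A minimal sketch of the update rule (17) for the consequent weights of a Takagi-Sugeno model. The region-weighted emotional signal below is a toy stand-in for the paper's fuzzy critic, and all names and sign conventions are our own assumptions (es is taken to grow with the error e = y - ŷ, so moving the weights along es · ∂ŷ/∂ω reduces the error):

```python
import numpy as np

def emotional_signal(error, target, peak_level=100.0, emphasis=3.0):
    # Toy critic: exaggerate the error near solar-maximum-like peaks.
    # (Assumption for illustration; the paper uses a linguistic fuzzy critic.)
    return error * (emphasis if target > peak_level else 1.0)

def emotional_update(weights, phi, u, es, lr=0.01, K=1.0):
    # Eq. (17): each rule's consequent weights move in proportion to the
    # rule's normalized validity phi and the extended input [1, u].
    x = np.concatenate(([1.0], u))   # bias term plus inputs
    return weights + lr * K * es * np.outer(phi, x)
```

Here weights has shape (M, p+1), matching the consequent parameters of Eq. (2), and phi is the vector of normalized rule validities from Eq. (4).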
Five and three Gaussian membership functions (negative large, negative, zero, positive and positive large) are used for the two inputs (the error and the rate of change of error, respectively), and the emotional signal is calculated by a centre-of-average defuzzifier from the rule base depicted by the surface in Figure 2.

[Figure 2: The surface generated by the linguistic fuzzy rules of the emotional critic.]

There are seven Gaussian membership functions for the emotional signal, the output of the fuzzy critic. The fuzzy definition of the critic is motivated by our knowledge of emotions in humans, and can be extended by inserting more inputs into the system. Figure 3 presents the targeted and predicted outputs over the test set (from 1920 to 2000). The lower diagram shows the results of the best fit obtained by ANFIS. The training was done with the optimal number of fuzzy rules and epochs (74 epochs) and continued until the error on the validation set started to increase. The upper diagram shows the targeted and predicted values after using emotional learning. The emotional algorithm is used in one pass over the training data to fine-tune the weights of the neurofuzzy model initially adjusted by ANFIS. The error index, NMSE, decreased from 0.1429 to 0.0853 after using emotional learning. The improvement in prediction accuracy, especially in the solar maximum regions, is noticeable. Interestingly, training ANFIS to its optimal performance takes approximately ten times more computational effort than the emotional learning needs to improve the prediction. Thus combining ANFIS with emotional learning is a fast and efficient way to improve the quality of the predictions, at least in this example.

[Figure 3: Enhancement in the prediction of sunspot numbers by emotional learning applied to ANFIS; targeted and predicted values, lower: by ANFIS, upper: by ANFIS + emotional learning.]

The next results are reported as a comparison of the quality of prediction of the monthly sunspot numbers by the emotional learning based fuzzy inference system (ELFIS) against some other learning methods: orthogonal least squares learning for the RBF network, and the adaptive network-based fuzzy inference system (ANFIS). All methods are used at their optimal performance. Overfitting is prevented by observing the mean square error of several validation sets during training. ELFIS is constructed over a Takagi-Sugeno fuzzy inference system. The emotional signal is computed by a fuzzy critic whose linguistic rules are defined by means of the error, the rate of change of error and the last targeted output. By defining appropriate membership functions for each of the inputs and 45 linguistic fuzzy rules, the desired behaviour of the emotional critic is obtained: it shows exaggerated emotions in the solar maximum regions. Figure 4 shows the surface generated by the fuzzy rules over the two-dimensional space of the more important inputs (the prediction error and the last observed value of the sunspot number). The emotional signal is used as the input to the learning formula (17), by which the weights of the neurofuzzy model (2) are adjusted. Just three Sugeno-type fuzzy rules, like (1), are used in ELFIS to comply with the principle of parsimony. As a result, the matrix of adjustable weights has 9 elements (three weights for the three inputs of each rule).
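Eq. (19) normalizes the squared prediction error by the variability of the observations around their mean, so a value below 1 means the predictor beats the constant mean prediction. A minimal sketch:

```python
import numpy as np

def nmse(y, y_hat):
    # Normalized mean square error, Eq. (19).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
```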
The specifications of the methods, the NMSE of the predictions and the computation times (on a 533 MHz Celeron processor) are presented in Table 1. It is observed that learning in ELFIS is at least four times faster than in the other methods, and that ELFIS is more accurate than ANFIS. Note that using a functional description of the emotional signal rather than the fuzzy description yields an even faster algorithm, but finding such a suitable function is not easy.

[Figure 4: The surface generated by the linguistic fuzzy rules of the emotional critic in ELFIS, in the prediction of monthly sunspot numbers.]

Table 1: Comparison of predictions by selected neural and neurofuzzy models

Model   Specifications               Computation Time   NMSE
ANFIS   8 rules and 165 epochs       89.5790 sec        0.1702
RBF     7 neurons in hidden layer    84.7820 sec        0.1314
ELFIS   3 Sugeno-type fuzzy rules    22.3320 sec        0.1386

Figures 5 to 7 show the predictions by the RBF network, ANFIS and ELFIS, respectively. These diagrams cover a part of the test set, in particular cycle 19, which has an above-average peak in 1957. It is observable that ELFIS generates the most accurate predictions in the maximum region; however, the NMSE of the RBF network is the lowest, indicating that the RBF generates more accurate predictions over the total test set. By modifying the validation sets affecting the stop time of the learning procedure, even better NMSEs can be obtained with the RBF network, but this results in higher prediction errors, especially in 1957.

[Figure 5: Predicting the monthly sunspot numbers by the RBF network, with the corresponding prediction error.]

[Figure 6: Predicting the monthly sunspot numbers by ANFIS, with the corresponding prediction error.]

[Figure 7: Predicting the monthly sunspot numbers by ELFIS, with the corresponding prediction error.]

5 Predicting the Security Price

The second example is the prediction of securities such as stocks, treasury bonds and government bonds. If there were a predictor that predicted the future exactly, then the best investment would be in the instrument with the maximum rate of return. For this reason, the performance of the prediction is significant. Some researchers have used neural networks, e.g. MLP and RBF, for the prediction of securities. In this research, the emotional learning algorithm is applied to a network initially trained by ANFIS to predict the stock price of General Electric (GE) in the S&P 500 index. For this case one can use various definitions of the emotional signal: as a function of the prediction error and the differential of the error, or even of any significant event, such as the crossing of the spot price with some well-monitored moving average. Here the emotional signal is taken as the output of a linguistic fuzzy inference system with the error and the rate of change of error as inputs. Five and three Gaussian membership functions are used for the inputs, respectively. Figure 8 shows the surface generated by the fuzzy rules of the emotional critic.

[Figure 8: The output surface of the linguistic fuzzy inference system producing the emotional signal.]

In this research, the daily closing price of the stock is considered.
The model parameters, the number of regressors and the number of neurons are optimized to prevent overfitting. The stock prices of 800 days and of the 400 following days are used as training data and test data, respectively. The result of predicting the stock price by ELFIS is presented in Figure 9.

[Figure 9: Predicting the security price (the price of GE per share) using emotional learning plus ANFIS, with the corresponding prediction error.]

Table 2 presents a comparison of the quality of prediction of the daily closing stock price of General Electric (GE) by ELFIS with some other networks, such as ANFIS, RBF and MLP. These results are obtained while guarding against overfitting and with the optimal number of neurons in the hidden layer (on a 1.8 GHz Celeron processor). As this practical example shows, the emotional learning algorithm provides more accurate predictions with lower computational complexity.

Table 2: A comparison of various neural and neurofuzzy predictors of the security price

Model   Specifications               Computation Time   NMSE
MLP     37 neurons in hidden layer   6.5600 sec         0.0347
ANFIS   12 rules and 257 epochs      13.8390 sec        0.0370
RBF     31 neurons in hidden layer   7.2000 sec         0.0395
ELFIS   12 Sugeno-type fuzzy rules   1.8320 sec         0.0358

6 Conclusion

Training a system to make decisions in the presence of uncertainties is a difficult problem, especially when computational resources are limited. Supervised training cannot be used, because the desired values of the decision variables are unknown. However, the desirability of past decisions can usually be assessed after the outcomes of their implementation are observed; therefore, unsupervised training methods that do not utilize those assessments cannot take full advantage of the available knowledge. Several approximate methods, like back-propagation through the plant, and identification of the plant or of a (pseudo) inverse plant model, have been successfully used in the past couple of decades [31], [32], [33], [34]. Behavioural and emotional approaches to control and decision making can also be classified in this category [35]. Besides providing biological plausibility, they have the extra advantage of not being confined to cheap control problems like set-point tracking [10]. The emotional approach is a step higher on the cognitive ladder and can be more useful in goal-aware or context-aware applications (e.g. dealing with multiple objectives in decision problems, even when the objectives are fuzzy or cannot be differentiated or directly evaluated with simple mathematical expressions). The main contribution of this paper is the application of those ideas to the prediction domain. Although prediction is easier to deal with, because there is no additional complexity of an unknown plant, and the proposed learning methods should therefore also be compared with error-minimization methodologies, model-free prediction has become of great importance in the past few decades, and there have been many efforts to train neuro- and/or fuzzy predictors with alternative loss functions. In this paper, we have applied emotional learning to two very important benchmark problems. The motivation is not confined to achieving computational efficiency or improving the total prediction accuracy. In both problems, achieving more accurate results in desired regions, or according to some important features, is a goal towards which some increase in the error indices over the total test set can be tolerated.
Specifically, one wishes to improve the prediction quality of solar activity (the sunspot number time series) in the solar maximum regions (the peak points of the sunspot number) at the expense of the prediction accuracy in less interesting regions. In the case of stock market prediction, too, the quality of predictions in trend reversal regions (peaks and valleys) is of greater importance for supporting investment decisions. The achievements reported in this paper are twofold. On the one hand, excellent prediction quality has been achieved for the two different benchmark problems with a considerable reduction in computational complexity. On the other hand, a psychologically motivated framework for considering alternative or even multiple goals in decision making (in this case, prediction) has been proposed, which is easy to apply even when the goal cannot be expressed via a well-known mathematical expression or is not differentiable. The goals are satisfied by tuning the predictor so that an emotional signal, indicating how the present state is assessed to be non-conducive to the goals, is continually minimized (i.e. we shift gradually to states assessed as more satisfactory with respect to the goals). The proposed emotional learning based fuzzy inference system (ELFIS) has been used in the prediction of solar activity (the sunspot number time series), where the emotional signal is determined with emphasis on the solar maximum regions (the peak points of the sunspot number), and it has shown better results in comparison with the RBF network and ANFIS. In the prediction of the security price, the emotional learning algorithm is defined by the emotions of a fuzzy critic and results in good predictions. In fact, the use of a combination of the error and the rate of change of error delays overtraining of the model. The definition of the emotional signal is an important problem in the emotional learning algorithm and provides higher degrees of freedom. In the prediction of the security price, better performance can be obtained through the use of variables in addition to the lagged values of the process to be predicted (e.g. fundamentalist as well as chartist data).

References

[1] Brown M., Harris C.J. (1994) Neurofuzzy Adaptive Modelling and Control, Prentice Hall.
[2] Nelles O. (2001) Nonlinear System Identification, Springer-Verlag, Berlin.
[3] Bossley K.M. (1997) Neurofuzzy Modelling Approaches in System Identification, PhD thesis, University of Southampton, Southampton, UK.
[4] Leung H., Lo T., Wang S. (2001) Prediction of noisy chaotic time series using an optimal radial basis function neural network, IEEE Trans. on Neural Networks, 12(5), pp. 1163-1172.
[5] Takagi T., Sugeno M. (1985) Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. on Systems, Man and Cybernetics, vol. 15, pp. 116-132.
[6] Kavli T. (1993) ASMOD: an algorithm for adaptive spline modeling of observation data, Int. J. of Control, 58(4), pp. 947-967.
[7] Jang J.R. (1993) ANFIS: adaptive network based fuzzy inference system, IEEE Trans. on Systems, Man and Cybernetics, 23(3), pp. 665-685.
[8] Goleman D. (1995) Emotional Intelligence, Bantam Books, New York.
[9] Picard R.W., Vyzas E., Healey J. (2001) Toward machine emotional intelligence: analysis of affective physiological state, IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(10), pp. 1175-1191.
[10] Lucas C., Shahmirzadi D., Sheikholeslami N.
(2003) Introducing BELBIC: brain emotional learning based intelligent controller, accepted for publication in the International Journal of Intelligent Automation and Soft Computing (Autosoft).
[11] Jazbi A., Lucas C. (1999) Intelligent control with emotional learning, 7th Iranian Conference on Electrical Engineering, ICEE'99, Tehran, Iran, pp. 207-212.
[12] Lucas C., Jazbi S.A., Fatourechi M., Farshad M. (2000) Cognitive action selection with neurocontrollers, Third Irano-Armenian Workshop on Neural Networks, Yerevan, Armenia.
[13] Fatourechi M., Lucas C., Khaki Sedigh A. (2001) An agent-based approach to multivariable control, Proc. of IASTED International Conference on Artificial Intelligence and Applications, Marbella, Spain, pp. 376-381.
[14] Fatourechi M., Lucas C., Khaki Sedigh A. (2001) Reducing control effort by means of emotional learning, Proc. of 9th Iranian Conference on Electrical Engineering, ICEE'01, Tehran, Iran, pp. 41-1 to 41-8.
[15] Fatourechi M., Lucas C., Khaki Sedigh A. (2001) Reduction of maximum overshoot by means of emotional learning, Proceedings of 6th Annual CSI Computer Conference, Isfahan, Iran, pp. 460-467.
[16] Perlovsky L.I. (1999) Emotions, learning and control, Proc. of IEEE Int. Symp. on Intelligent Control / Intelligent Systems and Semiotics, Cambridge, MA, pp. 132-137.
[17] Ventura R., Pinto Ferreira C. (1999) Emotion based control systems, Proc. of IEEE Int. Symp. on Intelligent Control / Intelligent Systems and Semiotics, Cambridge, MA, pp. 64-66.
[18] Inoue K., Kawabata K., Kobayashi H. (1996) On a decision making system with emotion, Proc. 5th IEEE International Workshop on Robot and Human Communication, pp. 461-465.
[19] Sutton R.S., Barto A.G. (1998) Introduction to Reinforcement Learning, MIT Press, Cambridge.
[20] Barto A., Sutton R., Watkins C. (1990) Learning and sequential decision making, in Learning and Computational Neuroscience, MIT Press, Cambridge.
[21] Watkins C. (1989) Learning from Delayed Rewards, PhD thesis, University of Cambridge, UK.
[22] Watkins C., Dayan P. (1992) Q-learning, Machine Learning, 8, pp. 279-292.
[23] Sutton R.S. (1988) Learning to predict by the method of temporal differences, Machine Learning, 3, pp. 9-44.
[24] Ungar L.H. (2002) Reinforcement learning from limited observations, Workshop on Learning and Approximate Dynamic Programming, Playacar, Mexico.
[25] Brown M., Bossley K.M., Mills D.J., Harris C.J. (1995) High dimensional neurofuzzy systems: overcoming the curse of dimensionality, Proc. of IEEE International Conference on Fuzzy Systems, New Orleans, USA, pp. 2139-2146.
[26] Izeman A.J. (1985) J.R. Wolf and the Zurich sunspot relative numbers, The Mathematical Intelligencer, vol. 7, no. 1, pp. 27-33.
[27] Tong H., Lim K. (1980) Threshold autoregression, limit cycles and cyclical data, J. Roy. Statist. Soc. B, no. 42, pp. 245-292.
[28] Tong H. (1996) Nonlinear Time Series: A Dynamical System Approach, Oxford University Press, UK.
[29] Weigend A., Huberman B., Rumelhart D.E. (1990) Predicting the future: a connectionist approach, Int. J. of Neural Systems, vol. 1, pp. 193-209.
[30] Weigend A., Huberman B., Rumelhart D.E. (1992) Predicting sunspots and exchange rates with connectionist networks, in Nonlinear Modeling and Forecasting, eds. Casdagli & Eubank, Addison-Wesley, pp. 395-432.
[31] Werbos P.J. (1974) Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD thesis, Harvard University, USA.
[32] Rumelhart D.E., Hinton G.E., Williams R.J.
(1986) Learning internal representations by error propagation, in D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, chapter 8, MIT Press, Cambridge. [33] Narendra K.S., Parthasarathy K. (1990) Identification and Control of dynamical systems using neural networks, IEEE Tran. On Neural Networks, 1(1), pp. 4-27. [34] Xu X., He H.G., Hu D. (2002) Efficient Reinforcement Learning Using Recursive Least-Squares Methods, Journal of Artificial Intelligence Research, 16, pp. 259-292. [35] Bay J.S. (1997) Behavior Learning in Large Homogeneous Populations of Robots, IASTED International Conference on Artificial Intelligence and Soft Computing, pp. 137-140. Emotion-Based Decision and Learning Using Associative Memory and Statistical Estimation Bruno D. Damas and Luis M. Custódio Institute for Systems and Robotics Instituto Superior Técnico, Av. Rovisco Pais, 1, 1049-001, Lisboa, Portugal bdamas@isr.ist.utl.pt, lmmc@isr.ist.utl.pt Keywords: Agents, Emotions, Decision, Learning Received: October 31, 2002 An emotional architecture for artificial agents, inspired in Damasio s concept of somatic markers, is described in this paper. An associative memory, capable of providing estimates of action consequences for given situations, is also developed. The role emotions play on behaviour triggering is also discussed. This architecture has been successfully applied to robotic soccer, leading to an effective emotion-based machine learning. 1 Introduction There is an apparent simplicity in most of the human decision processes: deciding which restaurant should we dinner at or choosing a movie to watch does not usually take us to an explicit and exhaustive enumeration and consideration of all the information, conditions and constraints considered relevant to the decision making process. Yet, despite this simplicity, most of the times our decisions lead us to advantageous situations, or, at least, prevent potentially undesirable ones. Antonio R. Damasio has pointed out the role of emotions on those human decision processes (1). According to Damasio, emotions have a key importance in the whole human rationality, as they effectively provide a selection mechanism that distinguish, in a situation where a decision is needed, the best alternatives, according to the individual past experience. This emotion-based decision mechanism is based on the concept of somatic marker (1). Damasio defines an emotion as a perturbation in the state of a set of human body variables, caused by situations or thoughts. These variables include, for example, blood pressure, endocrinous glands activity, musculature parameters, to name but a few. Strong experiences lead to heavy perturbations of these variables, thus generating significant emotions. On the other hand, a feeling, according to Damasio, is considered an association of an emotion with the mental image of the situation that give rise to it. Damasio names a feeling that is kept in the human long-term memory as a somatic marker. From an artificial intelligence point of view, one can think those feelings as consisting of pairs (Perception, Emotion), where a perception is defined as the processed information that respects to a state of the world and the agent itself. When a decision has to be made some neural dispositions, representing some of the previous acquired somatic markers, are triggered as a consequence of several possi- ble future scenarios that are brought to the mind. 
When one of these scenarios is similar to the perception hold by a somatic marker the corresponding emotional variation is replicated in the human body. This allows a fast classification of that future scenario in terms of its desirability, and hence provides a benefit evaluation of the action that would lead to that hypothetical scenario. When a decision has to take place, emotions automatically discard some "actuation paths", while emphasising others, therefore contributing for narrowing the decision search space. The model proposed here aims at the implementation, in an artificial agent, of the decision and learning capabilities provided by somatic markers in human beings. There are no references in this work to concepts like sadness, anger, fear or joy, since it is intended not to replicate human emotions on an artificial machine but rather to implement the functionality of emotions, namely their contribution to the decision making process. In Section 2 the proposed architecture and all of its modules are presented, namely, the associative memory module, its management for finite resources, the decision module and the action switching module. In Section 3 some experiments performed to evaluate the performance of the proposed architecture are described and their results discussed. Finally, Section 4 presents some related work and Section 5 finishes the paper, drawing some conclusions and pointing directions to future investigation on the architecture presented in this paper. 2 The Proposed Architecture An intelligent agent is usually considered as an entity that, for each perception, tries to maximize some kind of utility function over time, choosing its most adequate available action(2). Usually the instantaneous utility function is built upon a finite set of attributes, also called motivations in this paper. Let us define the Connotation Vector as the vector collecting the attributes deviations from their optimal values, i.e., deviation from the values for which the instantaneous utility function takes its maximum value. This vector has Nc components, where Nc is the number of attributes considered. It is easy to see that the instantaneous utility function is maximized when the connotation vector is equal to the null vector. The agent using this architecture should maximize (■to +T ^ = 11. u{t) dt , (1) where t0 is the current time, T is a time horizon, U is the utility value and u(t), the instantaneous utility, is obtained from the connotation vector according to C (t) R u(t) = ^(C (t)) (2) where C, the connotation vector, is calculated using the function Qp P (t) C (t) = x(P (t)) (3) where P is a perception and QP is the perception space. ^ and X are functions depending on the environment and on the agent objectives. Maximizing the instantaneous utility function is equivalent to minimizing the connotation vector, and the agent tries therefore to keep its connotation vector as close to the null vector as possible. Notice that the emotional decision system of a human being also tries to keep a set of body variables as close as possible to their equilibrium values, usually called home-ostatic values (1). There is a parallelism between the connotation vector of our artificial agent and the emotional response of an individual to some situation. 
In order to build a decision and learning system based on emotions and somatic markers two different mechanisms have to be implemented: An associative memory: Somatic markers are essentially associations between perceptions and emotional variations. Therefore, a memory that holds a set of perceptions and the corresponding connotation vector variations must be built. This associative memory, given a perception P, should allow to estimate aACp , the connotation variation taken as a consequence of choosing action A. It should also provide AIP, a measure of the quality of the information used to estimate aACp . A decision policy: Given ACP and IP for every possible action Aj, which action should the agent carry out? That decision depends on C, the agent's current connotation vector, but the information AIP also plays a role on this decision process — it introduce the notion of exploration, as it will be explained in section 2.3. Fig. 1 provides a global view of the proposed model's architecture. 2.1 Estimation Using an Associative Memory An associative memory ^ is defined here as a set of quadruples = (tj, Pj, Aj, ACi), with 0 < i < N^ , where N^ is the number of quadruples stored in memory, Pi is a perception acquired at instant ti, Ai is the corresponding chosen action and ACi is the connotation vector variation, taken as a consequence of performing Ai in the situation represented by Pi. The similarity between two pairs of perceptions and corresponding actions, (Pi,Ai) and (Pj ,Aj ), is defined by Pij = p ((Pi, Ai), (Pj, Aj)), with Pij e [0,1], (4) where p is an environment dependent similarity measure function which gives pj^j = 0 when two pairs are entirely distinct and p= « 1 when there is a perfect match between them. Then, for an arriving perception P, the connotation variation caused by selecting action A is estimated using = E.'j'i [ACi • P ((P, A), (Pi, Ai))] (5) P EN^i P ((P, A), (Pi, Ai)) ' This estimate is obtained by simply averaging all records stored in memory, weighted by their similarity to perception P. Similar perceptions stored in memory cause the corresponding records to have a higher contribution for estimating the connotation variation. An information measure is also obtained using Ip = P ((P, A), (Pi,Ai)) , (6) ie O-h where Qk is the set formed by the K nearest neighbours of (P, A), according to the similarity measure p. aIp « 1 when the associative memory has records similar to the pair (P, A), meaning that the generated estimate of the connotation variation caused by action A in situation P is build using information from similar previous experiences; on the other hand, when AIP « 0 the estimate does not have access to similar situations, hence suggesting an unreliable estimation value. Note that the information gain, as defined by classical information theory, can be obtained by Ic = 1 —A Ip, where Ic stands for "classical information". Ic « 0 when records similar to (P, A) are present in memory, meaning that there will not be a significative information gain if action A is chosen to be carried out in situation P (as the memory already contains the results of previous similar experiences). Note that the estimation of the connotation vector variation is a classical statistical regression problem. Given a sample of points (Pi, Ai, ACi) — the associative memory — it is desired to estimate a (possible stochastic) function AC = f (P, A) that maps the perception and action spaces to , where, as previously seen, Nc is the connotation vector dimension. 
C C A Figure 1: Architecture 2.2 Finite Resources: Memory Management Computational limitations are inherent to any practical artificial agent, leading consequently to finite size associative memories. The special case of agents acting in realtime demanding environments strongly restricts the memory maximum capacity for which the computation of (5) is still performed in a reasonable time. In those cases an associative memory can easily become filled up. When a new record is ready to be inserted into memory, the agent must therefore pick and discard some stored quadruple. The choice policy of the record to be eliminated is a crucial one: on the one hand, it should not increase the mean estimation error over time and, if possible, should even contribute to decrease it; and on the other hand, a removal policy should not have a computation time higher than the estimation process itself. The complexity of the estimation performed using equation (5) is O{N^) in terms of basic arithmetic operations and associative memory accesses. Several possible heuristics can be used: Antiquity: The oldest record is always picked and removed. The associative memory thus becomes a firstin first-out queue; Estimation Error: This heuristic picks the record whose elimination would produce the least variation in the estimate, i.e., chooses a record that does not appreciably change the estimate supplied by associative memory, after the record being removed; Variance: Records in memory where the local variability of AC is low are removed first, as they are unlikely to be useful in the estimation process; Information: Records located in a highly dense region of the perception space have priority for elimination, since their deletion does not change noticeably the value of AIp for perceptions P located in that region; Meaning: This heuristic tries to preserve high |AC| records, i.e., records associated with strong experiences. 1 ' x\ X V .. y 5 Perception Figure 2: Connotation variation estimation for a 15 point memory. These points are represented by crosses, while the estimates with and without the presence of record 3 are represented respectively by the blue and cyan lines. Record 2 will be chosen for removal if meaning heuristic is used, as it corresponds to the smallest connotation variation. On the other hand, comparing points 1 and 4 shows that the latter has a lower local variance, and thus will be removed before the former if variance heuristic is used. Point 5 will hardly be eliminated if information based removal is carried out, due to its relative isolation. Finally, represented by an arrow is the estimation change produced by point 3 removal. Estimation error heuristic tries to remove points that do not change noticeably the estimation curve. Fig. 2 illustrates the use of these heuristics. Estimation error based heuristic should pick a memory record i that minimizes the estimation variation over all the perception space, H (i) = -AC!p A(C'P- dP, where Na is the number of possible actions and ACJ^j-is the estimate obtained in the absence of record i. Computational constraints, however, force this heuristic to only A take into account the estimate error for the point i where the heuristic H (i) is evaluated, Since H (i) = Pi (7) = eNTi Pij -1 (8) AC7 = Ejl"! ACj Pij EN?! Pij At i = ^p = ENI?! 
p ((P, A), (Pi, Ai)) (9) \C| ^wi\ci\ (10) i=1 Nc Z' i=1 \C \ = y WiCi (11) d\C \ dAèi = 22wi\ACi\ The complexity of the error estimation based heuristic thus become O(N^), since complexity of (7) is O(1) if some auxiliary values are maintained in the associative memory records. Local variance of record i is given by where ACJ, the connotation variation local mean, is given by Choosing a record to eliminate based on local variance also is O(N^) if proper auxiliary values are kept in associative memory records. Information based heuristic, on the other hand, is O(N';2) if (6) is used. Instead one can use the average similarity, the derivative of the connotation norm is proportional to the absolute value of each connotation component. In this manner, vectors with a high absolute value of some of their components are effectively "punished" with a higher norm value. When \ C\ is normalized between 0 and 1 one can define AUP, the action A utility for perception P, as AUp = 1 - \C\. Normalizing the connotation components, —1 < ci < 1, assures \C\ normalization when (10) or (11) are used. Such a greedy strategy is not always the best thing to do, especially when the agent interacts with a complex and challenging environment. In those cases, preferring an exploration behaviour may reveal itself more useful. Remember that AIP takes a value near 0 when the associative memory does not hold situations similar to the pair (P, A); in those situations, the agent should consider exploring the world. The decision policy of the agent is consequently based on choosing the action A that maximizes to obtain a O(N^) information-based heuristic. 2.3 Decision How do the agent select the action to execute, when, in a given situation represented by perception P, a connotation variation estimate, for every possible action A, AACP, is available? Recall that this agent tries to keep the connotation vector as close as possible to the null vector. Consider A(CP = CP +A ACP, the expected connotation vector if action A is performed. In most cases choosing an action that minimizes the Euclidian norm of ACCP is inadequate, as this norm does not make a distinction between vector components with different importances. Suppose a weight wi is assigned to each connotation component, with the usual restriction EN=C1 wi = 1. Selecting an action that leads to the most favourable expected situation becomes therefore dependent on the distance metric used for the connotation vector. One can consider, for example, (1 - ß) AUp + ß (1 -AIp) (12) where ci is the ith component of the connotation vector. On the other hand, if actions that lead to high absolute values for some of the components are to be avoided, then one can use the following metric: where ß e [0; 1], an exploration coefficient, may vary with time. 2.4 Action Triggering Deciding when to stop executing an action, and choosing and starting another one is a delicate problem. Periodically triggering the action switching does not allow an agent to quickly respond to unexpected events. Making the trigger period shorter does not solve the problem, as it may prevent an acceptable perception of action consequences, thus leading the agent to an erroneous learning. In practice, this may lead to a global behaviour not very different from a random behaviour selection (3). Emotions are often pointed out as a source of behaviour interruption (3; 4; 5). Typically, a new behaviour is triggered when a significative emotional change happens. 
This corresponds to a new action being chosen whenever a relevant variation on the connotation vector is detected. Statistical tests are robust methods that take agent sensors noise into consideration, and consequently they are used in this architecture to detect connotation changes. Let D1 be a sample formed by the n most recent acquired connotation vectors, and let D2 collect the n connotations directly preceding last action triggering instant. Suppose D1 elements have a gaussian distribution, with mean and variance a2. Suppose also that D2 elements are gaussian too, with same variance a2 but distinct mean . A significative variation on the connotation vector is assumed when hypothesis = is statistically rejected, i.e., when, for some pre-defined significance level a, \tobs\ >Frl_2 ((1 - a)/2), with ^obs — {Di - D2) - (Mi - M2) (13) Pij = ' (Pi-Pj)2 .,, . . e D if Ai — A, ; 0 otherwise, where D is a distance parameter whose value is equal to 0.5. Fig. 5 illustrates the obtained estimates, for each of the five discarding policies presented in section 2.2. Fig. 6 shows the average estimation error, calculated for -10 < P < 10 using the true connotation variation value of Fig. 3 as a reference. Fig. 7 shows the mean value of AIp, using equation (6), with K — 1. These results show how poorly a meaning-based heuristic performs. Keeping only the higher connotation variation records leads usually to severe estimation errors. Such a policy forces the agent to "see the world in black and V(S2 + S22) /n ' where the statistic observed value, tobs, has a t-student distribution with 2n - 2 degrees of freedom and 2 (x) is the t-student inverse distribution function. D) 1 and I)2 are, respectively, Di and D2 sample means. Choosing a new action and learning the consequences of the previous one is hence triggered whenever the statistical test fails to prove an equality of means. 3 Results In order to test the proposed architecture, a first experiment was conducted to check the estimation quality of the associative memory. Then, a soccer emotion-based agent was created and tested. This agent does not have any a priori knowledge of the consequence of its available behaviours, and therefore must learn how to score using exclusively the mechanisms developed in this work. 3.1 Estimation Consider a scalar perception, gaussian with zero mean and standard deviation equal to 3. There are two possible actions: the first one is chosen with 70% probability, while the second has 30% chances of being executed. There is a connotation variation associated with every perception and with each action, as shown in Fig. 3. However, this connotation variation is corrupted with 0.2 standard deviation white noise. 10 000 points are generated, shown in Fig. 4, and presented sequentially to the associative memory. Two different size memories were tested, with 100 and 1 000 points. The similarity measure between two perceptions, pij, is assumed to be -Action 1 -Action 2 " A K A A K A A 1 \ / \ \ \ \ \ A / \ ri—n \ \J L / \ / \ V/ V/ W -10 -5 0 5 10 Perception Figure 3: Connotation variation for each possible action. -10 -5 0 5 10 Perception Figure 4: 10 000 points sample (corrupted with additive white noise). white". This policy, however, may prove itself useful when some critical (high AC) situations must be avoided, since such a "superstitious" agent may be able to prevent them. 
Information-based heuristic seems to work better when a low capacity memory is used; on the other hand, estimation error policy provides better results when a large associative memory is employed. The similarity measure has an important role on estimation: increasing the value of D effectively acts as low-pass filter over the estimate. This may be desirable when handling with a sparse associative memory, but it might deteriorate the estimate when the memory size is large enough. Notice, in Fig. 8, how the estimate gets better if D is set to 0.1. Nevertheless, even with a memory with a dimension of only 100 records, estimation has proven to be quite accurate, as seen in Fig. 4. 3.2 Simulated Soccer Robotic soccer provides a demanding and complex environment for artificial agents, thus being a natural test bed where decision and learning mechanisms can be developed and tested. The emotion-based architecture presented in this paper was applied to the simulation soccer league of Robot World Cup Initiative, also known as RoboCup (6). Although simulation league teams comprise eleven players for each side, only two distinct situations were considered: a player with no opponents (a solo game) and a player against one opponent. Increasing further the number of players in the match necessarily leads to some questions and problems, such as the credit assignment problem in multi-agent systems. This is a complex problem that is not considered in this paper. Solo Game: In the first experiment, an emotion-based agent is left alone in the field. It does not have a clue on its available actions consequences, although it have some a priori motivations: it wants to keep its stamina high, it wants to be near the ball and it wants the ball to be as near as possible to the opponent goal. These motivations are essential, since the agent needs a learning reference, i.e., it must know the connotation of experienced perceptions. Its perception vector consists of information on positions of the ball and the player itself. There is also a set of available behaviours such as getting near the ball, dribbling to goal, kicking to goal and clearing the ball away, to name but a few. The agent is then allowed to play a few matches — each match duration was set to 5 minutes — and its performance is evaluated. Table 1 shows the game results, while Fig. 9 presents the agent score evolution. Game Goal Score 1 8 2 13 3 22 4 21 1: Lonely match: game r 4000 6000 8000 Number of Perceptions Figure 9: Lonely match: goal average. Table 2 shows that some of the actions were completely discarded after some time. This may be considered an intelligent behaviour, since centering the ball or dribbling it to the near corner hardly lead to a high motivation satisfaction. On the other hand, the emotion-based agent quickly develops a very efficient style: it constantly tries to get near the ball; when this happens, the agent then dribbles and/or shot it to opponent goal direction. Sometimes, however, it just stands facing the ball; this happens when the agent gets tired. One vs. One: The presence of an opponent enlarges the perception and action sets. This opponent is modelled as a finite state machine with state transitions represented in Fig. 10. Lost Ball Figure 10: Opponent state machine. Satisfying emotion-based agent motivations then becomes a trickier task, since the opponent player permanently tries to steal the ball, driving it afterwards to the agent own goal. 
Table 3 presents game scores for a sequence of twelve matches of ten minutes each. Opponent player easily wins first games, since at that time the emotion-based agent is still trying to learn action consequences. Game Score 1 2-6 2 0-10 3 5-9 4 6-9 5 5-11 6 3-16 7 9-7 8 9-6 9 11-8 10 16-4 11 10-8 12 13-6 Table 3: One vs. one: score and ball possession. However, after some games the emotion-based agent learned how to play against the reactive agent, beating it on the subsequently matches. Table 4 shows the dispended time of each action for the diverse matches. 4 Related Work Until a decade ago, most of Artificial Intelligence researchers associated emotions to human rationality loss: emotions were believed to induct "a partial loss of control of high level cognitive processes" (7). Recent neurological evidence, however, has shown the fundamental role that emotions play in human decision and learning (1; 8; 9). There has been, especially since the publication of Dama-sio's "Descartes' Error" book, an increasingly interest on artificial emotions and their assistance to decision and cognition in artificial agent design. Velasquez (10; 11; 12) presents an emotion architecture based on Damasio's work, as well on the Society of Mind concept (13). Also following Damasio's reference book, Gadanho et a^l. models a hormonal and emotional system where emotions provide a reinforcement value to a Q-Learning decision scheme (3). In this work, Gadanho also uses emotions to trigger learning and behaviour switching on an autonomous robotic agent. Ventura et al. proposes the double processing paradigm, where stimuli are processed simultaneously by a fast, perceptual layer — corresponding to primary emotions — and a slow, cognitive layer — inspired on Damasio's secondary emotions (14; 15; 16; 17; 18). While the latter implements a learning mechanism based on somatic markers, the former, using a priori knowledge, can provide a quick response when the agent is confronted with a situation demanding urgent action. The perceptual image of a stimulus, created by the perceptual layer, also contributes to narrow the search space of the cognitive layer. Finally, several other models of emotions have been built in the last few years, most of them oriented to human-machine interaction, such as the Oz Project (19) or the recognition of emotions developed by Picard (20). 5 Conclusions The model presented in this paper is strongly inspired on Damasio's concept of somatic markers. It uses an associative memory to implement those markers in artificial agents. One can think of such a memory as a collection of records, each of them corresponding to an "artificial somatic marker". This paper also proposes some fast estimation and memory management mechanisms, which make this model suitable to real-time, "short time to think" agents. Emotions also inspired the development of a statistical mechanism for deciding when to interrupt behaviour, i.e., when to start executing another action. It was shown in this paper how well a limited resourced memory estimates consequences of actions. Future work will fall upon the development of more sophisticated removal heuristics, as well as studying more deeply the kinds of relations that exist between memory size and similarity measure between perceptions, and how can they contribute to better estimation. A soccer-playing emotion-based agent was also developed and successfully tested. 
This agent was able to effectively learn on a challenging and demanding environment, in the presence of an opponent whose objectives consisted only in preventing the emotion-based agent from satisfying its own. While developing such an agent, some difficulties were raised when defining an exploration/exploitation compromise. Nevertheless, the emotion-based agent was able to beat the reactive agent after some played matches. The triggering mechanism presented in this paper does not solve the cause-effect problem, although it performs better than a periodic action switching. The proposed model does not consider either the credit assignment problem in multi-agent systems. Future work will also fall upon both these problems. Acknowledgement This work has been developed under the framework of a research project founded by the Portuguese Foundation for Science and Technology project PRAXIS/P/EEI/12184/1998. References [1] Damasio, A. O Erro de Descartes: Emq^o, Razäo e Cérebro Humano, Publica^öes Europa-América, Lisboa, 1995 [2] Russel, S. and Norvig, P., Artificial Intellig;ence: A Modern Approach, Prentice-Hall International Editions, New York, 1995 [3] Gadanho, S. and Hallam, J. Emotion-triggered Learning in Autonomous Robot Control in Workshop: Grounding Emotions in Adaptat^ive Systems, pag. 3136, Dolores Canmero, Chisato Numaoka, and Paolo Petta, ed., August 1998 [4] Sloman, A. and Croucher, M., "Why Robots Will Have Emotions", in IJCAI'81 — Proceedings of the Se^en International Joint Conference on Art^ificial Inteligence, pag 2369-71,1981 [5] Simon, H., "Motivational and Emotional Controls of Cognition", in PsychologicalRev^iew, 74:29-39, 1967 [6] http://socrob.isr.ist.utl.pt [7] Sloman, A., What sort of control system is able to have a personality?, 1995 (ftp.cs.bham.ac.uk/pub/groups/cog\ _affect/Aaron.Sloman.vienna.ps.z) [8] Damasio, A. O Sentimento de Si, Publica^öes Europa-América, 2000 [9] LeDoux, J., The Emotional Brain, Simon & Schuster, New York, 1996 [10] Velasquez, J., Modeling Emotion-Based Decision-Making, 1998 (http://alpha-bits.ai.mit.edu/ people/jvelas/research.html) [11] Velasquez, J., When Robots Weep: Emot^ionalMemo-ries and Decision-Making, 1998 (http://alpha-bits.ai.mit.edu/ people/jvelas/research.html) [12] Velasquez, J., Modeling Emotions a^nd Other Mo^iv^a-tions in Synthetic Agents, 1997 (http://alpha-bits.ai.mit.edu/ people/jvelas/research.html) [13] Minsky, M., The society of mind, Simon and Schuster, New York, 1985 [14] Ventura, R. Emotion-Based Agentes, Master Thesis, 2000 [15] Ventura, R. and Pinto-Ferreira, C., "Emotion-based agents", in Proceedings AAAI-98, pag. 1204, AAAI, AAAI Press and The MIT Press, 1998 [16] Ventura, R. and Pinto-Ferreira, C., "Meaning Engines — Revisiting the Chinese Room", in Workshop: Grounding Emotions in Adap^ative Systems, pag. 6870, Dolores Canamero, Chisato Numaoka, and Paolo Petta, ed., Agosto 1998 [17] Ventura, R., Custódio, L. and Pinto-Ferreira, C., "Artificial Emotions — Goodbye, Mr. Spock!, in Progress in A^rt^fìcial Intelligence, Proceedings ofIB-ERAMIA'98, pag. 395-402, Ed. Colibri, 1998 [18] Ventura, R., Custódio, L. and Pinto-Ferreira, C., "Emotions — The missing link?, in Emotional and Intelligeent: The Tangled Knot of Cognition, Dolores Canamero, ed., pag. 170-175, 1998 [19] Bates, J., Loyall, A. and Reilly, W. 
An Architect^ui^e for Action, Emotion, a^d Social Behaviour,1992 (http://www.cs.cmu.edu/Groups/oz/ papers/CMU-CS-92-144.ps.gz) [20] Picard, R., Aff^ec^i^e Computing, 1995 (http:// www.media.mit.edu/~picard) - Ideal - Antiquity - Estimation Error - Information - Meaning - Ideal - Antiquity - Estimation Error - Variance - Information - Meaning -10 -8 -6 -4 4 6 8 10 -10 -8 -6 4 6 8 10 (a) Action 1, 100 points capacity associative memory. (b) Action 2, 100 points capacity associative memory. - Ideal - Antiquity - Estimation Error - Information - Meaning - Ideal - Antiquity - Estimation Error - Variance - Information - Meaning -10 -8 -6-4-2 0 Perception 4 6 8 10 -10 -8 -6 -2 0 Perception 4 6 8 10 (c) Action 1, 1 000 points capacity associative memory. (d) Action 2, 1 000 points capacity associative memory. Figure 5: Connotation variation estimate, AA(Jp. (a) 100 points capacity. (b) 1 000 points capacity. Figure 6: Estimate average error over time. 0 Perception 0 Perception Antiquity Estimation En Information Meaning 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Time 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Time - Antiquity - Estimation Error - Variance - Information - Meaning_ 05 - Antiquity - Estimation Error - Variance - Information - Meaning (a) 100 points capacity. (b) 1 000 points capacity. Figure 7: Average information AIp. — Ideal - Antiquity - Estimation Erri — Variance - Information - Meaning -0 5 - Ideal - Antiquity - Estimation Error - Variance - Information - Meaning_ -10 -8 -6 -4 4 6 8 10 -10 -8 -6 4 6 8 10 (a) Action 1, 1 000 points capacity associative memory. (b) Action 2, 1 000 points capacity associative memory. Figure 8: Connotation variation estimate, aACp (D = 0.1). Action Dispended time( % ) Game 1 Game 2 Game 3 Game 4 Ge^Ball 48.5 49.0 55.1 51.2 FaceBall 14.1 13.7 9.7 17.0 HoldBall 3.1 3.1 0 0 DribbleToFarCorner 6.1 3.1 0.5 0 DribbleToNearCorner 5.6 3.1 0 0 DribbleToGoal 4.3 4.3 8.8 7.8 Shot 9.2 20.5 26.3 23.9 ClearBall 6.1 0.6 0 0 CenterBall 3.1 2.5 0 0 Table 2: Lonely match: time dispended to each action 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 ) 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 0 Perception 0 Perception Action Dispended time( % ) Game 1 Game 2 Game 3 Game 4 Game 5 Game 6 Shot 3.4 1.2 4.2 1.8 1.3 1.0 ClearBall 3.8 1.9 2.8 0 0.4 0 CenterBa^ll 2.2 0.4 0 0.4 0 0 Ge^Ball 27.8 34.0 40.7 48.9 54.1 50.8 FaceBall 9.1 4.3 1.1 1.1 2.6 1.3 HoldBall 11.8 4.3 0.7 0.4 0.9 0.3 DribbleToFarCorner 11.8 9.3 2.8 1.8 0.9 1.9 DribbleToNearCorner 3.1 10.0 5.2 5.1 0.4 10.0 DribbleToGoal 9.6 14.3 13.7 23.9 33.6 22.5 TrackOpponent 9.6 8.9 12.2 13.0 0.9 6.8 MarkOpponent 7.6 11.6 16.5 3.6 4.8 5.5 Action Game 7 Game 8 Game 9 Game 10 Game 11 Game 12 Shot 3.4 16.3 20.0 18.4 19.7 21.9 ClearBall 0.8 0 0 0.2 0.3 0.3 Centei-Ball 0 0 0 0.7 0 0 Ge^Ball 46.8 50.8 54.2 50.5 58.0 61.2 FaceBall 0.8 2.1 0.3 0.7 0.9 0 HoldBall 0.4 0.6 0.6 0 0 0 DribbleToFarCorner 0.4 1.2 1.7 1.8 1.7 3.9 DribbleToNearCorner 6.1 1.5 3.2 1.6 1.4 0 DribbleToGoal 35.0 22.1 19.7 24.4 12.2 6.9 TrackOpponent 5.3 5.1 2.3 0.4 4.1 5.8 MarkOpponent 1.1 0.3 0 1.4 1.7 0 Table 4: One vs. One: time dispended to each action Computational Models of Emotion for Autonomy and Reasoning Darryl N. Davis and Suzanne C. Lewis Department of Computer Science, University of Hull, Cottingham Road, Kingston-upon-Hull, UK, HU6 7RX. D. N. Davis@dcs.hull.ac.uk, http://www2.dcs.hull.ac.uk/NEAT/dnd/index. htm S. C. Lewis@dcs.hull.ac.uk, http://www2.dcs.hull.ac.uk/NEAT/index. 
html Keywords: Emotion, Computational Models, Autonomy, Perception, Reasoning Received: October 25, 2002 Recent evidence suggests that the emotions play a crucial role in perception, learning and rational decision making. Despite arguments to the contrary, all artificial intelligent systems are, to some extent, autonomous. This research investigates how emotion can be used as the basis for autonomy. We propose the use of an emotion-based control language that maps over all layers of a computational architecture. We report on how theoretical work and both design and computational experiments with this concept are being used to direct perception, behavior selection and reasoning in cognitive agents. 1 Introduction Definitions of intelligence in artificial systems have involved the use of many concepts including deliberation, reasoning [19] and the selection of appropriate behaviors for any given situation [4, 17]. In reasoning, information is granted belief status, either partially in probabilistic systems or wholly in logic based reasoning systems, and used as the basis for further deliberation. This deliberation may give rise to altered belief states and leads to the selection of goals, plans and behavior. Typically an agent chooses from alternative responses because of design decisions or learning. The choice is made on the basis of information or control metrics or a-priori ranking of alternative behaviors. Underlying these perceptual and reasoning processes is the concept of autonomy. Alterman [1] suggests task-effective artificially intelligent systems need not be designed in terms of autonomy. Intelligence arises out of the interaction of the system and the user. However when system goals and resource allocation are in conflict, considerable interaction is required with a user. This interaction, itself a system goal (whether implicit or explicit), may never be satisfied unless the system can decide to perform the appropriate actions. To decide between actions, given that not all preconditions for any action will be specified at design time, requires the system to be, in some sense, autonomous. Emotions and their nature have been studied for a considerable time, with many contrasting theories and views of emotion being formed. A traditional perspective of emotion is of something that is irrational and detracts from reasoning. However, recent evidence [8] suggests that emotions are an essential part of human intelligence, and play a crucial role in perception, rational decision-making and learning. Most major current theories of emotion agree that emotions constitute a very powerful motivational system that influences perception and cognition in many important ways. For example neurons in the amygdala are driven particularly strongly by stimuli with emotional significance, indicating an important role in the coding of the emotional significance of sensory data. Further research suggests that motivation and emotion serve as filters that guide perception and affect the evaluation of perceptual information [3]. This view is supported by Izard [16] who argues that emotion is a guiding force for perception. If emotion is a primary source of motivation, it must play a significant role in both initiating and providing descriptors for the types of disequilibria described by Mearleu-Ponty [18] as underlying behavior in biological agents. From a computational perspective, Sloman considers that intelligent machines will necessarily experience emotion (-like) states [24]. 
Following on from Simon [23], this developing theory of mind considers how perturbant control states ensue from attempting to achieve multiple goals, or goals at odds with resource availability and environment affordances. Perturbant states will arise in any information processing infrastructure where there are insufficient resources to satisfy all current and prospective goals. This will occur not only at the deliberative belief and goal management levels but over all layers of the architecture as goals are mapped onto internal or external behaviors and actions. An agent must be able to recognise and regulate these emotion-like states or compromise its task effectiveness. The aim of this research is to investigate theories of emotion and understand how they can used to underpin computational autonomy, to direct and inform perception and behavior selection and to form a better model of computational reasoning. This paper describes this ongoing research and the integration of an emotional model into two different types of computational architectures. 2 Psychology and Emotions Research has shown that emotion affects many different aspects of cognitive function including memory [5], reasoning and social interaction [15]. There has never been any doubt that emotion disrupts reasoning under certain circumstances and that misdirected or uncontrolled emotion can lead to irrational behavior. However, evidence from Damasio and other sources also suggests the contrary and that emotions play a fundamental role in rational and intelligent behavior such as decision-making and reasoning. The Somatic Marker Hypothesis [8], for instance, states that decisions, made in circumstances whose outcome could be potentially either harmful or advantageous, and which are similar to previous experience, induce a somatic response used to mark future outcomes. When the situation arises again the somatic marker will signal the danger or advantage. Thus, when a negative somatic marker is linked to a particular future outcome it serves as an alarm signal to be wary of that particular course of action. If instead, a positive somatic marker is linked it becomes an incentive to make that particular choice. The appraisal approach to emotion has cognition as the core element in emotion. The OCC (Ortony, Clore and Collins) model [21] synthesises emotions as outcomes to situations. Emotions arise out of a valanced reaction to situations consisting of events, objects and agents. The emotion type elicited is dependent upon appraisals made at each branch of the model. The oCc model allows for an emotional state to be a situation itself, so emotions can trigger additional emotions or the same one repeatedly. The OCC model is well suited to computational modeling as shown in the work of Elliot [11]. The goal-oriented approach suggests that emotions arise from evaluations of events relevant to goals. Again cognition is central to the elicitation of emotion. Oatley and colleagues [20] argue that emotions are caused by cognitive evaluations that may be conscious or unconscious. Each kind of evaluation gives rise to a distinct signal that reflects the priority of the goal, which then influences resultant behaviors. Frijda [13] uses a similar definition of emotion and states that certain stimuli elicit certain emotional phenomena because of the individual's concerns and the relevance of the stimuli to the satisfaction of these concerns. 
Duffy posed the question, at which particular degree does a characteristic become an 'emotion' or at which degree is it a 'non-emotion' [10]. For example, a raised heartbeat is characteristic of both emotional and nonemotional behavior. When does the difference in the characteristic occur? Is emotion a distinguishable state or a difference in the degree certain response characteristics exhibit? According to Duffy, the phenomena that are described as emotions occur in a continuum or a number of continua. The responses called 'emotional' do not appear to follow different principles of action to other responses of the individual. She states that all responses, 'emotional' or 'non-emotional', are reactions of an organism as it adapts to a situation. Emotion represents a change in the energy level, or the degree of reactivity of an individual. For example, situations, which are interpreted as threatening, are characteristically responded to with increased energy. Small changes in energy level may occur during 'interest' or 'boredom', whereas 'anger' is associated with a more extreme change. Duffy supports the goal-oriented view that emotions are only experienced in situations of significance to the individual. The intensity of the 'emotion' is proportional to the degree of importance associated with a particular goal and the degree of threat or promise the situation bears for that goal. The emotion experienced is also affected by the background and information that the individual has about the particular situation. Many theories use the concept of basic emotions. For example, the OCC model contains twenty-two different emotion types. Oatley and Johnson-Laird cite four basic emotions derived from evolutionary origins: happiness, sadness, fear and anger. A further five are derived from innate biological substrates: attachment, parental love, sexual attraction, disgust and interpersonal rejection. However, other theorists question the notion of basic emotions. Scherer and Duffy oppose the view of basic emotions and examine evidence that emotions are patterns of interrelated changes. Using basic emotions in a theory can lead to what Scherer calls 'bunching' of the different emotional states around a limited number of types. Conversely, the scope of emotional states in both Duffy's and Scherer's theories is considerably broader. Scherer points towards the existence of a large number of universal 'response elements' as opposed to basic emotions [22]. His concept of modal emotions attempts to address many questions. For example, why does the same situation not necessarily provoke the same emotional expression nor the use of the same label in two individuals? Like Duffy, Scherer sees emotion as a number of changes that occur over time in response to an event. He defines emotion as "a sequence of interrelated synchronised changes in the states of all organismic subsystems in response to the evaluation of an external or internal stimulus event that is relevant to central concerns of the organism". The emotional state results from the cumulative evaluation of relevant changes in internal or external stimulation. Scherer proposes that such organisms make five types of checks: novelty, intrinsic pleasantness; relevance to meeting plans; ability to cope with the perceived event; and compatibility of the event with self-concept and social norms. An appraisal according to these checks is carried out which elicits an emotional response. 
Scherer believes that the information from these checks is needed in order to choose how to respond. Some combinations of evaluation checks would be frequently encountered, giving rise to the same recurring patterns of state changes. The term 'modal emotions' refers to states resulting from these recurring stimulus evaluation check patterns. Although some patterns occur more frequently, the number of potential emotional states is virtually infinite. 3 Autonomy, Goals and Emotion Many frameworks are used for thinking about, designing and building intelligent systems. The use of rational BDI (Belief-Desire-Intention) models [6] is understandable, as they provide formal systems with well-defined properties. The limitations of such systems, e.g. logical omniscience and resource constraints, are known. Goal competition due to incompatible goals, insufficient resources or skills is a major research issue. Ferber [12] categorizes goal interaction in multi-agent systems as one of three categories: indifference, cooperative and antagonistic. Certainly in the latter case, and even for cooperative agents or goal interaction, perturbant states can arise. Such agent societies and intelligent systems need some means to manage these states or risk compromising their autonomy and reactivity, and hence their task effectiveness. Even the most rational agent architecture will be compromised if it lacks the mechanisms to cope with the emergent effects of antagonistic goal conflicts. One stance is to place a computational analogue to emotion at the core of an agent. This provides an agent with an internal model that maps across different levels and types of processing. Emotion provides an internal basis for autonomy and a means of valencing information processing events. It provides an internal model of use in ordering motivation and goals, and the means for choosing actions and regulating behavior. This emotional core can be used to recognise and categorise transient, episodic, trajectory and persistent control states. Sloman [25] also differentiates between episodic and persistent mental phenomena. His architectures for functioning minds include primary, secondary and tertiary emotions. Primary emotions are analogous to arousal processes in the emotion theories introduced above and have a reactive basis. Secondary emotions are those initiated by appraisal mechanisms and have a deliberative basis. Tertiary emotions are cognitive perturbances - typically negatively valenced emergent states - such as arising from goal or motivator conflicts in an information processing architecture, for example a multi agent society. Any agent architecture that supports multiple motivations or goals is liable to this type of dysfunction. Perturbant states can arise through resource inadequacy or mismanagement while pursuing multiple and not necessarily incompatible goals. Most computational systems face this type of problem even if their underlying theory does not. Possible solutions are particularly relevant to the design of goal-oriented and agent systems. 4 Four Layer Computational Architecture Earlier research on agents focused on an architecture that supports motivation [9]. The current framework builds on that architecture. It is used to pursue alternative computational perspectives on architectures of mind. Here the interplay of cognition and emotion is emphasized through mechanisms that support appraisal, motivation, tasks and roles. 
Emotions are accepted to be part mental (appraisal) states with descriptive (valencing) and causal (arousal) processes. This concept is used to provide a control or regulatory framework to model the different forms of emotion inducing events. The fundamental tenet of this work is that all agent events and actions, internal and external, can be described in terms of this model of emotion. A salient feature of many definitions of emotion is that they are described in terms of goals, roles (or norms) and responsive behaviors. This enables different aspects of motivational behavior to be consistently defined over different levels of the architecture in terms of an emotion-based control language. Global drives are those associated with the agent's overall and persistent purpose. Temporally-local drives are related to ephemeral states or events within the agent's environment or itself. Emotional autonomy allows an agent to select and attempt to maintain an ongoing globally-temporal disposition towards its roles. The nature of this is temporarily affected and perhaps modified through current goals and motivations. Over time events occur that modify, stall, negate or satisfy goals. Such events can be described within a model of emotion. An emotion-based control language can therefore be used to mediate the interaction of global roles and the temporally-local drives that reflect the current focus of the agent. An agent's internal environment can be defined in terms of its perception of external events, objects and agents and the behaviors (whether internal or external) they afford. Such descriptions can be organised according to control state theory [9]. The control language used to navigate this internal environment needs to be consistent across many levels and types of control state from autonomous reflexes to extensive deliberation associated with goal satisfaction or belief management. Various combinations of qualitatively different behavior are required of an agent as it attempts to achieve different categories of goals associated with a role. Different problem-solving trajectories, described in terms of goal-achieving behaviors, exist for any one role. A greater range exist where an agent has multiple and not necessarily contingent roles. Some trajectories while impossible are supported or attended to for any number of reasons; for example, the motivational intensity associated with a preferred goal or role. The possible trajectories depend on an agent's design. An agent is autonomous to the extent that it can choose to pursue specific motivational trajectories. An agent is rational to the extent that it follows feasible or achievable trajectories. Figure 1 shows emotion used as the core to a motivation based model of agenthood. This architecture emphasizes four distinct processing layers: a reflexive layer analogous to the autonomic systems in biological agents, a reactive or behavioral layer, a deliberative layer and a regulating reflective layer. The broad picture is of high and low level processes co-existing and interacting in an asynchronous, parallel and holistic manner. The majority of the higher level processes tend to remain dormant and state persistent; activated only when sufficiently required. The agent's processing exists in relation to the agent's environmental stance; i.e. what roles the agent has adopted, what objects, agents and events occur in the environment and how they affect the logistics of goal and role satisfaction. 
Motivator processing, planning and other cognitive processes are not merely abstract, nor just reactions to the current state of an agent's external environment but exist in relation to an agent's long term goals. Motivations, goals and the behaviors they subsume are all influenced by components of the emotion engine. ----------^'^EmoteiM Reflective ^— ^^ Dire Percep Deliberative^,. ote:] Reactive Emote:R^ __L T ___Reflexive (f^moteiA^------( on N...^ -►V ction Filtered Epistemic Data ---»Control Data Only Epistemic Data Figure 1. Sketch of the emotion engine based four-layer architecture. Overall the process is of information assimilation and synthesis, and information generation that typically map onto internal and external behaviors If emergent behaviors are to be recognized and described in terms the emotion based control language and then managed, there must be a design synergy across the different layers of the architecture. Processes at the deliberative level (for example Emote:D, Attention and Motivator in Figure 1) can reason about emergent states arising from anywhere in the architecture using explicit (motivator or goal) representations (see [9]) and the internally consistent control language. In an earlier architecture, the reflective processes (Emote:M) were used to classify the processing patterns of the agent in terms of combinations of a set of basic emotions and favored emotional responses (or disposition). Subsequent rejection of the concept of basic emotions, for theoretical and computational reasons, required a redesign of this component. The emotion-changing reactive behaviors (Emote:R) are used to pursue a change in disposition through changing the functional behavior of the lowest-level autonomous processes (Emote:A). This module is modeled using multiple communities of cellular automata (or hives). The behaviors associated with this module, and set by the Emote:R module, are those that govern the internal behavior of single cells, the communication between adjoining cells in communities and inter-community communication. Emotion is discretely valenced at the cell level as positive-neutralnegative. Ordinal measures across the valence of all the cells at the community level provide the basis for ascending control signals. Various threshold models have been used to determine if arousal occurs; for example a community of cells with a high aggregate valence, or a high degree of valence contrast across the cell community. Emotions can be instantiated by events both internal and external at a number of levels, whether primary, e.g. ecological drives, or by events that require substantive cognitive processing. Emotions can be invoked through cognitive appraisal of agent, object or event related scenarios, including for example the unwanted postponement or abandonment of a goal. To move to a preferred aspect of the possible emotional landscape, an agent may need to instantiate other motivators and accept temporarily unwanted dispositions. An agent with emotional autonomy needs to accept temporary emotional perturbance if it facilitates goal satisfaction at some future time. In the model shown in Figure 1, intense emotions or arousal events effectively override the reactive-level filters, activating the deliberative components of the emotion engine. Deliberative appraisal of an emotion inducing event can initiate lateral activation at the deliberative layer, affecting memory, attention and motivator management. 
Memory responds to emotional context as an aid to the storage and recall of memories about external events, objects and agents. Attention management makes use of the emotional state of Emote:D-R-A complexes to provide a semantic context for motivator filters, and set the quantitative emotion filters. The intensity levels of these filters are set in response to the Emote:D mechanisms and the reflective component (Emote:M) of the emotion engine. Computational experiments have used both sets of basic emotions and type-less emotion arousal models. Early experiments found that from any given state, a hive rapidly achieved a steady (continuous or oscillating) state. By changing the currently extant behavior set, or by communicating with another hive, transitions to the same or other steady states always occurred. Approximately 20,000 transition possibilities exist. Rules are used to select different hive dispositions and transitions. Similarly, through the modification of the internal state of a small number of cells, the emotion engine moves to a closely related but preferred state. This is analogous to the modal responses described in the Scherer model of emotion. 5 CRIBB and Emotion 5.1 The CRIBB Model CRIBB (Children's Reasoning about Intentions, Beliefs and Behavior) is a computer model based upon a general sketch for belief-desire reasoning in children [2]. It simulates the knowledge and inference processes of a competent child solving false-belief tasks [27]. A simulation run in CRIBB starts by giving propositions containing facts and perceptions about some scenario in sequential steps according to the time interval in which the propositions arise. On the basis of the given propositions and the inferences drawn, CRIBB answers test questions about the cover story. The questions can be about its own beliefs or about the intentions, beliefs and behavior of another person in the scenario. CRIBB represents propositions about physical states of a given situation and the intentions, beliefs, perceptions and behavior of others. Its knowledge base consists of four types of practical syllogisms and three other inference schemata, which represent the relations between these propositions. Practical Syllogisms denote knowledge about the relations between intentions, behavior and beliefs of another person. The three other classes of inference schemata relate perception-belief, belief-time and fact-time. These are split into primary and secondary representations. Primary representations are the system's own beliefs about the situation and the behavior of other people. Fact-time inferences, propositions about facts along a time scale, are classed as primary representations. Belief-time and perception-belief inference schemata are both types of secondary representation as they contain beliefs about the system's own and others' beliefs. A further element of CRIBB is a consistency mechanism that detects and resolves contradictions in belief sets. This is invoked each time a new proposition is added, in order to ensure the consistency of its knowledge base. 5.2 Extending CRIBB with Emotions Bartsch and Wellman's model [2] for belief-desire reasoning includes an emotion element that CRIBB does not implement. Consequently, CRIBB can be extended to perform some experiments with different models of emotion. Certain theories of emotion are more suitable for implementation in CRIBB. Both the appraisal and the goal-oriented approach cite cognition as the core of emotions. 
The scenarios used in CRIBB are based around a goal-oriented structure. The existence of intentions in CRIBB is comparable to a goal state. Therefore, implementing a goal base and using tenets of the goal-oriented approach to emotion is a suitable foundation on which to base a model of emotions. Gibson's theory of direct perception [14] can be used to extend CRIBB's perception-belief mechanism to incorporate emotional capabilities. Gibson describes how sensory data, when perceived, are given affordances and valences. An affordance is something that refers to both the environment and the perceiver in a way that no existing term does. Affordances are properties taken with reference to the observer. Affordances of the environment are what it offers, what it provides, either for good or bad. For example, if a surface is horizontal, nearly flat and sufficiently extended, and if its substance is rigid, then the surface affords support. Affordances can also be valenced. The theory of affordances can be extended to allow emotion to exhibit an effect on the perception of the environment according to the importance of needs, goals and plans to the individual. The following extension to CRIBB does this. When CRIBB is given a proposition, a belief is inferred from it. The consistency of this belief is checked against the existing set of beliefs. If no contradiction is found then the new proposition is added to the belief set. If there is a contradiction then this is resolved and the most certain belief is added to the belief set. For example:

P := {r, s, q, p}
B := {¬p}
P ⊗ B → B'
B' := {r, s, q, p}

B is the existing belief set and P is the perception set. The new set B' contains the system's new belief set with all possible contradictions resolved (p is preferred to ¬p). In this scenario each perception of the world carries equal weight. Rather than attempt to completely and accurately model the agent's world, emotion can be used to guide attention so that the agent is drawn to aspects of the environment deemed to be of importance. Assigning an emotional affordance enables a process by which perceptions can be filtered according to their importance. Hence:

P := {r, s, q, p}
E := {importance(high, p), importance(low, r)}
B := {¬p}
E ⊗ P → EP
EP := {p, s, q, r}
EP ⊗ B → B'
B' := {p, s, q, r}

The perception set, P, contains the same perceptions as before. However, the order in which the perceptions are processed can be changed according to the emotional affordance, E, attached to each one. The new belief set, B', contains the perceptions processed in the order that accords with their emotional significance to that individual. Emotion can be used to extend the belief and perception mechanism of CRIBB further. Consider a perception received from one source and a further perception, from a different source, that contradicts it. If, through the contradiction mechanism in CRIBB, the first perception is found to be false, then this may affect the truth value of any beliefs and perceptions from that particular source. In other words, CRIBB will now be less inclined to believe information received from this source. Conversely, the information from some source may now be considered more reliable than before. This situation can be represented in CRIBB by creating an emotional correspondence for each possible source, giving an indication of the likelihood of information from that source being either true or false. Ongoing work on this model is using various agent test-beds to gather metrics to inform our research.
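A minimal sketch of the emotion-weighted belief update described above is given below. The proposition representation, the numeric importance scale and the contradiction rule (the incoming proposition is assumed more certain) are illustrative assumptions; CRIBB's actual consistency mechanism is richer than this.

```python
# Sketch of emotionally ordered perception processing (illustrative only).
# Propositions are strings; "~p" denotes the negation of "p".

def negate(prop):
    return prop[1:] if prop.startswith("~") else "~" + prop

def order_by_affordance(perceptions, importance):
    """Process emotionally important perceptions first (set E in the text).
    Unlisted perceptions get a neutral default importance."""
    return sorted(perceptions, key=lambda p: -importance.get(p, 0.5))

def update_beliefs(perceptions, beliefs, importance):
    """Fold each perception into the belief set, resolving contradictions
    in favour of the incoming (assumed more certain) proposition."""
    beliefs = set(beliefs)
    for p in order_by_affordance(perceptions, importance):
        beliefs.discard(negate(p))  # drop the contradicted belief, if any
        beliefs.add(p)
    return beliefs

P = ["r", "s", "q", "p"]
E = {"p": 0.9, "r": 0.1}        # importance(high, p), importance(low, r)
B = {"~p"}
print(update_beliefs(P, B, E))   # the set {p, s, q, r}
```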
One particular experiment makes use of the fungus eater scenario [26, 28]. Early results suggest that the addition of emotion to CRIBB results in a more effective use of resources to achieve tasks, with a more efficient resolution of goal conflicts. For example, in an environment evenly populated with fungus and ore, both CRIBB and ECRIBB agents achieve their task goals (the collection of ore). The ECRIBB agents, however, make more effective use of the available energy sources (the fungus). In terms of the earlier arguments about goal conflicts, the emotion model augments the agent's autonomy and facilitates the resolution of goal conflicts. Experimentation continues to determine how an agent can adapt to environment changes through the modification of the emotional valences associated with perceptual affordances and goal importance.

6 Future Directions and Discussion

The research described here reflects two perspectives on the integration of emotion into cognitive agents. The architecture of Figure 1 has limited reasoning capabilities (more limited than the implementations described in [9]), but makes use of a coherent emotion-based control language. CRIBB, on the other hand, is a serial deliberative (BDI) model that does not try to provide a coherent story for all the types of control states identified in [23] and [9]. A complete architecture would subsume the use of emotion as a control language, and indeed the entire BDI reasoning processes of CRIBB. The current separation enables complementary work to progress independently. As this research develops, the two complementary architectures can be integrated.

Figure 2. A distributed model that draws together the emotion engine of Figure 1, CRIBB and earlier work (sensor agents feeding deliberative perception and reasoning agents).

Duffy's theory that emotions occur in a continuum, or a number of continua, which include both 'emotional' and 'non-emotional' behavior, views emotion as an integrated part of behavior rather than a separate element. This view is also supported by Scherer, who argues that the pattern of all synchronised changes in the different components over time constitutes an emotion. Both of these theories can be viewed as 'distributed' models. Using a distributed emotion model is problematic in CRIBB, as CRIBB's serial reasoning model is not amenable to the asynchronous re-evaluation of plans and information processing that takes place within a distributed system. No such problem exists with the four-layer architecture: it is designed to be asynchronous and distributed. Oatley and Johnson-Laird propose that each goal and plan has a monitoring mechanism that evaluates events relevant to it. This mechanism broadcasts to the whole cognitive system, allowing it to respond to change as it occurs. For a distributed model of emotion, the monitoring system would need not only to communicate goals and plans but also to respond to each sub-system. CRIBB can be readily extended with a central monitoring system without jeopardising its reasoning model. This module already exists in the architecture of Figure 1 as the deliberative Motivator processes. As a development of the work described here, distributed versions of the four-layer architecture are being investigated. The extended model includes those aspects described earlier in this paper and in other work [9], and is being implemented as a multi-agent society (see Figure 2). In this architecture CRIBB is modeled as a deliberative perception agent and a separate deliberative reasoning agent.
Changes to an agent's beliefs are possible through external influence, as in the Castelfranchi model of autonomy [7], using the mechanisms inherent in extended CRIBB (the set E of affordances), mediated by the ongoing emotional valences of the emotion engine. Exploratory implementations have made use of a simplified version of the motivator structures used in earlier work. Current work looks to formalise the control language based on a computational model of emotion that draws on the Oatley, Frijda and Scherer theories, i.e. a goal-based theory of emotion with modal responses and no basic emotions.

References

[1] Alterman, R. (2000) Rethinking autonomy, Minds and Machines, 10(1):15-30.
[2] Bartsch, K. & Wellman, H. (1989) Young children's attribution of action to beliefs and desires. Child Development, 60, 946-964.
[3] Buck, R. (1986) The psychology of emotion. In J. LeDoux & W. Hirst (Eds.), Mind and Brain: Dialogues in Cognitive Neuroscience, Cambridge University Press, Cambridge.
[4] Brooks, R. A. (1991) Intelligence without representation, Artificial Intelligence, 47:139-159.
[5] Bower, G. (1994) Some relations between emotions and memory. In P. Ekman & R. Davidson (Eds.), The Nature of Emotion, Oxford University Press, New York, 303-306.
[6] Bratman, M. E., Israel, D. J. & Pollack, M. E. (1988) Plans and resource-bounded reasoning, Computational Intelligence, 4, 349-355.
[7] Castelfranchi, C. (1995) Guarantees for autonomy in cognitive agent architectures. In M. Wooldridge & N. R. Jennings (Eds.), Intelligent Agents, Springer-Verlag, 56-70.
[8] Damasio, A. (1994) Descartes' Error: Emotion, Reason and the Human Brain, Avon Books, New York.
[9] Davis, D. N. (2001) Control states and complete agent architectures, Computational Intelligence, 17(4).
[10] Duffy, E. (1941) An explanation of 'emotional' phenomena without the use of the concept 'emotion', The Journal of General Psychology, 25, 283-293.
[11] Elliott, C. (1992) The Affective Reasoner: A Process Model of Emotions in a Multi-agent System. PhD thesis, Northwestern University.
[12] Ferber, J. (1999) Multi-Agent Systems, Addison-Wesley.
[13] Frijda, N. (1987) The Emotions, Cambridge University Press, Cambridge.
[14] Gibson, J. (1986) The Ecological Approach to Visual Perception, Lawrence Erlbaum Associates, New Jersey.
[15] Goleman, D. (1998) Working with Emotional Intelligence, Bloomsbury, London.
[16] Izard, C. (1993) Four systems for emotion activation, Psychological Review, 100(1), 68-90.
[17] Mataric, M. J. (1997) Studying the role of embodiment in cognition, Cybernetics and Systems, 28(6), 457-470.
[18] Merleau-Ponty, M. (1965) The Structure of Behaviour, Methuen, London.
[19] Newell, A. (1990) Unified Theories of Cognition, Harvard University Press.
[20] Oatley, K. (1992) Best Laid Schemes, Cambridge University Press, Cambridge.
[21] Ortony, A., Clore, G. & Collins, A. (1988) The Cognitive Structure of Emotions, Cambridge University Press, Cambridge.
[22] Scherer, K. (1994) Toward a concept of 'modal emotions'. In P. Ekman & R. Davidson (Eds.), The Nature of Emotion, Oxford University Press, New York.
[23] Simon, H. A. (1979) Motivational and emotional controls of cognition. In Models of Thought, Yale University Press.
[24] Sloman, A. & Croucher, M. (1987) Why robots will have emotions. Proceedings of IJCAI-87, 197-202.
[25] Sloman, A. (1999) Architectural requirements for human-like agents both natural and artificial. In K. Dautenhahn (Ed.), Human Cognition and Social Agent Technology, Benjamins.
[26] Toda, M.
(1962) The design of a fungus eater, Behavioural Science, 7, 164-183.
[27] Wahl, S. & Spada, H. (2000) Children's reasoning about intentions, beliefs and behavior, Cognitive Science Quarterly, 1, 5-34.
[28] Wehrle, T. (1994) New fungus eater experiments. In P. Gaussier & J. D. Nicoud (Eds.), From Perception to Action (400-403), IEEE Computer Society Press, Los Alamitos.

Emotional Learning as a New Tool for Development of Agent-based Systems

Mehrdad Fatourechi
Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
mehrdadf@ece.ubc.ca

Caro Lucas
Center for Excellence and Intelligent Processing, Department of Control and Electrical Engineering, University of Tehran, Tehran, Iran
lucas@ipm.ir

Ali Khaki Sedigh
Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
sedigh@eetd.kntu.ac.ir

Keywords: Intelligent control, multivariable systems, emotional learning, neurofuzzy control, agents

Received: October 21, 2002

A new approach for the control of dynamical systems is presented based on the agent concept. The control system consists of a set of neurofuzzy controllers whose weights are adapted according to emotional signals provided by blocks called emotional critics. Simulation results are provided for the control of dynamical systems of various complexities in order to show the effectiveness of the proposed method.

1 Introduction

It is widely believed that decision making, even in the case of human agents, should be based on full rationality, and that emotional cues should be suppressed so as not to influence the logic of arriving at proper decisions. The assumption of full rationality, however, has sometimes been abandoned in favor of satisficing or bounded-rationality models [1], and in recent years the positive and important role of emotions has been emphasized not only in psychology, but also in AI and robotics ([2]-[4]). Very briefly, emotional cues can provide an approximate method for selecting good actions when uncertainties and limitations of computational resources render fully rational decision making based on Bellman-Jacobi recursions impractical. In past research ([5]-[9]), a very simple cognitive/emotional state designated as stress has been successfully utilized in various control applications. This approach is actually a special case of the popular reinforcement learning technique. However, in this case it is believed that, since the continual assessment of the present situation in terms of overall success or failure is no longer a simple behaviorist type of conditioning but is closer to the definition of cognitive state modification and adaptation learning, the designation of emotional learning seems more appropriate. We should emphasize that here emotion merely refers to the stress cue; the use of other, and perhaps higher, emotional cues is left for future research. On the other hand, in recent years fuzzy logic has been extensively employed in the design of industrial control systems, because fuzzy controllers can work well as supervisory controllers under conditions such as severe nonlinearities, time-varying parameters or plant uncertainties. Also, in the last decade the intelligent control community has paid great attention to the topic of neurofuzzy control, which combines the decision-making property of fuzzy controllers with the learning ability of neural networks. Hence we have chosen a neurofuzzy system as the controller in our methodology.
In the present paper, the idea of applying emotional learning [8] to dynamic control systems using agent concepts [10] is addressed. This paper can be considered a general framework for the previous single-input single-output (SISO) works ([6]-[9]) and non-SISO (NSISO) systems ([5]). In general, the control scheme consists of a set of agents whose task is to provide appropriate control signals for their corresponding system input. Each agent consists of a neurofuzzy controller and a number of critics, which evaluate the behavior of the plant's outputs and provide the appropriate signals for the tuning of the controllers. Simulation results for the control of the Van der Pol system (single-agent single-critic approach), a strongly coupled plant with uncertainty (multi-agent multi-critic approach) and the famous inverted pendulum benchmark (single-agent multi-critic approach) are provided to show the effectiveness of the proposed methodology. The main contribution of the current paper is the introduction of an easily implementable framework that can lead to a controller design with little tuning effort. We have adopted an agent-oriented approach to encapsulate separate concerns in multiobjective and multivariable controller design. The organization of this paper is as follows: Section 2 focuses on emotional learning and how it can be applied in the control scheme. A brief review of agent concepts, and of how they can be used in control applications, is given in Section 3. The structure of the proposed controller and its adaptation law are developed in Section 4; in Section 5, simulation results are provided to clarify the matter further, and the final conclusions are addressed in Section 6.

2 Emotional learning

According to psychological theories, some of the main factors in human learning are emotional elements such as satisfaction and stress. Emotions can be defined as states elicited by instrumental reinforcing stimuli which, if their occurrence, termination or omission is made contingent upon the making of a response, alter the course of future emission of that response [11]. Emotions can be accounted for as the result of the operation of a number of factors, including the following [11]:

1. The reinforcement contingency (e.g. whether reward or punishment is given, or withheld).
2. The intensity of the reinforcement.
3. Any environmental stimulus might have a number of different reinforcement associations.
4. Emotions elicited by stimuli associated with different reinforcers will be different.

It should also be mentioned that in this paper emotion merely refers to the stress cue; other (and perhaps higher) emotions are not considered here. In our proposed approach, which is in a way a cognitive restatement of reinforcement learning in a more complex continual case (where the reinforcement is also no longer a binary signal), there exists an element in the control system called the emotional critic, whose task is to assess the present situation resulting from the applied control action in terms of the satisfactory achievement of the control goals, and to provide the so-called emotional signal (the stress). The controller should modify its characteristics so that the critic's stress is decreased. This is the primary goal of the proposed control scheme, and it is similar to the learning process in the real world, where we too search for a way to lower our stress with respect to our environment ([12]-[13]).
As seen, emotional learning is very close to reinforcement learning, but the main difference between them is that in the former the reinforcement signal is an analog emotional cue that represents the cognitive assessment of future costs given the present state. So here the system does not wait for a total failure to occur before it starts learning. Instead, it continues its learning process at the same time as it applies its control action. The resulting analog reinforcement signal constitutes the stress cue, which has been interpreted as a cognitive/emotional state. In the next section, we discuss the concept of agent-based systems, which will be used as the framework of our proposed control system.

3 Agent Concept and Multi-Agent Systems

The main problem of multivariable control systems is dealing with the cross-coupled components between different inputs and outputs. In other words, changing an input not only makes some changes in the corresponding output, but also influences the other outputs as well. As will be discussed in Section 4, emotional learning provides a simple, useful tool for dealing with such unwanted effects. The concept of this method can easily be developed within the framework of multi-agent systems. In order to do that, in this section we briefly address agents and multi-agent systems. Here we define an agent as a component of software/hardware which is capable of accomplishing tasks on behalf of its user. Following Jennings and Wooldridge's work [14], we define an agent to be any kind of object or process that exhibits autonomy, is either reactive or deliberative, has social ability, and can reason, plan, learn and/or adapt its behavior in response to new situations. Multi-agent systems (MASs) are systems with no central control: the agents receive their inputs from the system (and possibly from other agents as well) and use these inputs to apply the appropriate actions. The global behavior of a MAS depends on the local behavior of each agent and the interactions between them [15]. The most important reason to use a MAS when designing a system is that some domains require it. Other benefits include parallelism, robustness, scalability and simplicity of design. Based on these concepts, we have proposed an emotion-based approach to the control of dynamic systems, which is discussed in the next section.

4 An Emotion-based Approach to the Control of Dynamic Systems Using the Agent Concept

In this section we design an intelligent controller based on the concepts considered in the previous sections. Fig. 1 shows the proposed agent's components and their relations with each other, based on the idea presented in [16]. As can be seen, the agent is composed of four components. It perceives the states of the system through its sensors and also receives some information provided by other agents, then influences the system by providing a control signal through its actuator. The critics assess the behavior of the control system (i.e. criticize it) and provide the emotional signals for the controller. According to these emotional signals, the controller produces the control signal with the help of the learning element, which implements adaptive emotional learning. The inputs of this learning element are the emotional signals provided by both the agent's own critics and the critics of other agents.
Fig. 1. Structure of an agent in the proposed methodology (emotional critics, emotional learning, neurofuzzy controller and actuator, exchanging signals with other agents and receiving the output signals of the plant).

Fig. 2. Multi-agent-based approach to multivariable control.

The number of agents assigned here is determined by the number of inputs of the system. The number of outputs of the system determines the number and structure of the system's critics, whose role is to assess the status of the outputs (see Fig. 2 for the schematic of the presented approach when applied to a two-input two-output control system, where U1 and U2 denote the control signals and O1 and O2 are the outputs of the system). We now develop the controller structure for multivariable systems in general; from these calculations, the derivation of the special case of SISO systems is straightforward. In the general case of multivariable systems, each agent consists of a neurofuzzy controller. All of the neurofuzzy controllers have identical structures; each one has four layers. The first layer's task is the assignment of the inputs' scaling factors, in order to map them to the range [-1, +1] (the inputs are chosen as the error and the change of the error in the response of the corresponding output). In the second layer, fuzzification is performed, assigning five labels to each input. For decision making, the max-product law is used in layer 3. Finally, in the last layer, the crisp output is calculated using the Takagi-Sugeno formula [17]:

y_i = \frac{\sum_{l=1}^{p} u_{il}\,(a_{il} x_{i1} + b_{il} x_{i2} + c_{il})}{\sum_{l=1}^{p} u_{il}}, \quad i = 1, 2, \ldots, n \qquad (1)

where x_{i1} and x_{i2} are the inputs to the controller (the error and the change of error of the corresponding output); i, n, u_{il}, p and y_i are the index of the controller, the number of controllers, the l'th input of the last layer, the number of rules in the third layer and the output of the controller, respectively; and the a_{il}, b_{il} and c_{il} are parameters to be determined via learning. For each output, a critic is assigned whose task is to assess the control situation of that output and to provide the appropriate emotional signal. The role of the critics is crucial here, because eliminating the unwanted cross-coupled effects of multivariable control systems depends very much on the correct operation of these critics. Here, all the critics have the structure of a PD fuzzy controller with five labels for each input and seven labels for the output. The inputs of each critic are the error of the corresponding output and its derivative, and the output is the corresponding emotional signal. Deduction is performed by the max-product law, and for defuzzification the centroid law is used. The emotional signals provided by these critics contribute collaboratively to updating the output layer's learning parameters of each controller; thus the cross-coupled nature of multivariable systems is handled by the critics and not by the controller itself. The aim of the control system is to minimize the sum of the squared emotional signals. Accordingly, we first describe the error function E as

E = \frac{1}{2} \sum_{j=1}^{m} K_j r_j^2 \qquad (2)

where r_j is the emotional signal produced as the output of the j'th critic, K_j is the corresponding output weight and m is the total number of outputs (for the special case of SISO systems, K_j = 1 and m = 1).

Fig. 3. The control loop in the case of SISO systems.

For the adjustment of the controllers' weights the steepest-descent method is used:

\Delta a_i = -\eta_i \frac{\partial E}{\partial a_i}, \quad i = 1, 2, \ldots, n \qquad (3)

where \eta_i is the learning rate of the corresponding neurofuzzy controller and n is the total number of controllers.
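The paragraphs that follow expand the gradient in (3) via the chain rule, arriving at the explicit update law (9). Anticipating that result, the sketch below shows the controller's output layer (1) and one adaptation step of its bias parameters. It is an illustrative reconstruction under stated assumptions, not the authors' code: the firing strengths, critic signals, Jacobian signs and learning rate are made-up sample values, and only the c_il parameters are updated for brevity.

```python
import numpy as np

def controller_output(u, a, b, c, e, de):
    """Takagi-Sugeno output layer, eq. (1): a weighted average of the rule
    consequents a*e + b*de + c, weighted by the firing strengths u."""
    return np.sum(u * (a * e + b * de + c)) / np.sum(u)

def emotional_update_c(c, u, r, K, J_sign, eta=0.05):
    """One step of the update made explicit in eq. (9), for the bias
    parameters c_il of controller i:
        delta_c_l = eta * sum_j K_j * r_j * sign(J_ji) * d(u_i)/d(c_l),
    where d(controller output)/d(c_l) = u_l / sum(u) for the T-S bias."""
    pressure = np.sum(K * r * J_sign)   # combined, weighted critic stress
    grad = u / np.sum(u)                # d(controller output)/d(c_l)
    return c + eta * pressure * grad

# Made-up sample values: 5 rules, 2 plant outputs being criticized.
u = np.array([0.1, 0.6, 0.9, 0.3, 0.05])   # rule firing strengths
a, b, c = np.zeros(5), np.zeros(5), np.zeros(5)
r = np.array([0.8, -0.1])                  # critic (stress) signals
K = np.array([1.0, 1.0])                   # critic output weights
J_sign = np.array([1.0, -1.0])             # signs of the Jacobian column

c = emotional_update_c(c, u, r, K, J_sign)
print(controller_output(u, a, b, c, e=0.5, de=0.0))
```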
In order to calculate the right-hand side of (3), the chain rule is used:

\frac{\partial E}{\partial a_i} = \sum_{j=1}^{m} \frac{\partial E}{\partial r_j} \frac{\partial r_j}{\partial y_j} \frac{\partial y_j}{\partial u_i} \frac{\partial u_i}{\partial a_i}, \quad i = 1, \ldots, n \qquad (4)

From (2) we have

\frac{\partial E}{\partial r_j} = K_j r_j, \quad j = 1, \ldots, m \qquad (5)

Also,

\frac{\partial y_j}{\partial u_i} = J_{ji}, \quad i = 1, \ldots, n; \ j = 1, \ldots, m \qquad (6)

where J_{ji} is the element located at the i'th column and j'th row of the Jacobian matrix. Taking

e_j = y_{ref,j} - y_j, \quad j = 1, \ldots, m \qquad (7)

where e_j is the error produced in the tracking of the j'th output and y_{ref,j} is the reference input (in case the number of outputs is greater than the number of inputs, some of the y_{ref,j} are taken as zero, as will become clear in the next section with the inverted pendulum example), we now have

\frac{\partial r_j}{\partial y_j} = \frac{\partial r_j}{\partial e_j} \frac{\partial e_j}{\partial y_j} = -\frac{\partial r_j}{\partial e_j} \qquad (8)

Since r_j (the stress of the critic) increases as the error increases, while the on-line calculation of \partial r_j / \partial e_j is accompanied by measurement errors and thus produces unreliable results, only its sign (+1) is used in our calculations. From (2) to (8), \Delta a_i is calculated as follows:

\Delta a_i = \eta_i \sum_{j=1}^{m} K_j r_j J_{ji} \frac{\partial u_i}{\partial a_i}, \quad i = 1, \ldots, n \qquad (9)

Equation (9) is used for updating the learning parameters a_{il}, b_{il} and c_{il} in (1), which is straightforward. In the next section, we apply the proposed method to several SISO and NSISO plants with different properties in order to see the performance of the proposed control methodology in practice.

5 Simulation Results

In this section, the proposed method is applied to the control of three dynamical systems. The first is the highly nonlinear SISO Van der Pol system, where a single-agent approach is used. In the second, the controller is applied to a multivariable linear plant under different conditions, so that its robustness in the presence of parameter uncertainties is shown; this example is concerned with systems with an equal number of inputs and outputs. In the last example, we apply our controller to the famous inverted pendulum benchmark, which is a SIMO (single-input multi-output) nonlinear non-minimum-phase system.

Example 1: The Van der Pol equation

Our first example discusses the control of the Van der Pol system, which is considered a highly nonlinear SISO system. We use a single-agent single-critic approach here. The equations governing this system are:

\ddot{x} + (1 - x^2)\dot{x} + x = u, \qquad y = x \qquad (10)

In (10), u is the input, x is the single state variable and y is the output of the system. The block diagram of the control system is shown in Fig. 3. The input scaling factors and the learning rate of the control system are chosen as C_op = 1, C_od = 2, C_rp = 1, C_rd = 2 and \eta = 20. The step response of the control system is shown in Fig. 4; the result shows the power of the proposed algorithm in the control of this nonlinear SISO system.

Fig. 4. Step response of Example 1.

Example 2: Control of a plant with different conditions

In our second example, the problem of handling a multivariable plant with uncertainties is investigated. The plant has the following transfer function [18]:

P(s) = \begin{pmatrix} \dfrac{k_{11}}{1 + sA_{11}} & \dfrac{k_{12}}{1 + sA_{12}} \\ \dfrac{k_{21}}{1 + sA_{21}} & \dfrac{k_{22}}{1 + sA_{22}} \end{pmatrix} \qquad (12)

It has a total of nine plant conditions, as given in Table 1. Our goal is to achieve the desired step response while output decomposition is maintained. A major problem encountered here is the tuning of the control system's coefficients in order to provide an acceptable step response: it is a time-consuming task, and there is the possibility that the desired step response may never be achieved.
Our experience with this control structure shows that when the change made in the input of the system is smooth (i.e. there are no sudden changes like a step applied at the input, but instead smoother inputs like sinusoids), the control system performs very well. The reason is obvious: it takes much more time for the neural networks' weights to adjust when the input of the system changes suddenly (let us call it a harsh input) than when a much smoother input is applied. This problem is more evident in the case of multivariable systems, where more than one controller's weights must be adjusted. Hence, when applying a harsh input to a system, we change it to a smooth one by pre-filtering it, obtaining a smooth (filtered) input for the system instead of the harsh (unfiltered) one. The pre-filters' specifications are determined by the properties of the desired step response. The results of the simulations here show that this approach, although simple, is very efficient in different control situations.

Table 1: Nine plant conditions of Example 2.

Plant condition | K11 | K22 | K12 | K21 | A11 | A22 | A12 | A21
1 | 1 | 2 | 0.5 | 1 | 1 | 2 | 2 | 3
2 | 1 | 2 | 0.5 | 1 | 0.5 | 1 | 1 | 2
3 | 1 | 2 | 0.5 | 1 | 0.2 | 0.4 | 0.5 | 1
4 | 4 | 5 | 1 | 2 | 1 | 2 | 2 | 3
5 | 4 | 5 | 1 | 2 | 0.5 | 1 | 1 | 2
6 | 4 | 5 | 1 | 2 | 0.2 | 0.4 | 0.5 | 1
7 | 10 | 8 | 2 | 4 | 1 | 2 | 2 | 3
8 | 10 | 8 | 2 | 4 | 0.5 | 1 | 1 | 2
9 | 10 | 8 | 2 | 4 | 0.2 | 0.4 | 0.5 | 1

Table 2: Additional plant conditions for the multivariable system in Example 2.

Plant condition | K11 | K22 | K12 | K21 | A11 | A22 | A12 | A21
10 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | 3
11 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3
12 | 1 | 1 | 0.5 | 1 | 1 | 2 | 2 | 3
13 | 1 | 2 | 2.5 | 1 | 1 | 2 | 2 | 3
14 | 10 | 8 | 10 | 8 | 0.2 | 0.4 | 0.5 | 1
15 | 10 | 8 | 10 | 8 | 0.7 | 0.4 | 0.5 | 1

In this example, suppose that it is desired that both outputs have no overshoot and a rise time of not more than 1 second. Accordingly, based on a rough measure, the transfer functions of the pre-filters are the same and are chosen as follows (note that achieving more complicated inputs requires more complicated pre-filter design techniques, which is not the topic of our discussion here):

H(s) = \frac{16}{s^2 + 8s + 16} \qquad (13)

Results of the simulations are shown in Fig. 5 for a step applied at t = 0 to the first input and another step applied at t = 3 to the second input (since all the conditions produced nearly the same results, the results for three selected conditions are plotted). As is clearly visible, the change of plant conditions has little or almost no effect on the step responses of the system, i.e. the system shows great robustness in the presence of uncertainties. Comparing the results with those obtained by classical methods such as the one in [18] shows the superiority of the proposed algorithm. Although we have achieved good step responses and great robustness, we should take another important aspect into account, and that is the interaction in this system, which is not high. Interaction is the major drawback in the design of multivariable systems because it introduces unwanted effects from the different inputs into the outputs of the system. The more interaction there is in the system, the more complex the control approach will be. In order to show that our proposed controller can tolerate bigger parameter changes, which yield situations with high interaction, we added six more conditions to the previous ones (Table 2). The results of applying the controller are shown in Fig. 6 for two selected conditions. As we can see, our method also shows great robustness to parameter uncertainties in the presence of high interactions.
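To make the pre-filtering step used in this example concrete, the sketch below simulates the second-order pre-filter (13) on a unit step using simple forward-Euler integration. The step size and simulation horizon are arbitrary assumptions; the filter itself is the one given in (13).

```python
# Forward-Euler simulation of the pre-filter H(s) = 16 / (s^2 + 8s + 16),
# i.e. the ODE  y'' + 8 y' + 16 y = 16 u,  driven by a unit step input.
def prefilter_step_response(t_end=3.0, dt=0.001):
    y, dy = 0.0, 0.0
    out = []
    for _ in range(int(t_end / dt)):
        u = 1.0                          # harsh (step) input
        ddy = 16.0 * u - 8.0 * dy - 16.0 * y
        dy += dt * ddy
        y += dt * dy
        out.append(y)
    return out

y = prefilter_step_response()
# The filtered signal rises smoothly towards 1 with no overshoot
# (double pole at s = -4), which is what the controller is actually fed.
print(round(y[-1], 3))
```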
Example 3: An inverted pendulum

The problem of balancing an inverted pendulum on a moving cart is a good example of a challenging multivariable situation, due to its highly nonlinear equations, its non-minimum-phase characteristics and the problem of handling two outputs with only one control input [19] (the position of the cart is sometimes ignored by researchers [20]). Here, the dynamics of the inverted pendulum are characterized by four variables: \theta (the angle of the pole with respect to the vertical axis), \dot{\theta} (the angular velocity of the pole), z (the position of the cart on the track) and \dot{z} (the velocity of the cart). The behavior of these state variables is governed by the following two second-order differential equations [17]:

\ddot{\theta} = \frac{g \sin\theta + \cos\theta \left( \dfrac{-F - m l \dot{\theta}^2 \sin\theta}{m_c + m} \right)}{l \left( \dfrac{4}{3} - \dfrac{m \cos^2\theta}{m_c + m} \right)} \qquad (14)

\ddot{z} = \frac{F + m l (\dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta)}{m_c + m} \qquad (15)

where g (the acceleration due to gravity) is 9.8 m/s², m_c (the mass of the cart) is 1.0 kg, l (the half-length of the pole) is 0.5 m and F is the applied force in newtons. Our control goal is to balance the pole, yet keep z not further than 2.5 meters from its original position. We use a single agent here, which provides the force F to the system, and apply two emotional critics to assess the outputs. The first one criticizes the situation of the pole and the second one does the same for the cart's velocity. Both critics are satisfied when their inputs are zero (i.e. the pendulum is balanced and the cart has no velocity). The results of the simulation for the initial condition \theta_0 = 10 deg are presented in Fig. 7. They show that after nearly six seconds the pole is balanced and the cart is stopped successfully at around 1.4 meters from the original position.

Fig. 5. Simulation results of Example 2: (a) condition 1; (b) condition 4; (c) condition 9.

Fig. 6. Simulation results of Example 2: (a) condition 10; (b) condition 12.

Fig. 7. Responses of the variables of Example 3 (from left to right: the pole's angle and the cart's position).
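As a concrete reference for this benchmark, the dynamics (14)-(15) can be integrated with a few lines of Euler stepping. This is a sketch of the plant alone under a zero-force placeholder policy, not of the paper's controller or critics; the pole mass, whose value is not stated in the text, is an assumed 0.1 kg.

```python
import math

# Euler integration of the cart-pole dynamics, eqs. (14)-(15).
G, M_CART, M_POLE, L = 9.8, 1.0, 0.1, 0.5   # M_POLE is an assumed value
DT = 0.02

def step(state, force):
    theta, dtheta, z, dz = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    tmp = (-force - M_POLE * L * dtheta**2 * sin_t) / total    # eq. (14)
    ddtheta = (G * sin_t + cos_t * tmp) / (
        L * (4.0 / 3.0 - M_POLE * cos_t**2 / total))
    ddz = (force + M_POLE * L * (dtheta**2 * sin_t
                                 - ddtheta * cos_t)) / total   # eq. (15)
    return (theta + DT * dtheta, dtheta + DT * ddtheta,
            z + DT * dz, dz + DT * ddz)

state = (math.radians(10.0), 0.0, 0.0, 0.0)  # theta_0 = 10 deg
for _ in range(50):
    state = step(state, force=0.0)           # placeholder: no control
print(state)  # uncontrolled, the pole angle keeps growing
```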
6 Discussions and Conclusions

In this section we discuss the general properties of the proposed framework and summarize the work that has been done in this paper.

6.1 The role of emotional signals in the proposed control scheme

The proposed methodology is based on continuous emotional (stress) signals, which can be considered performance measures of the particular parts of the control system that are of interest. In this paper, the parts we are interested in are the outputs of the control system and the cross-coupled components of the multivariable systems. For each part, the nearer we get to our predefined target, the smaller the corresponding emotional signal, and vice versa. With this simple approach, we can easily include in our framework any part of the plant over which we want to have control. For example, to exclude the effect of cross-coupled components in multivariable systems, we assign a critic to each component. This critic judges whether or not the control system has counterbalanced the cross-coupled effects. Based on the success of the controller in dealing with the interaction, emotional signals are produced by the critics, which in their turn tune the parameters of the neurofuzzy controller so that the stress of the critics is decreased. The same situation also holds for the inverted pendulum example in the previous section. The main variable of interest there is the position of the pendulum with respect to the vertical axis, but at the same time the position of the cart is also of interest as a secondary control variable. Hence, the inputs of the neurofuzzy controller are the error and the derivative of the angle of the pole with respect to the vertical axis, but the weights of the controller are tuned based on the outputs of two critics: the first one criticizes the position of the pole and the second one does the same for the position of the cart. Both critics produce continuous signals until their inputs are zero, i.e. until the predefined targets are achieved. Next we discuss how our framework is related to agents and multi-agent systems; we then briefly discuss the advantages and shortcomings of the current method, followed by a description of future work.

6.2 Relationship between the Proposed Framework and Agent-Based Systems

In this paper a major consideration has been distributing control concerns via agents. Each agent is used to represent a control concern. We have used the notion of agency only in a conceptual sense, and no effort has been made towards the utilization of agent-oriented technologies such as ACLs, platforms, wrappers, etc. However, those technologies can be of benefit in future, more complex applications. The main agent property in our paper is autonomy. Our agents can be interpreted as deliberative as well as reactive (since emotion is a mental state, but is also very close to the concept of reinforcement), and learning, reasoning and adaptation are central to our proposed controller. Other benefits of agent orientation can also be seen to be applicable.

6.3 Conclusions and Future Work

In this paper, the emotional-learning-based intelligent control scheme was applied to dynamic plants, and the performance of the proposed algorithm was investigated on several benchmark examples. The main contribution of the proposed generalization is to provide the easy-to-implement emotional learning technique for dealing with dynamic (especially multivariable) control systems, where the use of other control methodologies (especially intelligent control methods) is sometimes problematic ([21]). Simplicity and tolerance of uncertainties and nonlinearities are what is gained by its use. This is shown in various contributions for SISO and NSISO systems in this paper. On the negative side, it should be pointed out that only a very simple learning algorithm has been used throughout this paper. Although this stresses the simplicity and generality of the proposed technique, more complex learning algorithms involving time credit assignment [22] and temporal differences [23] or similar methods might be called for when processes involve unknown delays. Also, as the number of inputs and outputs of the system grows, the tuning of the control parameters becomes a tiresome task. A continuous genetic-algorithm-based optimization method is under development to find the optimal selection of the tuning parameters of the overall control system automatically. Having done this, generalization to systems with multiple inputs and outputs (more than two) can be realized efficiently. Other work includes the application of multiple critics to a SISO plant in order to achieve multiple objectives ([6]-[7]). For example, in [6] two objectives (good tracking and low control costs) are considered simultaneously.
This reference shows the difference between our approach and supervised learning more clearly, because it shows that our proposed methodology can perform well not only in the case of cheap control, but also where the control action involves costs. The implementation of such a control system for a switched reluctance motor (as a practical system) is also under investigation. Again, agent orientation can underline the fact that each objective can be considered a separate concern delegated to an agent. Our future work includes changing the structure of the controller so that it can be applied to processes with unknown delays, considering more emotions in our control structure, optimizing the structure of the controller (for example, using a genetic algorithm for the optimum selection of the membership functions of both the controllers and the critics), and finally considering more complex cues in our learning process.

Acknowledgement

The authors would like to thank the two anonymous referees for their valuable comments.

References

[1] H. A. Simon and Associates (1987), Decision making and problem solving, Interfaces, no. 17.
[2] C. Balkenius and J. Moren (2000), A computational model of context processing, 6th International Conference on Simulation of Adaptive Behavior, Cambridge.
[3] M. El-Nasr, T. Ioerger and J. Yen (1999), PETEEI: A pet with evolving emotional intelligence, Autonomous Agents '99, pp. 9-15.
[4] J. Velasquez (1998), A computational framework for emotion-based control, Grounding Emotions in Adaptive Systems Workshop, SAB '98, Zurich, Switzerland.
[5] M. Fatourechi, C. Lucas and A. Khaki Sedigh (2001), An agent-based approach to multivariable control, IASTED International Conference on Artificial Intelligence and Applications, Marbella, Spain, pp. 376-381.
[6] M. Fatourechi, C. Lucas and A. Khaki Sedigh (2001), Reducing control effort by means of emotional learning, 9th Iranian Conference on Electrical Engineering (ICEE2001), Tehran, Iran, pp. 41-1 to 41-8.
[7] M. Fatourechi, C. Lucas and A. Khaki Sedigh (2001), Reduction of maximum overshoot by means of emotional learning, 6th Annual CSI Computer Conference, Isfahan, Iran, pp. 460-467.
[8] C. Lucas and S. A. Jazbi (1998), Intelligent motion control of electric motors with evaluative feedback, Cigre 98, Cigre, France, 11-104, pp. 1-6.
[9] C. Lucas, S. A. Jazbi, M. Fatourechi and M. Farshad (2000), Cognitive action selection with neurocontrollers, Third Iran-Armenia Workshop on Neural Networks, Yerevan, Armenia.
[10] P. Maes (ed.) (1991), Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, The MIT Press, London.
[11] E. D. Rolls (1998), The Brain and Emotion, Oxford University Press.
[12] K. M. Galotti (1999), Cognitive Psychology In and Out of the Laboratory (2nd ed.), Brooks/Cole, Pacific Grove, CA.
[13] R. W. Kentridge and J. P. Aggleton (1990), Emotion: sensory representations, reinforcement and the temporal lobe, Cognition and Emotion, 4, pp. 191-208.
[14] M. Wooldridge and N. Jennings (1995), Intelligent agents: theory and practice, The Knowledge Engineering Review, 10(2), pp. 115-152.
[15] M. Wooldridge (1999), Intelligent agents, in G. Weiss (ed.), Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, MIT Press, London, pp. 27-77.
[16] S. Russell and P. Norvig (1995), Artificial Intelligence: A Modern Approach, Prentice-Hall, Englewood Cliffs.
[17] T. Takagi and M.
Sugeno (1983), Derivation of fuzzy control rules from human operator's control actions, IFAC Symposium on Fuzzy Information, Knowledge Representation and Decision Analysis, pp. 55-60.
[18] C. C. Cheng, Y. K. Liao and T. S. Wang (1997), Quantitative design of uncertain multivariable control system with an inner-feedback loop, IEE Proceedings on Control Theory and Applications, no. 144, pp. 195-201.
[19] R. H. Cannon (1967), Dynamics of Physical Systems, McGraw-Hill, New York.
[20] J. S. Jang (1992), Self-learning fuzzy controllers based on temporal back propagation, IEEE Transactions on Neural Networks, 3(5), pp. 714-723.
[21] P. G. Lee, K. K. Lee and G. J. Jeon (1995), An index of applicability for the decomposition method of multivariable fuzzy systems, IEEE Transactions on Fuzzy Systems, no. 3, pp. 364-369.
[22] R. S. Sutton and A. G. Barto (1987), A temporal-difference model of classical conditioning, 9th Annual Conference on Cognitive Science, New Jersey, pp. 355-378.
[23] R. S. Sutton (1988), Learning to predict by the method of temporal differences, Machine Learning, no. 3, pp. 9-44.

Learning Behavior-selection in a Multi-goal Robot Task

Sandra Clara Gadanho and Luis Custódio
Institute of Systems and Robotics, IST, Lisbon, Portugal
sandra@isr.ist.utl.pt, http://www.isr.ist.utl.pt/~sandra/
lmmc@isr.ist.utl.pt, http://islab.isr.ist.utl.pt/

Keywords: learning, emotions, autonomous robots

Received: October 30, 2002

The purpose of the work reported here is the development of an autonomous robot controller which learns to perform a multi-goal and multi-step task when faced with real-world problems such as continuous time and space, noisy sensors and unreliable actuators. In order to make the learning task feasible, the agent does not have to learn its action abilities from scratch, but relies on a small set of simple hand-designed behaviors. Experience has shown that these low-level behaviors can be either easily designed or learned, but that the coordination of these behaviors is not trivial. To solve the problem at hand, a dual-system architecture is proposed in which a traditional reinforcement-learning adaptive system is complemented with a goal system responsible for both reinforcement and behavior switching. This goal system is inspired by emotions, which take a functional role in this work and are evaluated in terms of their engineering benefits, i.e. in terms of their competitiveness when compared with alternative approaches. The experiments reported carefully evaluate the goal system and determine its requirements.

1 Introduction

In order to master a task, a robot controller may use reinforcement-learning techniques (e.g., 26; 14, for surveys on RL) to learn the appropriate selection of simple actions. For more complex tasks, skill decomposition is usually advisable as it can significantly reduce the learning time, or even make the task feasible. Skill decomposition usually consists of learning some predefined behaviors in a first phase and then finding the high-level coordination of these behaviors. Although the behaviors themselves are often learned successfully, behavior coordination is much more difficult and is usually hard-wired to some extent in other robotics applications (17; 15; 19). While learning the low-level behaviors consists of deciding on a simple reactive action on a step-by-step basis, when learning behavior selection, apart from deciding which behavior to select, the controller must also decide when to switch and reinforce behaviors.
There are various reasons why a behavior may need to be interrupted: it has reached its goal; it has become inappropriate, due to changes in the environment; or it is not able to succeed in its goal. In practice, the duration of a behavior must be long enough to allow it to manifest itself, and short enough so that it does not become inappropriate due to changing circumstances. The problem of deciding when to change behavior is not an issue in traditional reinforcement-learning problems, because these usually consist of grid worlds divided into cells which represent states. In those worlds, the execution of a single discrete action is responsible for a state transition, since it moves the agent to one of the cells in the neighborhood of the cell where the agent is located. In a continuous world, the determination of a state transition is not clear. In robotics, agent states change asynchronously in response to internal and external events, and actions take variable amounts of time to execute (19). As a solution to this problem, some researchers extend the duration of the current action according to some domain-dependent conditions of goal achievement or applicability of the action. Others interrupt the action when there is a change in the input state (22; 2). However, this may not be a very straightforward solution when the robot is equipped with multiple continuous sensors that are vulnerable to noise. (18) go a step further, and auto-regulate the degree of discrimination of new events by attempting to maintain a constant attentional effort. Inspired by the literature on emotions, previous work has shown that reinforcement and behavior switching can easily be addressed together by an emotion model (13; 12). The justification for the use of emotions is that, in nature, emotions are usually associated with either pleasant or unpleasant feelings that can act as reinforcement (e.g., 27; 1; 4) and are frequently pointed to as a source of interruption of behavior (25; 24). The task used in the current work has been solved with success using that emotional system as the goal system. The goal system proposed here represents an abstraction of that system, with similar performance. The current goal system does not model emotions explicitly, although it is inspired by them; instead, it tries to identify the properties the goal system must have in order to work correctly. This goal system is based on a set of homeostatic variables which it attempts to maintain within certain bounds. The goal system's required properties are identified within a complex task with multiple goals. Apart from dealing with real-world problems, the task developed has several features which pose extra difficulties to the learning algorithm:

- it has multiple goals which may conflict with each other;
- there are situations in which the agent needs to temporarily overlook one goal in order to successfully accomplish another;
- the agent has short-term and long-term goals;
- a sequence of behaviors may be required to accomplish a certain goal;
- the behaviors are unreliable: they may fail their goal or they may lead the agent to undesirable situations;
- the behaviors' appropriate durations are undetermined; they depend on the environment and on their success.

In the next section, this task is described in detail. This will be followed by a description of the proposed architecture in terms of its goal system and adaptive system.
Finally, the experiments made are described, the proposed architecture is compared with related work and the conclusions reached are presented. To conclude, future work on the architecture is discussed.

2 The Robot Task

The experiments reported here evaluated controllers in a survival task that consists of maintaining adequate energy levels in a simulated environment with obstacles and energy sources, which are associated with lights the agent can sense when nearby. The agent has basically three goals: to maintain its energy, to avoid collisions and to move around in its environment. To gain energy from an energy source, the robot has to bump into it. This will make energy available for a short period of time. It is important that the agent is able to discriminate the existence of available energy, because the agent can only get energy during this period. This energy is obtained by receiving high values of light in its rear light sensors, which means that the robot must quickly turn its back to the energy source as soon as it senses that energy is available. To receive further energy, the robot has to restart the whole process by hitting the light again so that a new time window of released energy is started. An energy source can only release energy a few times before it is exhausted. In time, the energy source will recover its ability to provide energy, but meanwhile the robot is forced to search for other sources of energy in order to survive. The robot cannot be successful by relying on a single energy source, i.e. the time it takes for new energy to be available in a single energy source is longer than the time it takes for the robot to waste that energy. When an energy source has no energy, the light associated with it is turned off and it becomes a simple obstacle for the robot. The extraction of energy was complicated, as described above, in order to make the learning task harder by requiring the agent to learn sequences of behaviors. Moreover, it requires the agent to temporarily suppress its goal of avoiding obstacles in the process of acquiring energy.

3 The Robot Controller

The proposed architecture — see Figure 1 — is composed of two major systems: the goal system and the adaptive system. The goal system evaluates the performance of the adaptive system in terms of the state of its homeostatic variables and determines when a behavior should be interrupted. The adaptive system learns which behavior to select using reinforcement-learning techniques, which rely on neural networks to store the utility values. The two systems are described in detail in the following.

3.1 Goal System

In an autonomous agent, the goal system complements a traditional reinforcement-learning adaptive system in that it determines how well the adaptive system is doing, or more specifically, the reinforcement it is entitled to at each step. In the current work the goal system is also responsible for determining when behavior switching should occur. Previous work (13) addressed the problem of the goal system by using an emotional model. A mixture of perceptual values and internal values was used in the calculation of a single multi-dimensional emotional state. This state in turn was used to determine the reinforcement at each time step, and significant differences in its value were considered to be relevant events used to trigger the behavior-selection mechanism.
In the current work, this system has been modified to emphasize the multiple-goal nature of the problem at hand and to identify and isolate the different aspects of the agent-environment interaction that need to be taken into consideration when assessing the agent's overall goal state. The goals are explicitly identified and associated with homeostatic variables. These homeostatic variables are associated with three different states: target, recovery and danger. The state of each variable depends on its continuous value, which is grouped into four qualitative categories: optimal, acceptable, deficient and dangerous. See details of the state transitions in Figure 2 and an example of the categorization of the continuous values of a homeostatic variable in Figure 3.

Figure 1: The robot controller (the perception system feeds the goal system's homeostatic variables and well-being state, which provide reinforcement and interrupt signals to the adaptive system's stochastic selection over neural networks, driving the behavior system's motor output).

Figure 2: The state transitions of a homeostatic variable, dependent on its value (target, recovery and danger states linked via the optimal, acceptable, deficient and dangerous value categories).

The variable remains in its target state as long as its values are optimal or acceptable, but it only returns to its target state once its values are optimal again. This state transition is akin to that of a thermostat, in that a greater deviation from the target values is required to change from a target state into a recovery state than the inverse transition. The danger state is associated with dangerous values and can be related to urgency of recovery. To reflect the current hedonic state of the agent, a well-being value was constructed from the above. This value depends primarily on the values of the homeostatic variables. When a variable is in the target state it has a positive influence on the well-being; otherwise it has a negative influence which is proportional to its deviation from the target values. In order to have the system working correctly, two other influences on well-being were also required:

State change — when a homeostatic variable changes from one state to another, the well-being is influenced positively if the change is towards a better state and negatively otherwise;

Prediction of state change — when some perceptual cue predicts the state change of a homeostatic variable, the influence is similar to the above, but lower in value and dependent on the accuracy of the prediction and on how soon the state change is expected. In particular, if a transition to the target state involves a sequence of steps, then a positive prediction may be made any time a step is accomplished. The intensity of the prediction increases as the number of steps needed to finish the sequence is reduced. Predictions are always associated with individual homeostatic variables and are only made if the corresponding variable value is not optimal.

The two goal events just described were modeled after emotions, in the sense that they result from the detection of significant changes in the agent's internal state or predictions of such changes. In the same way that emotions are associated with feelings of 'pleasure' or 'suffering' depending on whether this change is for the better or not, these goal events influence the well-being value such that the information of how good the event is is conveyed to the agent through the reinforcement.
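The thermostat-like state transitions of Figure 2 can be captured in a few lines. This is a minimal sketch, assuming only the category names given in the text; the mapping from categories to transitions follows the description above (return to target only on optimal values, danger on dangerous values).

```python
# States and value categories of a homeostatic variable (after Figure 2).
TARGET, RECOVERY, DANGER = "target", "recovery", "danger"

def next_state(state, category):
    """category is one of: 'optimal', 'acceptable', 'deficient', 'dangerous'."""
    if category == "dangerous":
        return DANGER
    if state == TARGET:
        # Remains in target while values are optimal or acceptable.
        return TARGET if category in ("optimal", "acceptable") else RECOVERY
    # From recovery or danger, only optimal values restore the target state
    # (the thermostat-like asymmetry described in the text).
    return TARGET if category == "optimal" else RECOVERY

s = TARGET
for cat in ["acceptable", "deficient", "acceptable", "optimal", "dangerous"]:
    s = next_state(s, cat)
    print(cat, "->", s)
# acceptable -> target, deficient -> recovery, acceptable -> recovery,
# optimal -> target, dangerous -> danger
```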
One may distinguish between the emotion of happiness when a goal is achieved (or predicted to be achieved) and the emotion of sadness when a goal state is lost (or about to be lost). The primary influence of the homeostatic variables, on the other hand, is modeled after the natural background emotions which reflect the overall state of the agent in terms of maintaining homeostasis (7).

Figure 3: An example of the categorization of the possible continuous values of a homeostatic variable (the value range is divided into deficient, acceptable and optimal bands, with the maximum deviation d^h_max marked).

The goal events are also responsible for triggering the adaptive system for a new behavior selection, which is also often associated with emotions. The calculation of the well-being value (wb) is presented in Equations 1, 2 and 3. It depends on the domain-dependent set of homeostatic variables (H) in different ways: their state, their transitions and predictions. These different influences are weighted by their respective coefficients (c_s, c_t(h) and c_p), presented in Table 1. The weights w_h are constants which denote the relative importance of each homeostatic variable h, and their value should lie between -1 and 1. The well-being value is normalized by a constant value (wb_max), calculated by Equation 3, so that it is never above 1.0 or below -1.0. This depends on the maximum absolute value (c_t^max) of the transition coefficient, which is 1.0 (see Table 1).

wb = \frac{1}{wb_{max}} \sum_{h \in H} \left( c_s r_s(h) + c_t(h) + c_p r_p(h) \right) w_h \qquad (1)

r_s(h) = \begin{cases} 1 & \text{if } h \text{ is in its target state} \\ -d(h)/d^h_{max} & \text{otherwise} \end{cases} \qquad (2)

wb_{max} = (c_s + c_t^{max} + c_p) \sum_{h \in H} w_h \qquad (3)

The influence of the state of a homeostatic variable on well-being is expressed by r_s(h), described in Equation 2. This value is 1 if the homeostatic variable is in its target state. Otherwise, it depends on the normalized deviation from optimal values, i.e. the shortest distance (d(h)) of the current value to an optimal value, normalized by the maximum possible distance (d^h_max, see the example in Figure 3) of any value of this homeostatic variable to a target value. This ensures that the normalized deviation is always between 0 and 1. The value of a prediction (r_p(h)) depends on the strength of the current prediction and varies between -1 (for predictions of undesirable changes in the homeostatic variable h) and 1 (for predictions of desirable changes). If there is no prediction then r_p(h) = 0. The values of both w_h and r_p(h) are domain-dependent and are presented later.

For the task at hand, three homeostatic variables were identified:

Energy — is the battery energy level of the agent and reflects the goal of maintaining its energy;

Welfare — maintains the goal of avoiding collisions — this variable is in its target state when the agent is not in a collision situation;

Activity — ensures that the agent keeps moving — if the robot keeps still, its value slowly decreases until eventually its target state is not maintained.

These variables are directly associated with the robot goals mentioned previously. Their associated weights (w_h) were 0.5 for Energy, 0.3 for Welfare and 0.2 for Activity. These weights translate the relative importance of each one of the goals: the most important goal is for the agent to maintain its energy, obstacles should be avoided when possible, and the activity goal is secondary. The homeostatic-variable values were categorized according to Table 2. State-change predictions were only considered for the Energy and the Activity variables.
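Equations 1-3 translate directly into code. The sketch below uses the coefficient values of Table 1 (below) and the task weights just given; the per-variable deviations, transition coefficients and predictions passed in are made-up sample values.

```python
# Well-being computation, after eqs. (1)-(3). Coefficients from Table 1.
C_S, C_P, C_T_MAX = 1.0, 0.5, 1.0
WEIGHTS = {"energy": 0.5, "welfare": 0.3, "activity": 0.2}

def r_s(in_target, deviation, max_deviation):
    """Eq. (2): 1 in the target state, else minus the normalized deviation."""
    return 1.0 if in_target else -deviation / max_deviation

def well_being(variables):
    """variables: name -> (in_target, d, d_max, c_t, r_p), as in eq. (1)."""
    wb_max = (C_S + C_T_MAX + C_P) * sum(WEIGHTS.values())   # eq. (3)
    total = 0.0
    for name, (in_target, d, d_max, c_t, r_p) in variables.items():
        total += (C_S * r_s(in_target, d, d_max) + c_t + C_P * r_p) \
                 * WEIGHTS[name]
    return total / wb_max                                    # normalization

# Sample state: energy out of target (deviation 0.3 of a 0.9 maximum,
# with a positive prediction); welfare and activity in target, no events.
state = {
    "energy":   (False, 0.3, 0.9, 0.0, 0.5),
    "welfare":  (True,  0.0, 1.0, 0.0, 0.0),
    "activity": (True,  0.0, 1.0, 0.0, 0.0),
}
print(round(well_being(state), 3))   # a mildly positive well-being
```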
State-change predictions were only considered for the Energy and the Activity variables. In the Energy case, two predictions are made. A small-value prediction is made whenever the light detected by the sensors is above a certain threshold (0.4) and its value has just changed significantly¹. Another, higher-valued, prediction is made whenever the agent detects significant changes in the energy available for re-charging. The actual values of the predictions are:

- $p(I_a)$, as expressed in Equation 4, with $I_a$ being the energy availability, when the agent detects significant changes in the energy availability; or

- $p(I_l)/2$, with $I_l$ equal to the light intensity, if there is solely a detection of a light change.

$$p(I) = \begin{cases} I & \text{if } I \text{ has increased} \\ -0.5\,(1 - I) & \text{if } I \text{ has decreased} \end{cases} \qquad (4)$$

The Activity prediction is a sort of no-progress indicator, given at regular time intervals when the activity of the robot has been low for a long period of time. It is in fact a negative prediction (with value -1), because it predicts a future failure to restore Activity to its target state if the current behavior is maintained, since that behavior has already failed to do so in a reasonable amount of time. It is important that the agent's behavior selection is triggered in these situations; otherwise, a non-moving agent will eventually run out of events.

¹ A significant change is detected when the value is statistically different from the values recorded since a state transition was last made, i.e. if the difference between the new value and the mean of the previous values exceeds both a small tolerance threshold (set to 0.02) and J times the standard deviation of those previous values (the constant J was set to 2.5).

Coefficient   Definition                                  Value
c_s           State coefficient                           1.0
c_t(h)        State transition coefficient:
                h changed to Target                       1.0
                h changed to Danger                      -1.0
                h changed from Target to Recovery        -1.0
                h changed from Danger to Recovery         1.0
                h did not change state                    0.0
c_p           Prediction coefficient                      0.5

Table 1: Coefficient values used in the experiments.

Homeostatic variable   Optimal      Acceptable   Deficient    Dangerous
Energy                 [1.0, 0.9]   (0.9, 0.6]   (0.6, 0.2]   (0.2, 0.0]
Welfare                [1.0, 0.9]   (0.9, 0.7]   (0.7, 0.0]   -
Activity               [1.0, 1.0]   (0.9, 0.8]   (0.8, 0.0]   -

Table 2: Value intervals of the different qualitative categories of the homeostatic variables.
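The significance test of footnote 1 and the prediction value of Equation 4 translate directly into code. In this sketch only the numeric constants (the 0.02 tolerance and J = 2.5) come from the text; the function names are illustrative.

import statistics

def significant_change(new_value, history, tol=0.02, j=2.5):
    # True if new_value departs from the values recorded since the last
    # state transition by more than both the tolerance and j standard
    # deviations (footnote 1).
    if len(history) < 2:
        return False
    diff = abs(new_value - statistics.fmean(history))
    return diff > tol and diff > j * statistics.pstdev(history)

def p(intensity, increased):
    # Equation 4: prediction value for an intensity I in [0, 1]; positive
    # and proportional to I on increases, negative on decreases.
    return intensity if increased else -0.5 * (1.0 - intensity)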
3.2 Adaptive System

The adaptive system implemented is a well-known reinforcement-learning algorithm which has given good results in the field of robotics: Q-learning (30). Through this algorithm the agent iteratively learns, by trial and error, the expected discounted cumulative reinforcement that it will receive after executing an action in response to a world state, i.e. the utility values. Traditional Q-learning usually employs a table which stores the utility value of each possible action for every possible world state. In a real environment, the use of this table requires some form of partition of the continuous values provided by the sensors. An alternative to this method, suggested by (15), is to use neural networks to learn the utility values of each action by back-propagation. This method has the advantage of profiting from generalization over the input space and of being more resistant to noise; on the other hand, the on-line training of neural networks may not be very accurate. The reason is that neural networks tend to be overwhelmed by the large quantity of consecutive similar training data and to forget the rare relevant experiences. Using an asynchronous triggering mechanism such as the one proposed by the current architecture can help with this problem by detecting and using only a few relevant examples for training.

The system uses one feed-forward neural network per behavior, with 7 input units (six representing state information plus one bias), 10 hidden units, and 1 output unit that represents the expected outcome of the associated behavior. The state information fed to the neural networks comprises the homeostatic-variable values and three perceptual values: light intensity, obstacle density and energy availability². All these values vary between 0 and 1.

² High if a nearby energy source is releasing energy.

The developed controller tries to maximize the reinforcement received by selecting between one of three possible hand-designed behaviors:

Avoid obstacles — Turn away from the nearest obstacle and move away from it. If the sensors cannot detect any obstacle nearby, then remain still.

Seek light — Go in the direction of the nearest light. If no light can be seen, remain still.

Wall following — If there is no wall in sight, move forwards at full speed. Once a wall is found, follow it. This behavior by itself is not very reliable, in that the robot can crash, i.e. become immobilized against a wall. The avoid-obstacles behavior can easily help in these situations.

At each trigger step, the agent may select between performing the behavior that has proven to be better in the past, and therefore has the best utility value so far, or selecting an arbitrary behavior to improve its information about the utility of that behavior. The selection function used is based on the Boltzmann-Gibbs distribution and consists of selecting a behavior with a probability that is higher the higher its utility value in the current state.
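The Boltzmann-Gibbs selection just mentioned can be sketched as follows. The temperature parameter, which trades exploration against exploitation, is an assumed value; the paper does not report the one used.

import math
import random

def select_behavior(utilities, temperature=0.1):
    # utilities holds one Q-value per behavior (here, the outputs of the
    # per-behavior neural networks for the current state); behaviors with
    # higher utility are selected with higher probability.
    weights = [math.exp(q / temperature) for q in utilities]
    return random.choices(range(len(utilities)), weights=weights)[0]

Lower temperatures make the selection nearly greedy, while higher ones make the three behaviors almost equiprobable.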
4 The Experimental Procedure

The evaluation of the controller's success in the task described is not straightforward. To start with, the agent must be successful in accomplishing each one of its goals, which does not allow for a direct, single-dimension evaluation between different controllers. Furthermore, knowing whether the agent is learning its task is not trivial. On the one hand, it is possible that the agent may solve its task simply by taking advantage of implicit domain knowledge, such as the information provided by the behavior-switching mechanism and the knowledge already contained in the hand-designed behaviors. On the other hand, it is not clear how well a reasonably competent controller can manage all the different goals simultaneously. For these reasons, it is important to compare the performance of the controller with those of a random behavior-selection controller and of an alternative controller which is competent at the task.

Secondly, although results may be artificially reproduced by simulation, robots in the real world are not expected to face exactly the same situations twice. Minor environmental changes or slightly different sensor or motor outcomes are likely to lead to very different experiences. This, allied to the fact that the controller makes some random decisions in the learning exploration process, makes the results of a single-trial test unreliable. For this reason, a rigorous evaluation of a controller requires several trials. Each experiment consisted of thirty different robot trials of three million learning steps each. In each trial, a new, fully recharged robot with all state values reset was placed at a randomly selected starting position.

For evaluation purposes, the following statistics were taken:

Energy — the mean energy level of the robot;

Distance — the mean value of the Euclidean distance d, taken at one-hundred-step intervals³, between the opposing corners of the rectangular extent containing all the points the robot visited during the last interval; it is a measure of how much ground was covered by the robot (a small sketch of this measure is given below);

Collisions — the percentage of steps involving collisions.

³ The robot takes approximately this number of steps to efficiently move between corners of its environment.

All the experiments were carried out in a realistic simulator, developed by (20), of a Khepera robot — a small robot with a left and a right wheel motor, and eight infrared sensors that allow it to detect object proximity and ambient light. Six of the sensors are located in the front of the robot and two in the rear. The robot environment (Figure 4) consisted of a closed arena with some walls and two lights surrounded by bricks on opposite corners.

Figure 4: The simulated robot and its environment.
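As promised above, the Distance measure reduces to the diagonal of the bounding box of the positions visited during the interval. A minimal sketch, assuming positions are recorded as (x, y) tuples:

import math

def interval_distance(positions):
    # Euclidean distance between the opposing corners of the rectangular
    # extent of the points visited during the last interval.
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))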
5 Results

The proposed controller has empirically shown its competence by exhibiting a performance similar to that of the emotional controller discussed previously (see Table 3). Previous exhaustive experiments on the emotional controller have shown that it was quite competent and performed better than more traditional approaches (13; 12). In fact, previous experiments on learning behavior selection reported by the designer of the selected adaptive system (15) had to resort to severe simplifications of the behavior-selection learning task. These simplifications included having behaviors associated with very specific pre-defined conditions of activation, and only interrupting a behavior once it had reached its goal or an inapplicable behavior had become applicable. The table also shows the results for a random controller which selects a behavior randomly at regular intervals⁴. These results show that both learning controllers significantly improve performance at all levels.

⁴ The interval selected was 35 steps, based on previous results (11) which indicated this value as the most suited for the task.

To assess the necessity of each of its properties, the controller had properties removed one at a time and was empirically compared against the complete controller. The experimental results obtained are presented in Table 3 and the conclusions reached are the following.

The behavior interruptions provided both by state transitions and by predictions of state transitions proved essential to the performance of the task. The former are responsible for interrupting the behavior when a problem arises or has been solved. The latter allow the agent to take the necessary steps to accomplish its aims. In particular, a controller with no Energy predictions is not able to acquire energy, and a controller with no Activity predictions will eventually stop moving.

In terms of reinforcement, all types of contributions were found valuable, and the controller was fairly robust against changes in the relative weights of the influences of homeostatic state, state transitions and predictions. The controller is able to learn successfully without the prediction influence on reinforcement, but convergence takes longer. It was also found that, for the successful accomplishment of the task, and in particular the achievement of their respective goals, all homeostatic variables should be taken into account in the reinforcement. Agents without Activity reinforcement showed that it is more profitable for the agent to move only as a last resort, when its energy is low and there is no light nearby: avoiding movement helps to reduce the number of collisions.

The controller's success is quite sensitive to the correct adjustment of the relative weight of each homeostatic variable on reinforcement. This is a problem introduced by the proposed architecture which did not arise in the emotional controller. In fact, this is the only reason why this controller may be considered worse than the emotional controller: it required extra design effort.

Controller                        Energy        Collisions    Distance
Proposed                          0.53 ± 0.02   0.94 ± 0.15   1.87 ± 0.03
Emotional                         0.54 ± 0.01   1.65 ± 0.48   1.96 ± 0.02
Random                            0.02 ± 0.01   3.64 ± 0.27   0.83 ± 0.01
Proposed controller with behavior-switching not triggered by:
  State change                    0.50 ± 0.02   2.07 ± 0.23   2.10 ± 0.02
  Prediction                      0.00 ± 0.00   2 bumping     3 moving
  Energy                          0.00 ± 0.00   0.00 ± 0.00   2.37 ± 0.00
  Activity                        0.00 ± 0.00   3 bumping     0.00 ± 0.00
Proposed controller with reinforcement not affected by:
  State (c_s = 0)                 0.42 ± 0.04   0.57 ± 0.10   2.16 ± 0.01
  State change (c_t(h) = 0)       0.24 ± 0.05   0.76 ± 0.27   2.03 ± 0.05
  Prediction (c_p = 0)            0.48 ± 0.03   0.98 ± 0.22   1.84 ± 0.04
  Energy (w_energy = 0)           0.10 ± 0.03   0.21 ± 0.05   1.97 ± 0.06
  Welfare (w_welfare = 0)         0.58 ± 0.02   11.1 ± 2.10   1.80 ± 0.05
  Activity (w_activity = 0)       0.48 ± 0.02   0.48 ± 0.13   0.92 ± 0.11

Table 3: Summary of the controllers' performance. It shows the means of the values obtained in the last three hundred thousand steps of each trial. Errors represent the mean 95% confidence intervals. Results are presented for the proposed controller, the reference emotional controller, a random controller and modified versions of the proposed controller. Modifications consisted of disregarding specific trigger events or selectively dropping influences on reinforcement. In the former case, and in particular if Activity prediction events were ignored, the agent would eventually stop receiving triggering events altogether. This would usually happen with the agent stopped in an isolated position, but sometimes it would also happen to a moving agent or to an agent crashed into a wall. These exception cases are accounted for in the table.

6 Related Work

The idea of homeostatic values stems from neurophysiological research on emotions (8; 7) and has been modeled previously by the DARE model (23; 16; 29). In the DARE model, which emphasizes the dual nature of decision making, where both emotions and cognition take part, there is a body with target values which has a central role in the evaluation of situations.

There are other robot emotion-based architectures which rely on homeostatic variables. An example is the robot architecture developed by (10), which learns emotionally grounded symbol-object associations. In this case, there are a few internal variables which trigger drives when they are out of their target values. Drives have pre-defined associated behaviors, and the robot only learns about differently-colored objects, namely how they may change its internal variables. Similarly to the current work, there are innate emotions, derived from monitoring the internal variables, and associated emotions. However, the innate emotions only monitor changes of the internal-variable values into or out of their target values, and the associated emotions are associated not with behavior-state pairs but with objects.
Another example is the Kismet robot (5), whose drives have acceptable bounds of operation named the homeostatic regime. If a drive's value is below these bounds then it is in the overwhelmed regime; if above, then it is in the under-stimulated regime. Drives, along with somatically-marked releasing mechanisms, influence the affective state by contributing to the valence and arousal measures. If the drive is in the homeostatic regime, then there is a positive contribution to the valence; otherwise the contribution is negative. The contribution to arousal decreases with the drive value. Only the currently active drive, i.e. the one whose pre-defined associated behavior has been selected, influences the emotion state. Arousal, valence and stance are the three dimensions of the affective state. Emotions are defined as points in this space and are expressed by Kismet's face in its interaction with a care-giver. This approach shares the use of homeostatic variables, but has a very different model of emotions, based on a three-dimensional continuous space instead of processes⁵, and the task of the agent is quite different.

In other architectures (e.g., 3; 6; 28), homeostatic variables are monitored to produce drives and do not have any direct relation with emotions⁶. All these architectures are also quite different in that there is a hard-coded relationship between the drives and the produced behavior, while in the proposed architecture the agents learn how to satisfy their goals, and which goal to satisfy at any one point, by choosing among the available behaviors.

⁵ There are adherents of both types of emotion models, although the arguments against defining emotions in terms of a few continuous dimensions seem stronger (9).
⁶ Note that specific domain-dependent dependencies may be hand-coded by the designer when defining the activation conditions of the emotions.

7 Conclusions

The current work proposes a new architecture for learning behavior coordination which is inspired by emotions. Its goal system, in particular, is based on homeostatic processes which bear similarities with foreground and background emotions. In fact, (8; 7) refers to homeostasis as central to emotional processes. Furthermore, the associations made by the adaptive system are akin to the somatic markers suggested by (8). Both provide a long-term indication of the "goodness" of the several options available to the agent in a certain situation, based on previous experiences.

Emotions, as used in this architecture, provide a low-level processing of the internal homeostatic state and of relevant perceptions which is used both in the evaluation of the situation (or, more specifically, in the reinforcement given to the learning system) and in the interruption of behavior. These are two processes which have also been strongly associated with emotions by other researchers. Another distinctive feature of the proposed architecture is that only changes in perception which are relevant in terms of the agent's current internal state are brought to its attention.

In this work, an engineering approach (31) is taken towards emotions. This means that the emphasis is not so much on attempting to have a replica of human emotions as on having a competent architecture. For this reason, this architecture was subjected to rigorous experiments which thoroughly evaluated its different aspects and compared it with other alternatives.
Although the experiments were done in simulation, the robot faced a demanding task which retains the essential problems of a real-world environment. The experiments demonstrated the validity of the architecture by showing that it is very competent in accomplishing the task it was designed for. Furthermore, once the domain-dependent goals of a task are identified, this architecture clearly specifies how the learning process should be controlled, namely what the reinforcement should be and when the behaviors should be interrupted.

8 Future Work

In the architecture presented, the goal system must be tailored to the task at hand so that it reflects the task's aims, whereas the adaptive system is more flexible and may solve different tasks when associated with different goal systems. However, the goal system does not need to be totally hand-designed. One may envisage an adaptive goal system where subgoals are found, or where new perceptual cues for the prediction of internal state changes are uncovered. This way, the goal system would model some of the emotional associations animals and humans create around specific events or situations. This would be in line with the theory that, during learning, stimuli are primarily associated with emotions which then drive the behavior associations (21).

One of the most difficult problems was to determine the relative weights of importance of the different homeostatic variables. This suggests that the homeostatic variables may have to be associated with different adaptive systems, to be combined at a later stage for final behavior selection. This way, the information required to pursue each goal can be kept separate.

Acknowledgement

The first author is a post-doctoral fellow sponsored by the Portuguese Foundation for Science and Technology. This work was partially supported by the FCT Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III.

References

[1] James S. Albus. The role of world modeling and value judgment in perception. In A. Meystel, J. Herath, and S. Gray, editors, Proceedings of the 5th IEEE International Symposium on Intelligent Control. Los Alamitos, CA: IEEE Computer Society Press, 1990.

[2] Minoru Asada. An agent and an environment: A view on body scheme. In Jun Tani and Minoru Asada, editors, Proceedings of the 1996 IROS Workshop on Towards Real Autonomy, pages 19-24, Senri Life Science Center, Osaka, Japan, 1996.

[3] Bruce Blumberg. Old Tricks, New Dogs: Ethology and Interactive Creatures. PhD thesis, MIT, 1996.

[4] Stevo Bozinovski. A self-learning system using secondary reinforcement. In R. Trappl, editor, Cybernetics and Systems, pages 397-402. Elsevier Science Publishers, North Holland, 1982.

[5] Cynthia Breazeal. Robot in society: Friend or appliance? In Agents'99 Workshop on Emotion-Based Agent Architectures, pages 18-26, Seattle, WA, 1999.

[6] Dolores Cañamero. Modeling motivations and emotions as a basis for intelligent behavior. In Proceedings of the First International Symposium on Autonomous Agents, AA'97, Marina del Rey, CA, February 1997. The ACM Press.

[7] Antonio Damasio. The Feeling of What Happens. Harcourt Brace & Company, New York, 1999.

[8] Antonio R. Damasio. Descartes' Error: Emotion, Reason and the Human Brain. Picador, London, 1994.

[9] Paul Ekman. An argument for basic emotions. Cognition and Emotion, 6(3/4):169-200, 1992.

[10] Masahiro Fujita, Rika Hasegawa, Gabriel Costa, Tsuyoshi Takagi, Jun Yokono, and Hideki Shimomura. Physically and emotionally grounded symbol acquisition for autonomous robots.
In Lola Cañamero, editor, AAAI Fall Symposium on Emotional and Intelligent II: The Tangled Knot of Social Cognition, pages 43-48. Menlo Park, California: AAAI Press, 2001. Technical report FS-01-02.

[11] Sandra Clara Gadanho. Reinforcement Learning in Autonomous Robots: An Empirical Investigation of the Role of Emotions. PhD thesis, University of Edinburgh, 1999.

[12] Sandra Clara Gadanho and John Hallam. Emotion-triggered learning in autonomous robot control. Cybernetics and Systems — Special Issue: Grounding Emotions in Adaptive Systems, 32(5):531-559, July 2001.

[13] Sandra Clara Gadanho and John Hallam. Robot learning driven by emotions. Adaptive Behavior, 9(1), 2001.

[14] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

[15] Long-Ji Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, 1993. Technical report CMU-CS-93-103.

[16] Márcia Maçãs, Paulo Couto, Carlos Pinto-Ferreira, Luis Custódio, and Rodrigo Ventura. Experiments with an emotion-based agent using the DARE architecture. In Proceedings of the AISB'01 Symposium on Emotion, Cognition and Affective Computing, pages 105-112, University of York, U.K., March 2001.

[17] Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311-365, 1992.

[18] Yuval Marom and Gillian Hayes. Maintaining attentional capacity in a social robot. In R. Trappl, editor, Cybernetics and Systems 2000: Proceedings of the 15th European Meeting on Cybernetics and Systems Research. Symposium on Autonomy Control — Lessons from the Emotional, volume 1, pages 693-698, Vienna, Austria, April 2000.

[19] Maja J. Mataric. Reward functions for accelerated learning. In William W. Cohen and Haym Hirsh, editors, Machine Learning: Proceedings of the Eleventh International Conference, pages 181-189. San Francisco, CA: Morgan Kaufmann Publishers, 1994.

[20] Olivier Michel. Khepera Simulator package version 2.0: Freeware mobile robot simulator written at the University of Nice Sophia-Antipolis, March 1996. Downloadable from the World Wide Web at http://diwww.epfl.ch/lami/team/michel/khep-sim/.

[21] O. Hobart Mowrer. Learning Theory and Behavior. John Wiley & Sons, Inc., New York, 1960.

[22] Miguel Rodriguez and Jean-Pierre Muller. Towards autonomous cognitive animats. In F. Morán, A. Moreno, J.J. Merelo, and P. Chacón, editors, Advances in Artificial Life — Proceedings of the Third European Conference on Artificial Life, Lecture Notes in Artificial Intelligence Volume 929, Berlin, Germany, 1995. Springer-Verlag.

[23] Rui Sadio, Gonçalo Tavares, Rodrigo Ventura, and Luis Custódio. An emotion-based agent architecture application with real robots. In Lola Cañamero, editor, AAAI Fall Symposium on Emotional and Intelligent II: The Tangled Knot of Social Cognition, pages 117-122. Menlo Park, California: AAAI Press, 2001. Technical report FS-01-02.

[24] H. A. Simon. Motivational and emotional controls of cognition. Psychological Review, 74:29-39, 1967.

[25] Aaron Sloman and Monica Croucher. Why robots will have emotions. In IJCAI'81 — Proceedings of the Seventh International Joint Conference on Artificial Intelligence, pages 2369-2371, 1981. Also available as Cognitive Science Research Paper 176, Sussex University.

[26] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning. The MIT Press, 1998.

[27] Silvan S. Tomkins. Affect theory.
In Klaus R. Scherer and Paul Ekman, editors, Approaches to Emotion. Lawrence Erlbaum, London, 1984.

[28] Juan D. Velásquez. A computational framework for emotion-based control. In SAB'98 Workshop on Grounding Emotions in Adaptive Systems, pages 62-67, Zurich, Switzerland, 1998.

[29] Rodrigo Ventura and Carlos Pinto-Ferreira. Emotion-based agents: Three approaches to implementation (preliminary report). In Juan D. Velásquez, editor, Workshop on Emotion-Based Agent Architectures, Seattle, U.S.A., 1999. Workshop of the Third International Conference on Autonomous Agents.

[30] C. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, 1989.

[31] Thomas Wehrle. Motivations behind modeling emotional agents: Whose emotion does your robot have? In SAB'98 Workshop on Grounding Emotions in Adaptive Systems, pages 71-76, Zurich, Switzerland, 1998.

Multiple Emotion-Based Agents using an Extension of DARE Architecture

Márcia Maçãs, Luis Custódio
Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
{marcia,lmmc}@isr.ist.utl.pt, http://www.isr.ist.utl.pt/~islab

Keywords: agent architecture, emotions, society of agents

Received: October 30, 2002

The role of emotions in human intelligence and social behaviours has been considered very important in the past years. The DARE architecture, an emotion-based agent architecture, aims at modelling this contribution for building autonomous agents. In this paper the results of its application to a multiple-agent environment are presented. Emotions are used at an individual decision level, through the modelling of the somatic marker hypothesis, and are also used in decisions that involve others, using the same hypothesis and adding the notion of sympathy. The representation of other agents' external expression allows an agent to predict their internal state. This process is based on the assumption that similar agents express their internal state in a similar way, this being a means of implicit communication. Sympathy allows more informed individual decisions, especially when these depend on others. On the other hand, it makes agents learn not only from their own experience, but also from the experience of others. Besides implicit communication, explicit communication is also used, through the exchange of messages. In the symbolic layer, a new layer added to the DARE architecture, interactions between agents are represented and used to improve individual and social behaviours.

1 Introduction

Recent research findings on the neurophysiology of human emotions suggest that human decision-making efficiency depends deeply on the emotion machinery. In particular, the neuroscientist Antonio Damasio (3) claims that alternative courses of action in a decision-making problem are (somatically) marked as good or bad, based on an emotional evaluation. Only the positive ones (a smaller set) are used for further reasoning and decision purposes. This constitutes the essence of Damasio's somatic marker hypothesis. Another study about emotions, conducted by the neuroscientist Joseph LeDoux (8), recognizes the existence of two levels in sensory processing: one quicker and urgent, and another slower but more informed. The DARE¹ architecture for emotion-based agents is essentially grounded on these theories about the neurological configuration and role of emotions. In previous work, the application of this architecture focused on decision-making at the agent's individual level (15; 16; 9; 14; 11).
In the somatic marker hypothesis, the link between emotions and decision-making is suggested to be particularly strong for the personal and social aspects of human life. Other emotion theories, mainly in psychology, focus on the social aspects of emotion processes. The work presented here tries to explore these notions and the importance of the physical expression of emotion in social interactions, as well as the sympathy that may occur in those interactions.

¹ DARE stands for Emotion-based Robotic Agent Development (in reverse order). This work has been developed under the framework of a research project funded by the Portuguese Foundation for Science and Technology (project PRAXIS/P/EEI/12184/98).

Concerning emotion expression, it has been claimed that there is no other human process with such a distinct means of physical communication and, more interestingly, that it is unintentional (7). Some theories point out that emotions are a form of basic communication and are important in social interaction (Rivera, Oatley and Johnson-Laird in (13)). Others propose that the physical expression of emotion is the body's preparation to act (4), where the emotional response can be seen as a built-in action tendency aroused under pre-defined circumstances. This can also be a form of communicating to others what the next action will be. If the physical message is understood, it may defuse emotions in others, establishing an interactive loop with or without actions in the middle (Dantzer cited in (2)). The AI research concerning multi-agent systems relies mainly on rational, social and communication theories. However, the role of emotions in this field has been considered important by an increasing number of researchers (10; 12; 2; 1).

Linked to the expression of emotions is the notion of sympathy, defined as the human capability to recognize others' emotions (5). This capability is acquired by having consciousness of our own emotions. Humans can use it to evaluate others' behaviours and predict their reactions, through a mental model, learned by self-experience or by observation, that relates physical expression with feelings and intentions. Sympathy provides an implicit means of communication, sometimes unintentional, that favours social interactions.

In this paper, the results of the application of the DARE architecture to a multi-agent environment are presented. In this architecture, emotions play a role in individual decision-making based on the somatic marker and the double stimulus-processing hypotheses. These concepts are extended to decision-making involving other agents. Agents represent others' external expression in order to predict their internal state, assuming that similar agents express their internal state in the same way (a kind of implicit communication). Sympathy is grounded on this form of communication, allowing more informed individual decisions, especially when these depend on others. On the other hand, it allows the agent to learn not only by its own experience, but also by the observation of others' experience. The architecture also allows the modelling of explicit communication through the incorporation of a new layer, the symbolic layer, where relations between agents are represented and processed.
2 DARE Architecture

The DARE architecture was applied to an environment that simulates a simple market involving: producer agents, which own products all the time; supplier agents, which must fetch products from producers or from other suppliers, either for their own consumption or for selling to consumers; and consumer agents, which must acquire products from suppliers for their own consumption. Agents are free to move around the world and to interact and communicate with others. Their main goal is to survive by eating the necessary products and, additionally, to maximize money by selling products.

Figure 1 shows a global view of the DARE architecture. Stimuli received from the environment are processed in parallel on three layers: perceptual, cognitive and symbolic. Several stimuli are received simultaneously, and they can be gathered from any type of sensor. In the case of the market experiment, agents receive both visual and auditory stimuli. Auditory stimuli are strings with messages exchanged between agents or broadcast. Visual stimuli consist of all the objects (agents and respective products) inside the agent's vision angle. For each stimulus, three internal representations are created, which are evaluated on the corresponding layers, resulting in the selection of an action to be executed by the agent. Meanwhile, the world may change due to its own dynamics or as a consequence of other agents' actions. In the market application, agents' actions consist of movements, picking products, eating them, and exchanging messages. Every action executed has an effect on the agent's internal state, which, when added to the next perceived stimuli, is used to update memory and feature meanings.

The perceptual analysis evaluates stimuli based on i) a pre-defined set of relevant features, ii) their meanings and iii) the current internal state. This analysis results in a fast action selection, which will be executed if the global situation (stimuli and internal state) is considered urgent, in which case the upper layers will be inhibited. The action selected in the perceptual layer will also be executed whenever the upper layers are not able to select an action due to lack of information. The cognitive and symbolic analyses use memory to evaluate stimuli and to predict action effects based on similar actions executed or observed in the past. The internal state and its changes are crucial to the evaluation and anticipation processes on these layers.

Figure 1: Global view of the DARE architecture.

2.1 Perceptual Layer

2.1.1 Feature Extraction and Built-in Information

Figure 2 presents the perceptual layer in more detail. When stimuli are acquired by the agent's sensors, a set (RF) of pre-defined, simple and relevant features is quickly extracted. This extraction provides a basic and simple internal representation of each stimulus, called the perceptual image ($I_p$). The $I_p$ assembles the amount of each relevant feature found in the stimulus. Since several stimuli can be sensed at the same time, the set of all perceptual images at instant $t$ is defined as $I_P^t$. All perceptual images are evaluated based on the agent's internal state at instant $t$ and on pre-defined associations between features and meanings.

The agent's internal state, $IS$, is a vector with a pre-defined number of components, specified for each agent. The contents of this vector vary as a consequence of action execution. The ideal contents of $IS$ are pre-defined by the homeostatic vector $HV$. Both $IS$ and $HV$ are crucial for agent behaviour, since the main goal of an agent implemented with this architecture is to bring its internal state closer to the ideal one. Every evaluation takes into account the unbalance of the internal state, $\delta^t$, i.e., the difference between both vectors,

$$\delta^t = \Psi(\Delta(is_1, hv_1), \Delta(is_2, hv_2), \ldots, \Delta(is_p, hv_p))$$

where $is_i$ and $hv_i$ are components of $IS$ and $HV$, respectively; $\Delta$ is a function that computes their difference; and $\Psi$ is a function that processes all the differences in order to qualify them. This function may be the processing of thresholds or an application-dependent function which analyses/processes specific patterns in its arguments. The unbalance is reflected in the agent's external expression.

In the market experiment, the agents' internal state consists of a set of nutrients,

$$IS^t = [glycides^t, proteins^t, fatty^t, sugar^t]$$

which are decremented on every movement action, proportionally to the power used in it, and are also changed whenever the agent eats a product. Different products mean different changes in the nutrients: some might be increased, others decreased, depending on the colours present in the product image. Initially, the agents are in perfect balance ($IS = HV$). The unbalance is a vector with the difference between each nutrient's current value and its ideal,

$$\Delta(is_i, hv_i) = is_i - hv_i$$

where the nutrient with the maximum absolute difference defines the agent's current need,

$$\Psi(\Delta(is_1, hv_1), \ldots, \Delta(is_p, hv_p)) = \begin{cases} \arg\max_j \Delta(is_j, hv_j) & \text{if } \Delta(is_j, hv_j) < \epsilon_{min} < 0 \\ \arg\min_j \Delta(is_j, hv_j) & \text{if } \Delta(is_j, hv_j) > \epsilon_{max} > 0 \end{cases}$$

This internal unbalance is mapped into images that represent the external expression of the agent.
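A minimal sketch of the unbalance computation for the market example follows; the nutrient names come from the text, while the epsilon thresholds and the function names are assumptions made for illustration.

NUTRIENTS = ["glycides", "proteins", "fatty", "sugar"]

def unbalance(IS, HV):
    # Component-wise differences: delta_i = is_i - hv_i.
    return {n: IS[n] - HV[n] for n in NUTRIENTS}

def current_need(delta, eps_min=-0.1, eps_max=0.1):
    # Nutrient whose deviation defines the current need, or None when
    # every component is inside the (assumed) threshold band.
    worst = max(NUTRIENTS, key=lambda n: abs(delta[n]))
    if delta[worst] < eps_min or delta[worst] > eps_max:
        return worst
    return None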
2.1.2 Perceptual Evaluation

The perceptual evaluation tries to qualify the presence of the relevant features in the $I_p$. This qualification is based on the mapping between the features and their pre-defined meanings, taking into account the current internal state. The result is a perceptual desirability vector, $DV_p$, which represents a basic, simple and fast assessment of a stimulus.

In the case of the market implementation, visual stimuli are bitmaps and the relevant features are certain colours in the agent and product images. The relevant colours for agent images are red, yellow and green, whereas for product images they are dark red, dark green, dark yellow, dark magenta, dark gray, red, green, yellow and magenta. The extraction of the relevant features is simply the counting of the pixels of the relevant colours in the bitmap, i.e. the perceptual image is the set of the numbers of pixels of each relevant colour.

The meaning-feature association, represented by the pre-defined weights $w_{nf}$, establishes the goodness or badness of each colour: positive weights mean good colours and negative ones mean bad colours. In this implementation, all relevant colours are initially considered positive. $DV_p$ is thus the result of processing the $w_{nf}$ and the $I_p$ components,

$$DV_p = \sum_f w_{nf} \, I_{p_f}$$

where $w_{nf}$ represents how good feature (colour) $f$ is for nutrient $n$. After the perceptual evaluation of all the stimuli detected at instant $t$, the stimulus found to be the most desirable is selected as the incentive for action, and its $I_p$ and $DV_p$ are used to select the action.
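The pixel counting and the desirability computation described in this subsection can be sketched as below; the colour list follows the market example, while the weight table W is an assumed input structure.

PRODUCT_COLOURS = ["dark_red", "dark_green", "dark_yellow", "dark_magenta",
                   "dark_gray", "red", "green", "yellow", "magenta"]

def perceptual_image(pixels):
    # Perceptual image Ip: the count of each relevant colour in a bitmap,
    # given here as an iterable of colour names.
    counts = {c: 0 for c in PRODUCT_COLOURS}
    for colour in pixels:
        if colour in counts:
            counts[colour] += 1
    return counts

def desirability(Ip, W):
    # DV_p = sum_f w_nf * Ip_f, computed per nutrient n;
    # W maps nutrient -> colour -> weight.
    return {n: sum(w_f * Ip[f] for f, w_f in W[n].items()) for n in W}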
2.1.3 Action Selection and Evaluation

There is a pre-defined set of simple actions that can be selected at the perceptual layer (e.g., approach, avoid, wander, pick, and eat). At this layer, the action selection is based on reactive rules designed to cover urgent situations where the agent must survive. When the $I_p$ and $DV_p$ of the incentive stimulus satisfy the pre-conditions of an action rule, that action will be selected. For instance, a very hungry agent near a product will immediately select the eat action; if it is not near enough, it will first select a movement action to approach the product.

After an action is executed, the action, all the $I_p$'s and $DV_p$'s of the processed stimuli, and the action's effect on the agent's internal state are associated and stored in memory. In future similar circumstances, upper layers may anticipate action effects and decide accordingly.

Figure 2: DARE architecture - Perceptual Layer.

At the perceptual layer, the action effects are evaluated in order to adjust the weights that represent the meaning of the relevant features. If the internal state changes beyond a threshold after an action is executed, and this change means a strong approach to or deviation from balance, the weights that led to the action's selection are adapted in order to reflect this knowledge: if $\exists_n$ such that $|\Delta(is_n^{t+1}, hv_n)| > \eta_n$, then

$$w_{nf}^{t+1} = \begin{cases} w_{nf}^{t} + \tau & \text{if } \delta^{t+1} < \delta^{t} \\ w_{nf}^{t} & \text{if } \delta^{t+1} = \delta^{t} \\ w_{nf}^{t} - \tau & \text{if } \delta^{t+1} > \delta^{t} \end{cases}$$

where $f$ is the predominant relevant feature of the incentive stimulus, $\eta_n$ is the threshold that defines a major change in the component $n$, and $\tau$ is the adaptation value. Since this evaluation of effects is based on the stimuli and the current internal state, the adaptation is only temporary and the weights gradually return to their initial values $W_{nf}$: at each instant $t$,

$$w_{nf}^{t+1} = \begin{cases} w_{nf}^{t} - \tau & \text{if } w_{nf}^{t} > W_{nf} \\ w_{nf}^{t} + \tau & \text{if } w_{nf}^{t} < W_{nf} \end{cases}$$

If the effect repeats often, the adaptation ends up being persistent. This process aims at giving some degree of adaptation and flexibility to the perceptual layer. In the market implementation, when an agent eats a product that, instead of increasing an unbalanced nutrient, decreases it, the weight associated with the predominant colour of the product is decreased. In subsequent decisions, the $DV_p$ for this product will show less desirability than those of other products without this colour. The gradual return to the initial values allows the agent to re-select this product later, because it could be desirable for a different nutrient.

The contents of $DV_p$ are influenced both by the stimuli and by the internal state. If the internal state is very unbalanced, the $DV_p$ must reflect that situation in terms of the urgency to handle it. The urgency of a situation is defined by a threshold w.r.t. the current unbalance,

$$|\Delta^t(j)| > \alpha_p$$

where $j$ is the most unbalanced component (nutrient) of the internal state, determining urgency if its absolute difference to the ideal value is above $\alpha_p$. In addition, thresholds for the desirability-vector components are defined. Whenever the $DV_p$ is detected as urgent (above or below the threshold), the upper layers are inhibited and the action selected at the perceptual level is executed immediately.
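To close the description of the perceptual layer, the weight-adaptation rules above could look like the following in code. This is an interpretive reconstruction: the tau values are assumed, and the two functions merely mirror the update and decay cases of the equations.

def adapt_weight(w, delta_before, delta_after, tau=0.05):
    # Strengthen the feature's weight if the executed action moved the
    # internal state towards balance, weaken it if it moved away.
    if delta_after < delta_before:
        return w + tau
    if delta_after > delta_before:
        return w - tau
    return w

def decay_weight(w, w_initial, tau=0.01):
    # At each instant, drift an adapted weight back towards its initial
    # pre-defined value W_nf, so the adaptation is only temporary.
    if w > w_initial:
        return max(w - tau, w_initial)
    if w < w_initial:
        return min(w + tau, w_initial)
    return w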
2.2 Cognitive Layer

The perceptual adaptation is limited to simple stimulus features and to a rough evaluation of the action effects on the internal state. Figure 3 presents the cognitive layer. From the sensed stimuli, all the possible features are extracted, defined in the set $F$, which satisfies the condition $RF \subset F$. This extraction is computationally heavier than the perceptual one, but supplies more information with which to distinguish stimuli. It is not only a quantification of the features' presence in the stimulus, but a processing that results in the full characterization of the stimulus, allowing identification. The result of this extraction is the cognitive image, $I_c$, of a stimulus. Since several stimuli can be sensed at the same time, the set of all cognitive images at instant $t$ is defined as $I_C^t$. In the market experiment, where the relevant features are some colours present in the bitmap, the cognitive image is instead the full bitmap.

The purpose of this layer is to generate adequate individual behaviours² through learning by experience. Cognitive evaluation and action selection are conditioned by the urgency found by the perceptual-layer evaluation. Nevertheless, the $I_C^t$ is always created and stored in memory, associated with the corresponding items stored by the perceptual layer.

² Social behaviours will be the purpose of the symbolic layer.

Figure 3: DARE architecture - Cognitive Layer.

2.2.1 Cognitive Evaluation and Action Selection

The cognitive evaluation consists mainly of a search in memory for situations similar to the current one. This process uses the current $I_P^t$ to reduce the search space, assuming that similar $I_c$'s have similar $I_p$'s. The structures in memory that have an incentive stimulus with an $I_p$ similar to that of the currently sensed stimulus are selected, and the corresponding $I_c$ in memory is compared with the current $I_c$. If they are similar, the structure in memory is selected for further search.

Memory is usually structured as sequences of stimuli-action associations that end whenever a significant change in the internal state is detected. Each element of a memory sequence must have i) the $I_p$ and $I_c$ of the incentive stimulus; ii) the $I_P^t$ and $I_C^t$ sets; iii) the executed action; and iv) its effect on the internal state. Each sequence must also record the overall change in the internal state, in order to determine the desirability of that course of action.

Once sequences with a stimulus similar to one of the current set are found, the associated internal-state changes are applied to the current internal state, and the sequence that anticipates a more balanced internal state is chosen to be executed. Consider $SEC$ the set of all the matching sequences in memory; $F_m$ an element of a sequence; $a_m$ the action represented in $F_m$; and $\Delta_{a_m}$ the change in the internal state caused by the execution of $a_m$. The action selected at the cognitive layer, $a_c$, is determined by

$$\forall a_m \in F_m \in SEC, \quad a_c = \arg\min_{a_m} (\delta^t + \Delta_{a_m})$$

For instance, if a consumer agent has eaten in the past a product that made one nutrient increase by a certain value, and its internal state now needs that nutrient, the agent will anticipate the increase and, if there is no other product with a better anticipation, will choose to approach and eat that product, executing the same action sequence retrieved from memory.

If the prediction made by the cognitive evaluation reveals a degree of urgency, i.e., if it predicts, given an unbalanced internal state, $|\Delta^t(j)| > \alpha_c$, a strong and positive change, $|\Delta^t(j) + \Delta_{a_c}(j)|$