Informatica 33 (2009) 285-296

Optimization of Actions in Activation Timed Influence Nets

M. Faraz Rafi, Abbas K. Zaidi and Alexander H. Levis
System Architectures Laboratory, ECE Dept., George Mason University, Fairfax, VA, USA
E-mail: {mrafil, szaidi2, alevis}@gmu.edu

P. Papantoni-Kazakos
EE Dept., University of Colorado Denver, Denver, CO, USA
E-mail: Titsa.Papantoni@cudenver.edu

Keywords: influence net, activation timed influence net, Bayesian net

Received: October 12, 2008

A sequential evolution of actions, in conjunction with the preconditions of their environment and their effects, is depicted by Activation Timed Influence Nets. In this paper, we develop two algorithms for the optimal selection of such actions, given a set of preconditions. A special case of the two algorithms is also considered, in which the selection of actions is further constrained by the use of dependencies among them. The two algorithms are based on two different optimization criteria: one maximizes the probability of a given set of target effects, while the other maximizes the average worth of the effects' vector.

Povzetek: Two algorithms are presented for the optimization of actions in time-dependent nets.

1 Introduction

We consider the scenario¹ where a sequence of actions needs to be initiated towards the materializing of some desirable effects. As depicted in Figure 1, each action is supported by a set of preconditions and gives rise to a set of effects; the latter then become the preconditions of the following action(s) which, in turn, give rise to another set of effects. Such a sequential evolution of actions is termed an Activation Timed Influence Net (ATIN), where the action performers may be humans. ATINs are an extension of an earlier formalism called Timed Influence Nets (TINs) [6-12, 20-27, 30, 31] that integrate the notions of time and uncertainty in a network model. TINs are comprised of nodes that represent propositions (i.e., pre- and post-conditions of potential actions, as well as assertions of events which may indirectly describe such actions), connected via causal links that represent relationships between the nodes, without any explicit representation of actions. TINs have been experimentally used in the area of Effects Based Operations (EBOs) for evaluating alternative courses of action and their effectiveness in achieving mission objectives in a variety of domains, e.g., war games [20-22, 25] and coalition peace operations [24, 27], to name a few. A number of analytical tools [6-12, 23, 24, 27, 30] have also been developed over the years for TIN models to help an analyst update conditions/assertions represented as nodes in a TIN, map a TIN model to a Time Sliced Bayesian Network for incorporating feedback evidence, determine the best set of preconditions for both timed and un-timed versions of Influence Nets, and assess temporal aspects of the influences between nodes. A recent work [31] on TINs, their underlying constructs, and the computational algorithms provides a comprehensive analytical underpinning of the modeling and analysis approach.

¹ This work was supported by the Air Force Office of Scientific Research (AFOSR) under Grants FA9550-05-1-0106 and FA9550-05-1-0388.
Figure 1: Network representation of an Activation Timed Influence Net (ATIN); c: preconditions, a: actions, e: effects.

In contrast to their predecessors (i.e., TINs), ATINs explicitly incorporate as nodes the mechanisms and/or actions that are responsible for changes in the state of a domain; the remaining nodes represent the preconditions and effects of actions. A set of preconditions may support a number of different actions, each of which may lead to the same effects, albeit with different probabilities and different costs/awards. The objective is to select an optimal set of actions, where optimality is determined via a pre-selected performance criterion. In this paper, we present two algorithms which attain such an objective. We note that an effort to develop an action selection algorithm is also presented in [1].

The organization of the paper is as follows: In Section 2, we present the core formalization of the problem, including two different optimization criteria. In Section 3, we derive the two algorithms which address these criteria. In Section 4, we express the extensions of the two algorithms to the network propagation scenario. In Section 5, we include numerical evaluations, while in Section 6, we draw some conclusions.

1.1 Related Work

ATINs include action planning. In the domain of action planning, classical planners assume that the effects of an action are known with certainty and generate a set of actions that will achieve the desired goals [19]. Some planners do monitor for errors as actions are executed, but no action adaptations are incorporated [29]. Other planners assign probabilities to the effects of actions [2, 13, 14, 16, 28], but provide no mechanisms for reacting to changes in the environment. Reactive planners [5, 15, 17, 18] are designed to select and execute actions in response to the current state of the world but, with a few exceptions [3], [4], they do not use probabilistic information to determine the likelihood of success of the actions. In [1], probabilistic information is used in an effort to deal with environmental uncertainties, but no optimal action selection strategies are considered or proposed.

The ATIN formalism in this paper is similar to an earlier work by Sugato Bagchi et al. [1] on planning under uncertainty. The similarity, however, stops at the graph representation of preconditions, actions, and their effects. Similar parallels can also be drawn with other graph-based planning approaches, e.g., GraphPlan (http://www.cs.cmu.edu/~avrim/graphplan.html). The approach in this paper represents a new formalism and is based on well-established statistical results.

2 Problem formalization - core

In this section, we consider a modular core problem. We initially isolate a single action with its supporting preconditions and its resulting effects, as depicted in Fig. 2. We use the following notation:

- $X_1^n = [X_1, \ldots, X_n]^T$: the status random vector of the preconditions, where $X_i = 1$ if precondition $c_i$ is present and $X_i = 0$ if it is absent; $x_1^n$ denotes binary vector value realizations of $X_1^n$.
- $Y_1^m = [Y_1, \ldots, Y_m]^T$: the status random vector of the effects, where $Y_i = 1$ if effect $e_i$ is present and $Y_i = 0$ if effect $e_i$ is absent; $y_1^m$ denotes binary vector value realizations of $Y_1^m$.
- $p_j(x_1^n)$: the probability of success for action $a_j$, given that the value of the precondition status vector is $x_1^n$; $P(\text{success for action } a_j \mid x_1^n)$.
- $q_j(y_1^m)$: the probability that the value of the effects' status vector is $y_1^m$, given that action $a_j$ is taken; $P(y_1^m \mid a_j \text{ taken})$.
- $q_0(y_1^m)$: the probability that the value of the effects' status vector is $y_1^m$, given that no action is taken; $P(y_1^m \mid \text{no action taken})$.
- $U_j(y_1^m)$: the utility of the value $y_1^m$ of the effects' status vector, when action $a_j$ is taken.
- $U_0(y_1^m)$: the utility of the value $y_1^m$ of the effects' status vector, when no action is taken.

We note that the utility function $U_j(y_1^m)$ measures the net worth of the effects' vector value $y_1^m$ when action $a_j$ is taken; thus, $U_j(y_1^m)$ is computed as the worth of $y_1^m$ minus the cost of deployment for action $a_j$.

Let us now assume mutually exclusive actions, which are supported by the same preconditions and lead to the same set of effects (as shown in Fig. 3). Let $\{a_j\}_{1 \le j \le k}$ denote these actions. When the selection of actions is constrained, the dependencies among the actions are expressed via an Action Dependency Matrix (ADM) $[a_{ij}]$, which defines the dependency of each action on every other one: positive dependency is depicted by $a_{ij} = 1$ and negative dependency by $a_{ij} = 0$.

3 Solutions of the core problem

The solutions of the core problem, under the two optimization criteria, are given by the following theorem.

Theorem 1

a. Given $x_1^n$ and the set of mutually exclusive actions $\{a_j\}_{1 \le j \le k}$, the probability $P(y_1^m \mid x_1^n)$ is maximized:

by action $a_{j^*}$, if

$$q_{j^*}(y_1^m)\, p_{j^*}(x_1^n) = \max_{1 \le j \le k} q_j(y_1^m)\, p_j(x_1^n) > q_0(y_1^m) \quad (1)$$

where then $\max P(y_1^m \mid x_1^n) = q_{j^*}(y_1^m)\, p_{j^*}(x_1^n)$;

by no action, if

$$q_0(y_1^m) > \max_{1 \le j \le k} q_j(y_1^m)\, p_j(x_1^n) \quad (2)$$

where then $\max P(y_1^m \mid x_1^n) = q_0(y_1^m)$.

If more than one action satisfies the maximum in (1), then one of these actions may be selected randomly.

b. Given $x_1^n$ and a set of actions $\{a_j\}_{1 \le j \le k}$ as above, the average utility of the effects' vector is maximized:

by action $a_{j^*}$, if

$$A_{j^*}(x_1^n) = p_{j^*}(x_1^n) \sum_{y_1^m} q_{j^*}(y_1^m)\, U_{j^*}(y_1^m) = \max_{1 \le j \le k} p_j(x_1^n) \sum_{y_1^m} q_j(y_1^m)\, U_j(y_1^m) > \sum_{y_1^m} q_0(y_1^m)\, U_0(y_1^m) \quad (3)$$

by no action, if

$$\sum_{y_1^m} q_0(y_1^m)\, U_0(y_1^m) > \max_{1 \le j \le k} p_j(x_1^n) \sum_{y_1^m} q_j(y_1^m)\, U_j(y_1^m) \quad (4)$$

$A_{j^*}(x_1^n)$ in (3) is the award assigned to action $a_{j^*}$; it is also the worth assigned to the precondition vector value $x_1^n$ by the action $a_{j^*}$. If more than one action attains the maximum award $A_{j^*}(x_1^n)$ in (3), one of them is selected randomly.
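To make the selection rules of Theorem 1 concrete, the following sketch implements parts (a) and (b) directly from (1)-(4). It is an illustration only: the dictionary-based tables, the encoding of "no action" as None, and all names are our own assumptions, not constructs from the paper.

```python
# A minimal sketch of the Theorem 1 selection rules; the table layout and all
# names are illustrative assumptions, not constructs from the paper.

def select_max_prob(y, x, q0, q, p):
    """Part (a): return the action maximizing P(y | x); None means 'no action'.

    q0[y]   -- P(effects = y | no action taken)
    q[j][y] -- P(effects = y | action a_j taken)
    p[j][x] -- P(action a_j succeeds | preconditions x)
    """
    best_j, best_val = None, q0[y]              # 'no action' baseline, Eq. (2)
    for j in q:
        val = q[j][y] * p[j][x]                 # candidate value of Eq. (1)
        if val > best_val:
            best_j, best_val = j, val
    return best_j, best_val

def select_max_utility(x, q0, q, p, U, U0):
    """Part (b): return the action maximizing the average utility, Eqs. (3)-(4).

    U[j][y] -- utility of effects y under action a_j (worth minus cost)
    U0[y]   -- utility of effects y under no action
    """
    best_j = None
    best_val = sum(q0[y] * U0[y] for y in q0)   # 'no action' side of Eq. (4)
    for j in q:
        award = p[j][x] * sum(q[j][y] * U[j][y] for y in q[j])  # A_j(x), Eq. (3)
        if award > best_val:
            best_j, best_val = j, award
    return best_j, best_val

# Tiny usage example: two actions, two effect vectors.
x = (1, 1)
q0 = {(0, 1): 0.2, (1, 1): 0.1}
q = {1: {(0, 1): 0.5, (1, 1): 0.4}, 2: {(0, 1): 0.3, (1, 1): 0.6}}
p = {1: {x: 0.8}, 2: {x: 0.9}}
print(select_max_prob((1, 1), x, q0, q, p))     # -> (2, 0.54), up to rounding
```

Ties, which Theorem 1 resolves by a random pick, are resolved here in favor of the earlier candidate; a random tie-break would be an equally valid choice.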
4 Solutions of the network propagation problem

In this section, we generalize the core problem solutions expressed in Theorem 1, Section 3, to the sequence of actions depicted by the ATIN in Fig. 1.

Problem 1 (The Optimal Path Problem)

In the ATIN in Fig. 1, we fix the preconditions vector value $x_1^n(1)$ at time 1 and the effects' vector value $y_1^m(N)$ at time N. We then search for the sequence of actions that maximizes the probability $P(y_1^m(N) \mid x_1^n(1))$. The solution to this problem follows a dynamic programming approach, where $x_1^n(l) = y_1^m(l-1)$, $2 \le l \le N$, in our notation. The proof of the step evolution is included in the Appendix.

Step 1. For each $y_1^m(1) = x_1^n(2)$ value, find

$$r(y_1^m(1)) = \max\left[\, q_0(y_1^m(1)),\ \max_j q_j(y_1^m(1))\, p_j(x_1^n(1)) \,\right]$$

and the action index $j^*(y_1^m(1))$ that attains $r(y_1^m(1))$.

Step $l$. The values $r(y_1^m(l-1)) = \max P(y_1^m(l-1) \mid x_1^n(1))$, for each $y_1^m(l-1)$ value, are in memory, as well as the actions that attain them. At step $l$, the values

$$r(y_1^m(l)) = \max_{y_1^m(l-1)} \left\{ r(y_1^m(l-1)) \cdot \max\left[\, q_0(y_1^m(l)),\ \max_j q_j(y_1^m(l))\, p_j(y_1^m(l-1)) \,\right] \right\}$$

are maintained, as well as the sequences of actions leading to them.

The complexity of this problem is polynomial with respect to the number of links: assuming that a given ATIN model has N levels and each level has k links, the complexity is O(N × k).
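The forward recursion of Steps 1 and $l$ can be sketched as follows; the per-level table layout and all names are our own illustrative assumptions. For transparency, the sketch enumerates all binary effect vectors at each level, whereas the paper's O(N × k) count refers to the links actually present in the net.

```python
import itertools

def optimal_path(x1, levels, m):
    """Problem 1 forward recursion: r(y) = max P(y(N) | x1) per final vector y,
    together with an attaining action sequence (None entries mean no action).

    x1     -- initial precondition vector, a binary tuple
    levels -- one (q0, q, p) triple per step, with q0[y] = P(y | no action),
              q[j][y] = P(y | a_j taken), p[j][x] = P(a_j succeeds | x)
    m      -- number of effects per level (assumed equal across levels here)
    """
    ys = list(itertools.product((0, 1), repeat=m))

    def best_step(y, x, q0, q, p):
        # max[q0(y), max_j q_j(y) p_j(x)] and the attaining action index
        best_val, best_j = q0[y], None
        for j in q:
            v = q[j][y] * p[j][x]
            if v > best_val:
                best_val, best_j = v, j
        return best_val, best_j

    # Step 1: r(y(1)) and j*(y(1)) for every effects vector y(1).
    q0, q, p = levels[0]
    r, seq = {}, {}
    for y in ys:
        r[y], j = best_step(y, x1, q0, q, p)
        seq[y] = [j]

    # Step l (l >= 2): r(y(l)) = max over y(l-1) of r(y(l-1)) times the step
    # term, remembering the action sequence that leads to each value.
    for q0, q, p in levels[1:]:
        r_new, seq_new = {}, {}
        for y in ys:
            vals = {yp: r[yp] * best_step(y, yp, q0, q, p)[0] for yp in ys}
            yp_best = max(vals, key=vals.get)
            r_new[y] = vals[yp_best]
            seq_new[y] = seq[yp_best] + [best_step(y, yp_best, q0, q, p)[1]]
        r, seq = r_new, seq_new
    return r, seq
```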
Problem 2 (The Average Utility Maximization)

In the ATIN in Fig. 1, we fix the value of the precondition vector at time 1, denoted $x_1^n(1)$. For each value $y_1^w(N)$ of the effects' vector at time N, we assign a worth function $U(y_1^w(N))$. For each action $a_j(l)$ at time $l$, we assign a deployment cost $c_j(l)$. The utility of the effects' vector value $y_1^w(N)$, when action $a_j(N)$ is taken, is then equal to $U_j(y_1^w(N)) = U(y_1^w(N)) - c_j(N)$, while the utility of the same value, when no action is taken, equals $U_0(y_1^w(N)) = U(y_1^w(N))$.

We seek the sequence of actions which leads to the maximization of the average utility. The evolving algorithm, from part (b) of Theorem 1, back propagates as follows; the proof is in the Appendix.

Step 1. Compute the action awards (including that of no action), with the notation of Figure 1, as follows:

$$A_j(x_1^l(N-1)) = p_j(x_1^l(N-1)) \sum_{y_1^w(N)} q_j(y_1^w(N))\, U_j(y_1^w(N)); \quad 0 \le j \le r,$$

with $p_0(x_1^l(N-1)) = 1$. Select $j^*$:

$$A_{j^*(x_1^l(N-1))}(x_1^l(N-1)) = \max_j A_j(x_1^l(N-1)),$$

for each $x_1^l(N-1)$ value. Take action $a_{j^*(x_1^l(N-1))}(N)$ for the preconditions vector value $x_1^l(N-1)$ and simultaneously assign the worth $A_{j^*(x_1^l(N-1))}(x_1^l(N-1))$ to $x_1^l(N-1)$. That is, assign:

$$U(x_1^l(N-1)) = A_{j^*(x_1^l(N-1))}(x_1^l(N-1)) \quad (5)$$

Step 2. Back propagate to the preconditions at time N-2, as in Step 1, starting with the worth assignments in (5) and the subsequent utilities

$$U_j(x_1^l(N-1)) = \max\left[\, A_{j^*(x_1^l(N-1))}(x_1^l(N-1)) - c_j(N-1),\ 0 \,\right]$$

Step n. As in Steps 1 and 2, for the subsequent levels, the above described algorithm generates the optimal sequence of actions for given initial preconditions $x_1^n(1)$. The optimal such preconditions can also be found via maximization of the utility $U_j(x_1^k(2))$ with respect to $x_1^n(1)$.

The complexity of this problem is also polynomial with respect to the number of links.
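A sketch of this back propagation follows, under the same illustrative table conventions as the earlier sketches; every name here is an assumption of ours, not the paper's notation.

```python
def back_propagate(levels, U_final):
    """Problem 2 back propagation (Steps 1 through n).

    levels  -- one (q0, q, p, c) tuple per step, ordered from step 1 to N-1:
               q0[y], q[j][y], p[j][x] as before; c[j] = deployment cost of
               the action a_j taken at that step
    U_final -- worth U(y) of every effects vector y at level N
    Returns the selected action per precondition value at every level (from
    level 1 up) and the worth table propagated back to level 1.
    """
    U = dict(U_final)              # worth of the effects' vectors at level N
    plan = []
    for q0, q, p, c in reversed(levels):
        xs = {x for j in p for x in p[j]}             # precondition values
        new_U, choice = {}, {}
        for x in xs:
            best_j = None
            best_val = sum(q0[y] * U[y] for y in q0)  # 'no action' award
            for j in q:
                # U_j(y) = max[U(y) - c_j, 0], as in the experimental setup
                award = p[j][x] * sum(q[j][y] * max(U[y] - c[j], 0.0)
                                      for y in q[j])
                if award > best_val:
                    best_j, best_val = j, award
            new_U[x], choice[x] = best_val, best_j    # U(x) = A_j*(x), Eq. (5)
        plan.append(choice)
        U = new_U                                     # back propagate the worth
    plan.reverse()
    return plan, U
```

Running the loop from level N-1 down to level 1 reproduces Steps 1, 2, and n; the worth table returned at the end can then be maximized with respect to $x_1^n(1)$ to find the optimal initial preconditions, as noted above.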
Problems 3a, 3b (Optimization with Constrained Actions)

Problems 3a and 3b impose dependency constraints on the actions in the ATIN network. As explained in Section 2, an ADM defines the dependency of one action on every other one, where positive dependency is depicted by 1 and negative dependency is depicted by 0. The dependency constraints are taken into account when, at a certain level, an optimal action is finalized: at any given level, only positively related actions are considered in the calculations. As described in Step 1 of Problem 1 (see Section 4), for the first level, $r(y_1^m(1))$ is calculated in the same way for constrained actions as well; for the remaining levels, however, it is calculated differently. Consider

$$r(y_1^m(l)) = \max_{y_1^m(l-1)} \left\{ r(y_1^m(l-1)) \cdot \max\left[\, q_0(y_1^m(l)),\ \max_j q_j(y_1^m(l))\, p_j(y_1^m(l-1)) \,\right] \right\}$$

The term $r(y_1^m(l-1))$ corresponds to an action selected for execution at level $l-1$, whose dependent actions are known from the ADM. In this way, those combinations of actions which are not allowed by the ADM are eliminated from the calculation of $r(y_1^m(l))$, hence eliminating all links to and from the actions exhibiting negative dependencies. As a result, this yields a network with fewer links and eases the determination of the optimal sequence of actions; a sketch of the resulting filtered step is given at the end of Section 5.1.

5 Numerical evaluations

In this section, we focus on numerical scenarios. We first state the experimental setup; we then evaluate and discuss a specific experimental scenario. We only state the experimental setups for Problems 1 and 2, since those of Problems 3a and 3b are straightforward modifications of the former.

5.1 Experimental Setups

Experimental Setup for Problem 1

Assign the probabilities $\{q_j(x_1^k(l))\}$ and $\{p_j(x_1^k(l))\}$ as in Problem 2. Given these probabilities:

a. Compute first

$$r(y_1^m(1)) = \max\left[\, q_0(y_1^m(1)),\ \max_j q_j(y_1^m(1))\, p_j(x_1^n(1)) \,\right]$$

and the action $j^*(y_1^m(1))$ that attains $r(y_1^m(1))$.

b. For each $l$, $2 \le l \le N$, maintain in memory the values $r(y_1^m(l-1)) = \max P(y_1^m(l-1) \mid x_1^n(1))$, for each $y_1^m(l-1)$ value, and the actions that attain them. Then compute and maintain the values

$$r(y_1^m(l)) = \max_{y_1^m(l-1)} \left\{ r(y_1^m(l-1)) \cdot \max\left[\, q_0(y_1^m(l)),\ \max_j q_j(y_1^m(l))\, p_j(y_1^m(l-1)) \,\right] \right\}$$

Also maintain the actions that attain the values $r(y_1^m(l))$.

Experimental Setup for Problem 2

Considering the network in Fig. 1, assign:

a. Probabilities $p_j(x_1^k(l)) = P(\text{action } j \text{ succeeds} \mid x_1^k(l) \text{ preconditions})$ at all levels from 1 to N-1, where $p_0(x_1^k(l)) = 1$ for all $l$;

b. Probabilities $q_j(x_1^k(l)) = P(x_1^k(l) \text{ occurring} \mid \text{action } j \text{ at step } l-1)$ at all levels from 2 to N, where $q_0(x_1^k(l)) = P(x_1^k(l) \text{ occurring} \mid \text{no action at step } l-1)$ at all levels from 2 to N;

c. Implementation/deployment costs $c_j(l)$ for all actions, at all levels from 2 to N;

d. A worth function $U(y_1^w(N))$ for all values $y_1^w(N)$ of the effects' status vector, at level N.

Given the above assignments:

a. Compute first

$$A_j(x_1^l(N-1)) = p_j(x_1^l(N-1)) \sum_{y_1^w(N)} q_j(y_1^w(N))\, U_j(y_1^w(N))$$

where $p_0(x_1^l(N-1)) = 1$ and $U_j(y_1^w(N)) = \max[\, U(y_1^w(N)) - c_j(N),\ 0 \,]$, and select

$$A_{j^*(x_1^l(N-1))}(x_1^l(N-1)) = \max_{0 \le j \le k} A_j(x_1^l(N-1))$$

for all $x_1^l(N-1)$ values.

b. Take action $a_{j^*(x_1^l(N-1))}$ for each precondition vector value $x_1^l(N-1)$. Assign the worth $A_{j^*(x_1^l(N-1))}(x_1^l(N-1))$ to $x_1^l(N-1)$, as $U(x_1^l(N-1)) = A_{j^*(x_1^l(N-1))}(x_1^l(N-1))$.

Repeat steps (a) and (b) for level N-1 and back propagate to level N-2. Continue the back propagation down to level 1.
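As anticipated in the discussion of Problems 3a and 3b above, the constrained setups only modify the inner maximization: actions negatively related (ADM entry 0) to the action selected at the previous level are skipped. A sketch follows, under the same illustrative conventions and under our reading of the ADM as adm[i][j] = 1 when action a_j may follow action a_i.

```python
# The ADM-filtered step term for Problems 3a/3b; None denotes no action, which
# we assume is always permitted. The adm encoding is our interpretation of the
# ADM described in Section 2.

def constrained_step(y, x, prev_action, q0, q, p, adm):
    """max[q0(y), max over ADM-allowed j of q_j(y) p_j(x)]."""
    best_j, best_val = None, q0[y]
    for j in q:
        if prev_action is not None and adm[prev_action][j] == 0:
            continue            # negative dependency: link effectively removed
        v = q[j][y] * p[j][x]
        if v > best_val:
            best_j, best_val = j, v
    return best_j, best_val
```

Substituting this step term into the recursion of Problem 1 (or into the awards of Problem 2) realizes the link elimination described above.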
5.2 A Specific Experimental Scenario

In this section, we illustrate the use of Activation Timed Influence Nets with the help of an example ATIN and present the results of the algorithms included in this paper when applied to this ATIN. The model used in this section was derived from a Timed Influence Net presented by Wagenhals et al. in 2001 [27] (developed with the help of a team of subject matter experts) to address the internal political instabilities in Indonesia in the context of East Timor. For purposes of illustration, we have selected a part of this network, as shown in Fig. 4.

Example ATIN: The model provides detailed information about the religious, ethnic, governmental, and non-governmental organizations of Indonesia. In this section, the propositions and actions referred to are given in italic text. According to the model, a rebel militia formed by a minority group poses the main concern; it has captured a large number of people within its secured territory. Among these people, some are against the rebels and are considered to be at risk in case the negotiations with the local government do not work. For this example, consider the initial conditions when the rebels are getting local support, the community is in unrest, and the local administration is losing control. Based on the data provided, only one action can be executed from a possible set of actions at a given time, i.e., either the Indonesian press, or the provincial authority, or the minister of interior would declare resolve to keep peace. Depending upon this selected action and the data provided for the effects, only a specific set of events can result. For instance, the rebels may or may not start thinking that they are getting publicity, the GOI (Government of Indonesia) war may or may not expand, and the GOI chances of intervention and international attention may increase or decrease. Similarly, this specific set of events forms the set of possible preconditions for a later time. Depending upon which conditions actually become true, a second action can be selected for execution from another set of actions, i.e., the Security Council and General Assembly may or may not pass resolutions, or the UN may or may not declare resolve to keep peace. Depending upon this action and the data provided for the effects, a coalition may or may not form, the rebels may or may not contemplate talks, GOI support may increase, decrease, or not change at all, and the GOI may or may not allow the coalition into its territories. Ultimately, the coalition may authorize the use of force, which might compel the rebels to negotiate, and the humanitarian assistance (HA) may start preparing for the worst case. Depending upon which conditions are met, the coalition may declare resolve to keep peace or may declare war on the rebels. This may affect the chances of military confrontation, the rebels' popularity, and the chances of a negotiated settlement, which represent the final effects in the network.

Table 1 lists some of the parameters (and their values) required by the network in Fig. 4. The parameters in the table are listed by their abbreviated labels in addition to the phrases shown inside the network nodes in the figure. For the sake of brevity, we do not list all the values.

Solutions to Problems

Solution to Problem 1 (Optimal Path Problem): Considering the example scenario described earlier, we need to identify an optimal path (i.e., the sequence of actions) resulting in the final effects where the chances of military confrontation are reduced, the rebels start losing local support, and the chances of negotiation start increasing. This set of effects (post-conditions) leads to the following output state in the ATIN model:

- Reduction in the chances of military confrontation (i.e., $Y_{12} = 0$);
- Decrease in local support and popularity for the Rebels (i.e., $Y_{13} = 1$);
- Increase in the chances of a negotiated settlement (i.e., $Y_{14} = 1$).

The above defined conditions lead to a post-condition vector $[0, 1, 1]^T$ at level 4, i.e., $y_{12}^{14}(4)$. After fixing the post-condition vector, we define the initial preconditions, when the rebels have been getting local support, the community has been in unrest, and the local administration has started losing control. This set of preconditions, given by $x_1^3(1)$, results in a vector value of $[1, 1, 1]^T$, where

- $X_1 = 1$ represents the condition Rebels are getting Local Support;
- $X_2 = 1$ represents the condition There is unrest in the Community;
- $X_3 = 1$ represents the condition Local Administration is losing Local Control.

We want to find the sequence of actions which achieves the desired effects $y_{12}^{14}(4)$ given the initial preconditions $x_1^3(1)$. Technically, we want to identify the sequence of actions which maximizes the probability $P(y_{12}^{14}(4) \mid x_1^3(1))$. Applying the optimal path algorithm (see Section 4) shows that if the provincial authority and the UN declare resolve to keep peace, and the coalition takes no other action but instead declares resolve to keep peace, then the desired effects will be achieved, resulting in lower chances of military confrontation, a reduction in local support for the rebels, and higher chances of a negotiated settlement.

Figure 4: Example ATIN.
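As a cross-check of the first stage of this solution, the Step 1 comparison can be reproduced directly from the Level 1 entries of Table 1 below; the numeric values are read off the table, and the labels are abbreviated.

```python
# Step 1 of the optimal path computation at x = [1,1,1]^T, using the Level 1
# entries of Table 1: q_0(y) = 6.45% and the (q_j, p_j) pairs of a1-a3.
candidates = {
    "no action": 0.0645,     # q_0(y)
    "a1": 0.68 * 0.80,       # Indonesian press declares resolve:     0.5440
    "a2": 0.74 * 0.86,       # Provincial Authority declares resolve: 0.6364
    "a3": 0.56 * 0.20,       # Minister of interior declares resolve: 0.1120
}
j_star = max(candidates, key=candidates.get)
print(j_star, candidates[j_star])   # -> a2 0.6364, i.e., r(y(1)) = 63.64%
```

The maximizer a2 is consistent with the provincial authority's declaration appearing as the first action of the optimal sequence reported above.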
Table 1: Parameter values in the Example ATIN.

Level 1, with $q_0(y_1^4) = 6.45\%$ and $x_1^3 = [1, 1, 1]^T$:

- Indonesian press declares resolve to keep peace (a1): $q_1(y_1^4) = 68.00\%$, $p_1(x_1^3) = 80.00\%$, $p_1 q_1 = 54.40\%$
- Provincial Authority declares resolve to keep peace (a2): $q_2(y_1^4) = 74.00\%$, $p_2(x_1^3) = 86.00\%$, $p_2 q_2 = 63.64\%$
- Minister of interior declares resolve to keep peace (a3): $q_3(y_1^4) = 56.00\%$, $p_3(x_1^3) = 20.00\%$, $p_3 q_3 = 11.20\%$

with $r(y_1^4) = 63.64\%$, attained by action a2.

Level 2, with $q_0(y_1^4) = 0.95\%$:

- Resolution is passed in Security Council (a4): $q_4(y_1^4) = 14.00\%$; $p_4([0,0,0,0]^T) = 16.00\%$ ($p_4 q_4 = 2.24\%$), $p_4([1,0,1,1]^T) = 91.00\%$ ($p_4 q_4 = 12.74\%$), $p_4([1,1,1,1]^T) = 18.00\%$ ($p_4 q_4 = 2.52\%$)
- UN declares resolve to keep peace (a5): $q_5(y_1^4) = 42.00\%$; $p_5([0,0,0,0]^T) = 66.00\%$ ($p_5 q_5 = 27.72\%$), $p_5([1,0,1,1]^T) = 81.00\%$ ($p_5 q_5 = 34.02\%$), $p_5([1,1,1,1]^T) = 13.41\%$ ($p_5 q_5 = 5.63\%$)
- Resolution is passed in General Assembly (a6): $q_6(y_1^4) = 39.00\%$; $p_6([0,0,0,0]^T) = 43.05\%$ ($p_6 q_6 = 16.79\%$), $p_6([1,0,1,1]^T) = 48.20\%$ ($p_6 q_6 = 18.80\%$), $p_6([1,1,1,1]^T) = 24.87\%$ ($p_6 q_6 = 9.70\%$)

with $r(y_1^4) = 34.02\%$, attained by action a5.

Appendix

Proof of the Network Propagation - Problem 1

Using the notation in Section 4, Problem 1, and via the use of the Theorem of Total Probability and the Bayes Rule, we obtain:

$$r(y_1^m(N)) = \max_{\text{sequence of actions}} P(y_1^m(N) \mid x_1^n(1)) = \max_{\text{sequence of actions}} \sum_{y_1^m(N-1)} P(y_1^m(N),\, y_1^m(N-1) \mid x_1^n(1)) =$$

$$= \max_{\text{sequence of actions}} \sum_{y_1^m(N-1)} P(y_1^m(N) \mid y_1^m(N-1))\, P(y_1^m(N-1) \mid x_1^n(1)) \le \max_{y_1^m(N-1)} \left[ \max_{\text{action}} P(y_1^m(N) \mid y_1^m(N-1)) \right] r(y_1^m(N-1))$$

where $P(\text{no action succeeds} \mid \text{no action taken}) = 1$ and where the value $r(y_1^m(N-1))$ attaining the maximum is selected. The above proves the general step in the network propagation of Problem 1.

Proof of the Network Propagation - Problem 2

Using the notation in Section 4, Problem 2, and via the use of the Theorem of Total Probability and the Bayes Rule, we obtain:

$$\max_{\text{sequence of actions}} \sum_{y_1^w(N)} U(y_1^w(N))\, P(y_1^w(N) \mid x_1^n(1)) = \max_{\text{sequence of actions}} \sum_{y_1^w(N)} U(y_1^w(N)) \sum_{x_1^l(N-1)} P(y_1^w(N),\, x_1^l(N-1) \mid x_1^n(1)) =$$

$$= \max_{\text{sequence of actions}} \sum_{x_1^l(N-1)} P(x_1^l(N-1) \mid x_1^n(1)) \sum_{y_1^w(N)} U(y_1^w(N))\, P(y_1^w(N) \mid x_1^l(N-1)) =$$

$$= \max_{\text{sequence of actions}} \sum_{x_1^l(N-1)} P(x_1^l(N-1) \mid x_1^n(1))\, A_{j^*(x_1^l(N-1))}(x_1^l(N-1))$$

where, per Theorem 1(b), no action is selected at a level whenever $\sum_{y_1^m} q_0(y_1^m)\, U_0(y_1^m) > \max_j p_j(x_1^n) \sum_{y_1^m} q_j(y_1^m)\, U_j(y_1^m)$. The latter expression proves the back propagation property and the steps in the algorithm.
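As a numerical illustration of the Problem 1 recursion proved above, the following sketch compares, on a small randomly generated three-step chain, the dynamic-programming values r(y(N)) with a brute-force maximization over all intermediate effect vectors. The construction and all names are illustrative assumptions of ours.

```python
import itertools, random

random.seed(1)
ys = list(itertools.product((0, 1), repeat=2))   # two binary effects per level
actions = [1, 2]
x1 = (1, 1)

def rand_q():                                    # a random conditional table
    w = [random.random() for _ in ys]
    s = sum(w)
    return {y: v / s for y, v in zip(ys, w)}

steps = []                                       # (q0, q, p) per step; 3 steps
for l in range(3):
    xs = [x1] if l == 0 else ys
    steps.append((rand_q(),
                  {j: rand_q() for j in actions},
                  {j: {x: random.random() for x in xs} for j in actions}))

def term(y, x, q0, q, p):                # max[q0(y), max_j q_j(y) p_j(x)]
    return max([q0[y]] + [q[j][y] * p[j][x] for j in actions])

# Dynamic program (Section 4, Problem 1).
r = {y: term(y, x1, *steps[0]) for y in ys}
for st in steps[1:]:
    r = {y: max(r[yp] * term(y, yp, *st) for yp in ys) for y in ys}

# Brute force over all intermediate effect vectors.
brute = {y: max(term(y2, x1, *steps[0]) * term(y3, y2, *steps[1])
                * term(y, y3, *steps[2])
                for y2 in ys for y3 in ys) for y in ys}
assert all(abs(r[y] - brute[y]) < 1e-12 for y in ys)
print("r(y(N)) matches the brute-force maximum for all", len(ys), "vectors")
```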