Informatic a 2 2 (1998) 69-7 4 6 9 Patterns in a Hopfield Linear Associator as Autocorrelatory Simultaneous Byzantine Agreement Paule Ecimovic University of Ljubljana, Department of Philosophy, Aškerčeva 2, Ljubljana, Slovenia Phone: +386 61 176 9200 int. 386 E-mail: ecimovicSctkl j .ctk.s i Keywords: PDP, neural networks, Byzantine Agreement, patterns, attractors, Interactive consistency, convergence, fault tolerance, k-resilience, distributed protocols, two-phase commit, Ising spin quan­tum computers Edited by: Rudi Murn Received: April 16, 1997 Revised: December 8, 1997 Accepted: February 23, 1998 In the first part, the Byzantine Agreement problem in paraUel distributed processing is formulated for generaJized, complete}y-interconnected networks of interacting processors. An overview of the main cases of this problem are presented in brief. Among standard optimal algorithms for reaching Simultaneous Byzantine Agreement, only the two-phase commit protocol is set out in any detail. In the second part, the process of pattern formation in Hopfield linear associators, realized as single-layer neural net\vorks with Hebbian \veight adjustment rules, is discussed. The main result of the paper is then presented, according to \vhich pattern formation in Hopfield linear associators is a solution to a form of Simultaneous Byzantine Agreement. In conclusion, it is argued that such associative memory solutions to interactive consistency problems in generalized transaction processing systems may finally prove viable, despite decades of neglect due to inavailability or prohibitive expense of sufhcient processing po\ver for their large-scale implementation. Introduction The transaction processing problem^ of achievingand/or maintaining interactive consistency, as typi­fied by the Byzantine Agreement (BA) scheme (Peaseet al. 1980), (Lamport et al. 1982), formulatedbelow, is exemplary among workable approaches tothe design and implementation of fault-tolerant dis­tributed protocols in parallel distributed processing (PDP) (transaction) systems. Wide-spread commer­cial applications, especially in banking, depend crit­ically on guaranteeing a minimally-sufhcient reliabil­ity of correct execution of distributed protocols in thepresence of faults, be they faulty connections or faulty (or ill-synchronized) processors, or combinations ofboth (Wang 1995: p.420). In particular, achieving in­teractive consistency among inter-connected and dis­tributed processors in the presence of faults is an im­portant problem to which standard round-optimized solution s exist , suc h a s th e Dwork-Mose S protoco l o rth e tW0-phas e commi t protocol . Th e tim e migh t wel lhav e COme, however , t o (re-)conside r th e advantage s r • ,- J. i. • i •L J 1 TT J2 1 11 1­of associative strategies typined by Hopheld s linear° •' ^ .1 r­associato r (LA ) wit h a Hebbia n associatio n rule , mos t ^The author is currently funded by the Slovene Science Foun­dation, while pursuing a master's degree in logic at the Collegeof Philosophy, at the University of Ljubljana, where he is alsoengaged in research. commonly realized as a fully-connected, single-layer neural network, where the connection strengths are determined by a form of Hebb's learning rule.^ Before expanding on the close analogy between BA and as­ sociative pattern formation in the Hopfield LA, let us consider the BA scheme, its most prevalent instances, and some round-optimized protocols for reaching var­ ious instances of BA. 2 Byzantine Agreement (BA) 2.1 Fundamental Definitions ^h e fundamental mgredients of problem of reaching Byzantine agreement are the following. We take a pro­'"''^°' *° ^^ ^"^ ''l*^^^^ Turing machine capable of '^^''^'''S °^* ^"'"^ ^^^^^^^^^ elementary instruction-set. Theoretically, this can be construed as a universal Tur­ 2^ 6 have deliberately avoided establishing an "understood" transition to neural networks per se in order to facilitate realiza­ tions of the Hopfield LA scheme in otherphysicalsystems,which might not behave "neurally" at ali, in anv currently-simulated " ,,, <• i n, * ^v,' • u <.u i • f J sense. We teel that this gives both neural inrormation process­ ing researchers an opportunity to reappraise the (in)adequacy of their neuron, models from an Information processing point of view as well as the neuro-biological community a breather from computational neuron-modelling strategies, vvhich have recently been shown to miss a vast arrayof interaction detail. See: (Koch 1997) for hints. 70 Informatic a 2 2 (1998) 69-7 4 ing machine. Practically, this can be anything as sim­ple as an "intelligent" bistable switch enhanced with some automated processing logic, e.g., a McCullough and Pitts binary neuron, or something as complex as a human operator sitting at a computer terminal con­nected to a local area network. Some interconnected network of such processors is required across which to distribute and on which to execute a protocol thus dis­tributed. By distributed protocol, we are to understand any non-contradictory set of instructions P which can be divided into n parts P^ such that ^ = U{^^} each part of which is assigned to a given processor in the course of a task designation phase of (pre­)processing of the protocol. Atop this layer, we must have a meta-layer for regulating and monitoring inter-processor communication in such a vvay as to achieve Interactive consistency, which means in this context that none of the processors contradicts any action(s) of other processors on their data or on its own data or of itself on its own data. Figuratively speaking, if one were to set a team of people to sweep a room, there would be a supervisor to ensure that no member of the team throws dust on an area just swept by another member or by the member him- or her- self, thereby guarding against "stupid mistakes" (Novak,1993:p.27). We also assume a global clock (generator of regular events) relative to which ali inter-processor communi­cation in the PD P network is synchronized and mea­sured. A round of computation is defined as the time interval (number of regular events) generated by the global clock required for ali the parts Pj of protocol P to be executed by the corresponding processors pi once. ^ Informally, a run is the entire state transition history of aH the states in \vhich ali the processors were in the course of aH rounds of computation of a given protocol P from some arbitrary starting tick of the global clock to some other arbitrary tick, i.e., in some time interval as measure on the global clock.'* For practical reasons, we often normalize the clock ticks to coincide with rounds of computation of a given protocol, making the obvious execution uniformity as­sumptions. By crash failure is meant the state of a PD P system in which execution of a protocol'from a certain round onward generates results incompatible with any admissible run of the given protocol. ^ The class of parallel-distributed processing (PDP) systems ^Henceforth, processors will be denoted by lower čase sub­scripted p's, whereas the part of a given protocol P aissigned to each by upper čase subscripted P's . ''Formally, this can be defined in terms of the modal logic Ss, where it is possible to define the set of ali possible state transitions and quantify over them with possibility and necessity operators. (Halpern 1986) ® Crash failure could be defined more "severely" and abso­lutely as a state of a PD P system after which the system ceases P. Ecimovi c to be considered for the rest of the paper is defined by ali PDP systems, which satisfy the follovving five conditions (Wang 1995: p.420): For natural numbers k and n, such that n > k + 2: PDP l There are n processors, at most k of which are faulty with respect to a given semantics, whether functional, operational, or declarative (elaborated below), without incurring crash-failure of the sys­tem in executing a given distributed protocol P PDP2 the processors can communicate directly with each other through message exchange in a fully-connected network (FCN) PDP3 the message sender is always identifiable by the receiver PDP4 an arbitrary set of processors are chosen as sources and their initial value v^ is broadcasted to other processors and to themselves at the start of parallely-distributed execution of P PDP5 The only faulty components considered are processors. PDP4 is a significant generalization from the PD P specifications presented in (Wang,1995: p.20), since it allows for more than one processor to carry the initial state which is to be distributed to and agreed upon by the entire PDP network. This is critical for our LA realization, because it admits the čase of an "initial configuration", see the next section, being distributed initially across the entire network. PDPl , mutatis mu­tandis, can be taken as a general definition of what it means for some protocol P to be k-resilient. In other words, a protocol P is k-resilient if, and only if, for k3k-|-l. This formulation leaves malicious units anonymous (unidentified) in the sense of (Novak 1993: p.22). For n= 4 and k=l , two "phases" (rounds of computation) sufRce to reach BA. In our notation, the situation is as follows, denoting messages sent by processor p, by mj. See (Novak 1993) for a diagram Informatica 22 (1998) 69-74 71 and re-label accordingly. Here the specification of P and its partitioning is unimportant, since it applies to most" ali parallelly-distributable protocols, with some solvability and other application-specific restrictions. Processors pi, P2, and ps send their messages, mi , m2,and ms to the other three processors respectively, whereas processor p4 sends m4 only to p2 and ps, but not necessarily to pi. Instead, p4 sends pi some mes­sage X. Assuming that for ie{l,2,3}, pi is fault-free and that p4 is faulty in the above sense as well as being "maliciously" faulty, i.e., sending some message X, and claiming it sent the expected message, m4, with the claim not being necessarily true (Novak 1993). Thus, the algorithm for reaching BA in this čase proceeds in the following two phases: TPS l Each processor pj sends its message to ali the other processors TPS2 Verification of message correctness. Thus BA is reached, according to the above specifica­tions, and pi can claim to have received message X from p4 as a result of the TPS2 phase, vvhereas p4, being "malicious" can claim that it sent m4. Thus we have reached "Byzantine" agreement, agreeing, within reason, to disagree, while stili getting the job done. Comparison TPS l and TPS2 with the two-phase commit protocol for distributed databases (McFad­den 1994: p.481) reveals a similar pattern, with some modification relating to the application domain, etc. Suffice it for our purposes to interpret "site" as "pro­cessor" and "commit" as "agree on value a common v as per BAl" For explication of the broadcast variant of BA, see (Wang,1995). TPC l A message is broadcast to every participating site, asking whether that site is willing to commit to its portion of the transaction at that site. Each site returns an "OK" or "not OK" message. TPC2 The originating site coUects messages from ali sites. If ali are "OK," it broadcasts a message to aH sites to commit the transaction (come to agreement). If one or more responses are "not OK," it broadcasts a message to ali sites to abort the transaction. Obviously, this is a "no non-sense" (fault intoler­ant) specification which could be generalized to ad­mit one or more failures to respond "OK" in which čase it would amount to a direct solution to BA. However, non-sense at the level of automated teller machines savings/checking account transactions and corresponding account balance updates is not taken lightly by the customer on the street who is likely to complain angrily at the slightest "irregularity", let alone fault. At this level, fault-tolerance and k-resilience are not to affect the end user, but rather only the processing system and its administrators. 72 Informatica 22 (1998) 69-74 3 Hopfield's Linear Associator (LA) and its Convergence Propertie s In this section, we define Hopfiekrs linear associator (LA) in terms of discrete-time dynamical systems (ef­fectively, finite state automata) and vveighted graphs, thus avoiding this systems usual interpretation as a neural network. The reason for this will become ap­parent in the final two sections of this paper. 3.1 Specification of th e Hopfield Linesir Associato r The topology and architecture of Hopfield's LA (HLA) are specified as follows (Bruck 1990:p.247): HLAl A Hopfield LA is a discrete-time dynamical system represented by a vveighted graph HLA2 Each edge of the graph is assigned a weight and each node a threshold value. HLA3 The weighted graph is fully connected. The order of the HLA is defined to be the number of no'des in the corresponding graph. Thus, if N is an HLA of order n, written henceforth NeHLA(n), then, according to HLAl-3, the ordered pair (W,T) determines the topology and architecture N uniquely to vvithin a renaming, where W and T are defined as follovvs. For natural number n: — W is an n X n matrix, the elements Wjj of which represent the weight assigned to the edge joining the i-th and j-t h nodes. - T is a vector of dimension n, the components tj of which denote the threshold assigned to the i-th node. The processing element represented by any node can be in one of two possible states, either -1 or -1-1, and the state is denoted by Si{t), where t is a natural number to be interpreted as a certain discrete number of ticks of a clock analogous to the global clock defined above. The n-dimensional vector s{t) of ali elements s,(t) rep­resents the state of N(n). The vector s is called the state vector of the HLA N(n). Thus, the order of N is determined by the dimensionality of its state vector. Purthermore, the state vector represents the states of aH individual processing elements ("processors", for short) represented by the nodes of the graph. Thus, the state vector s{t) is defined by the equation S{t) : = (Si,S2,---,Sn)-(1) Ali the set of ali possible state vectors of a suitably-chosen set of individual processor states Sj, where i P. Ecimovic varies from 1 to n by ones and n is some finite natural number, forms a state space. The evolution equation of the dynamical system, i.e., the equations governing the transition of the system from one state s{to) to the next state s{to + 1) are given componentwise as follows: Si{to + 1) = sgn{Hi(to)), (2) where the function sgn{x) is the sign of the number x defined as -t-1, if a; > O and -1 , othervvise, and Hiito) = ^Wj,iSj{to) - tj . (3) Accordingly, each processor at a given node is a linear threshold element, which adds its input signals from ali other elements, at ali other nodes in the system. Thus, each linear threshold element acts here as an "adder". To complete the specification of HLA, we must state a Hebbian weight adjustment rule and an energy function as follows (Hopfield 1982: p.2556). AWj.i = [Si{t)Sj{t)]average, (4) where the average is calculated over past history, and (5) and the change in energy AE due to a change in state AE due to change in the state of an individual pro­cessor Asi is given by AE = —Asi 2_] Wj^iSj. (6) Apart from particular initial conditions, which depend on the specific problem instance in question, this com­pletes our specification of HLA. 3.2 Convergence Propertie s of th e HL A The feature property of the HLA for the purposes of reaching BA and, more specifcally, SBA, is that since its state space is finite, the HLA dynamical system \vill always coverge to either a stable state/cycles in state space (Bruck 1990). A specific value of the state vector s{t) at a given time t is called a configuration. Stable states (or for a given t, configurations) of the HLA are called patterns (or pattem configurations, re­spectively). Thus, the proces s of n processors reaching SBA reduces to the convergence of the HLA to a pat­tern or pattem configuration. ASSOCIATIV E PATTERN S A S SB A 4 Reaching Simultaneous Byzantine Agreement with Hopfield's L A In this section, we state the main result of this paper, which in the terminology introduced above, may be stated as SBA can be represented by a pattern in an HLA. This impHes that the process of reaching SBA can be represented by convergence of an HLA to a stable configuration, i.e., to a pattern. Now, we must just State some facts about the stability of states and configurations in order to indicate corresponding conditions for achieving SBA as well as fiush out a key property of HLA's which corresponds to k-resilience and realizes fault tolerance. 4.1 Stabilit y of Configurations A necessary and sufRcient condition for a state of an HLA to be stable is the following (Bruck 1990: p.247) (in vectorial form): s{t) is stable <^ s{t) -sgn{Ws{t) - T) , (7) where W is an n x n matrix 'of weights and T is an n-dimensional vector of node thresholds. This is crit­ical in determining patterns, since for a given tirne t, the state s{t) becomes a configuration, which, if it is stable, is a pattern. 4.2 k-resilience an d Fault Tolerance in HL A As is evident from the HLA specification given above, the next state at time io + 1 of an HLA is computed componentwise from its current state at some time to applying equation (2) to each component Si{t) of s{t). This yields the state of the HLA at time ^o + I, i.e., s{to + 1). One of the most prominent features of the HLA model which make it well suited to the im­plementation of fault-tolerant distributed protocols is that s{to + 1) can be computed from a proper suhsef of ali processors in the system (Bruck 1990). This is analogous to k-resilient protocols where, by definition (Kilmer 1995), at most k processors could fail, yet the protocol continues to execute and terminates correctly. Specifically, if S is the set of indices of processors from whose States the state of N{n) is computed, then cardinality{S) < n (8) and cardinality{complement{S)) = k, (9) Informatic a 2 2 (1998) 69-7 4 7 3 vvhere n and k have the same meanings as in the PDP and SBA specifications in the previous sections.^ 5 Conclusion In this paper, we have demonstrated that Simulta­neous Byzantine Agreement can be represented by a Hopfield hnear associator. We have done so without interpreting Hopfield's system as a neural network, but rather as a discrete dynamical system. This allows for implementations in other physical systems, which have possibly greater coraputational power (as mea­sured by the classes of problems which can be solved by means of such systems) than the current coraputa­tional models and systems used to implement artificial neural networks. This will be the topic of a forthcom­ming paper. Acknowledgenients The author is delighted to acknowlege the many eye-opening discussions he has had in the past few years on the subject of PDP and Byzantine Agreement. Spe­cial thanks go to Dieter Gawlick of Oracle (California), who shared with me his enthusiasm for new ideas and Creative solutions to computer science problems. Bob Jackson of IBM Santa Teresa Laboratory (California) for drawing my attention to two-phase commit in this context, to my mentor of many years, Prof. Dr. An­drej Ule, for introducing me to Byzantine Agreement in the first plače, to Mitja Peruš, who drove the point home about the interrelation between neural networks and quantum systems, to Dr. Joseph Halpern of IBM's Almaden Research Center (California), who, in an in­spiring telephone conversation when I needed it most, made me aware of uncharted territory in the Byzantine Agreement enterprise, to Dr. Franc Novak, of IJS, for a stimulating discussion on identifying faulty processors in Byzantine environments, to Damjan Bojadžiev, of Institute Josef Štefan, in Ljubljana, who read through and corrected an earlier version of this paper, and to one of editors. Prof. Dr. Matjaž Gams, of IJS, for his advice on formatting and style as well as encouraging me to submit a paper in the first plače. Most of ali, I would like to thank my father, who introduced me to computers many years ago, when they were not at ali commonplace household items and, over the years, gave me an osmotic course in Information processing. To these and many others, the author extends warmest gratitude. ^If set A is a subset of set B, then every element of A is an ^The cardinality of a set is the number of elements contained element of B. A set C is a proper subset of set B if B contains in the set, and the complement of a given set is th e set of ali at least one element which is not contained in set C. those elements which are not contained in the given set. 7 4 Informatica 22 (1998) 69-74 P. Ecimovic References [1] Bruck J. (1990) On the Covergence Properties of the Hopfield Model Proceedings IEEE, Vol. 78, No. 10, Oct., p. 1579-1585. [2] Chakrabarti B.K., Dutta A., & Sen P. (1996) Quantum Ising Phases and Transitions in Trans­ verse Ising Models, Berlin: Springer-Verlag. [3] Halpern J. (ed.) (1986) Theoretical Aspects of Rea­soning about Knoinledge, Proceedings of the 1986 Conference, Los Altos: Morgan Kaufmann. [4] Kilmer W. L. (1995) Self-Repairing Processor Modules IEEE TRANSACTIONS ON RELIABIL­ITY, Vol. 44, No. 2, June [5] Koch C. (1997) Computation and the single neu­ron, Nature, V ol. 35, 16 January, p. 207-210. [6] Lamport L., et al (1982) The Byzantine Generals Problem ACM Trans. Programing Languages, Syst., Vol.4, No.3, pp. 382-401, July. [7] McFadden F.R. & Hoffer, J.A. (1994) Modem Datahase Management, 4th ed., Redwood City: The Benjami/Cummings Pubhshing Company, Inc. [8] Novak F. & Klavžar S. (1993) An Algorithm for Identification of Maliciously Faulty Units, Interna­tional Journal of Computer Mathematics, Vol. 48, p.21-29. [9] Pease M., Shostak R., and Lamport L. (1980) Reaching agreement in the presence of faults JACM, Vol.27, No.2, pp.228-234, April. [10] Wang S.C, Chin Y.H., & Yan K.Q. (1995) IEEE TVansactions on Parallel and Distributed Systems, Vol.6, No.4, April, p.420-427.