DATA FLOW BASED PARALLEL INFERENCE MACHINE INFORMATICA 4/87 UDK 681.3.001:519.6 Jurij Silc and Borut Robic Jozef Stefan Institute, Ljubljana Abstraot. Execution models of the data flow based parallel inference machine for OR-parallel and AND-parallel Prolog and the experimental machine architecture are presented. It is shown that two types of logic programming languages with different aims can be supported on this machine. The programs are compiled into data flow program graphs corresponding to machine language codes. Thus, parallelism in the program can be exploited naturally. The machine is constructed from processing elements and structure memories interconnected through a hierarchical network. The processing elements interpret the procedu- res represented by the data flow program graphs in parallel. Structu- red data is distributed to structure memories and shared among these procedures. Keywords. Fifth generation computer systems, parallel inference mac- hine, OR-parallel Prolog, AND-parallel Prolog, data flow mechanism, machine architecture. 1 Introduction The Fifth Generation Computer Systems (F6CS) research and development aim is to build a prototype of knowledge information processing system capable of efficiently performing know- ledge-based problem solving and inference. To- ward this end, a ten-year period has been assigned to the FGCS Project, and this period has been further devided into three stages. The goal of the initial three-year stage is to conduct basic research on individual system components in order to establish basic configu- ration technology for subsystems which are to be realized in the intermediate four-year sta- ge. Fig. 1 shows what has become known as the "basic configuration image" of the fifth gene- ration computer C13. Looked at vertically, it has a hardware layer, a software layer, and an external interface to applications systems, as might be expected. Looked at horizontally, it becomes clear that each aspect of the functio- nality of a fifth generation computer - problem solving and inference, knowledge base manage- ment and intelligent interfacing - requires its own hardware and software support mechanises. The parallel inference machine and know- ledge base machine are the most important hardware components of the F6CS. In the FGCS prototype to be completed as the final product of the project, the two machines will be integrated through a close link. In the initi- al stage, however, research and development are proceeding separately for each machine with research themes separately determined, since the initial stage mainly aims to conduct rese- arch and development of individual component technologies to establish the basic technology for the hardware, called the inference subsy- stem and knowledge base subsystem to be build in the intermediate stage C83. 2 Knowledge base and inf«r*no» subsystems Development of the F6CS hardware and arc- hitecture will include the implementation of mechanisms for processing and controlling a knowledge base and efficient execution of pro- blem-solving and inference techniques with the- se mechanisms. The. system will depend on multiprocessing and parallel processing techni- ques for which two objectives are critical: (1) Provide machines with the power to handle the natural parallelism found in pro- blems tackled by humans. The structure of a problem and the neccessary processing for sol- ving it can be shown by rules controlled by an inference mechanism. Thus a major goal of the FGCS project is to devise an execution model for the inference mechanism and to determine a way to configure it. (2) Achieve high-speed parallel proces- sing capable of supporting intelligent human activities. For this requirement, the princi- pal research must be concentrated on knowledge base processing algorithm to handle a large number of facts as well as a mechanism suppor- ting the algorithm. The inference subsystems, together with the knowledge base subsystem, forms the kernel of the FGCS hardware C83. The ultimate aim of the FGCS research and development project is a machine enabling the execution of parallel inferences C3,7,93. In the following we shall describe some FGCS project "data flow directed" efforts in designing such a machine. 28 High Level Enquiry Language Core Language Natural Language Speech Picture Knowledge Base Managnent Systen Problen Solving & Inference Systen Intelligent Interface Systen Knowledge Base Managnent Systen Problem Solving 8. Inference Systen Intelligent Interface Systen Knowledge Base Machine Problen Solving & Inference Machine 1> External interface of the Basic Software Systen Basic Software Systen i Intelligent Interface Machine T> Hardware Systen VLSI Architecture Fig. 1 The overall structure of a fifth generation computer. 3 Parallel infcrane* subsystem Machine language of parallel inference machine In FGCS project, logic programming was selected as a bridge to fill the gap between a highly parallel computer architecture and know- ledge information processing. Several logic programming languages, named kernel languages, which define an abstract interface between the hardware and the software, are beeing develo- ped. The kernal language KLO is the machine language of sequential inference machine tram which a parallel version KL1 is beeing develo- ped. KL1 is the the machine language of parallel inference machine C23 including two types of basic languages: AND-parallel Prolog and OR-parallel Prolog CA3. In the execution of logic programs, a high degree of parallelism can be implemented with use of AND-parallel and OR-parallel executions. When OR-parallelism is applied alternative clo- uses of the same goal are executed in parallel. The alternative clauses have identical initial states and do not interfere with each other, except far possible concurrent initialization attempts of a goal variable by multiple clau- ses. On the other hand, in AND-parallei ism the conjunctive goals of a clause body are executed in parallel. In general, the goals nay share variables and thus interfere with each other when the shared variables are accessed concu- rrently. AND-parallelism in logic programming involves the simultaneuos execution of subgoals in a clause. Whereas OR-parallei ism attempts to achieve increased speed by investigating many possible solutions in parallel, AND-paral- lel ism attempts to achieve increased speed by investigating the subparts of a particular solution in parallel. In conventional sequential Prolog the se- arch and test operations (called unifications) are executed one by one, but parallel search and test operations can be implemented through parallel machine architecture to obtain a high- speed machine. A few sources of parallelism which can be distinguished for parallel execu- tion of Prolog are AND parallelism, and OR parallelism. <1) OR-parallel Prolog. When a goal li- teral G is given, the definition of 6 is invoked. A clause C is then selected fro« the definition, and unification of G and the head H of the clause C is attempted. Generally, when multiple clauses C1, C2, ... Cn exist in the definition, unification of G and each H1, H2, ... Hn can be executed in parallel. A unit clause Ci that is successfully unified with G returns the solution(s). A nonunit clause Cj initiates the next unification, treating its body as a new goal statement, and waits for the solutions. The resulting solutions of the goal G are merged into streams by stream merging primitives (in the order in which they are obtained) and then returned to the goal. Thus, OR-parallel Prolog is suitable for the clas of "search-for-all-solutions" problems. An example of OR-parallel Prolog is Paral- lel Prolog C43. 29. (2) AND-parallel Prolog. When a goal is expressed as 61 AND 62 AND ... AND 6B, ANO-parallei ism can be used to search for conditions for all literals Gi in parallel. The goal statement is satisfied only when solutions are found for all the literals Gi and there is no inconsistency between these soluti- ons. The consistency checking is easy or even unnecessary in cases when the goal literals 6i have no unbound variables shared among AND processes, or where shared variables are bound to the ground instances before invocations of these literals. Several languages have been proposed to realize AND-parallei ism. They include PARLOG, Concurrent Prolog, and 6HC (Guarded Horn Clau- ses) [4,53. Mechanisms of parallel inference Various mechanisms of parallel inference and architectures based on those mechanisms are beeing studied: data flow mechanism, reduction mechanism, complete-copying mechanism and, cla- use unit processing mechanism. In what follows we shall briefly describe these four mechanisms C81: (1) Data flow mechanism. In the data flow concept, execution starts when data neces- sary far the execution arrived. This concept can result in parallelism regardless of whether it is explicitly indicated in the program. This mechanism executes kernel language pro- grams in parallel based on the data flow concept. (2) Reduction mechanism. When executed, an OR-parallel and AND-parallel Prolog program generates resolvents from a goal and clause. This can be regarded as a process in which a goal modifies itself using a clause as a rule. The reduction mechanism can also be viewed as a kind of self-modification. Thus, there is a close similarity between the execution of OR- parallel and AND-parallel Prolog programs and the reduction mechanism. Accordingly, the re- duction mechanism was selected for a machine architecture that executes OR-parallel and AND- parallel Prolog programs. (3) Complete-copying mechanism. Comple- te-copying is type of reduction mechanism. Even if a process includes several literals (subgoals) and only one literal (subgoal) is reducible, the whole process is copied and transferred to a unit that executes the unifi- cation process. This increases the number of copies and the length of a packet in the network, while enhancing the independence of each process. (A) Clause unit processing mechanism. In response to a request from an idle processing unit, a busy processing unit sends a process. Thus, this mechanism can avoid an explosion of resource requests. However, it takes time for all processing units to become busy. 4 Data flow based inference •acttlmi The parallel inference machine based on data flow mechanism (PII1-D) is naturally well suited to parallel procesing becouse the data flow mechanism is closely related to functional languages. Data flow computation ... Programs in the data flow.model are repre- sented by data flow graphs, nodes correspond to operators and directed arcs correspond to data paths along which operands are sent. An opera- tor is driven by operand arrivals from its input arcs, and it outputs the result operands to its output arcs without affecting the other operators' execution. This functionality of operators has close similarity to the functio- nal languages. ... and loQic programming Execution of logic programs is performed in a goal-driven manner: a clause in the programs is initiated when a goal is given and returns the solutions to the goal. Logic programming languages make use of the unifica- tion operation, which is one of their basic functions. Nondeterrainism is another basic feature of these languages; in particular, "don't-knaw nondeterminism" is required for OR-parallel Prolog, while "don't-care nondeter- minism" is required for AND-parallel Prolog. The data flow model is also similar to logic programming languages such as OR-parallel and AND-parallel Prolog. The pograms written in OR-parallel or AND-parallel Prolog are compiled into data flow graphs. Implementation of 6HC GHC was selected as a basic language of KL1 becouse it has clearer semantics and provi- des more efficient implementation than Concu- rrent Prolog, and it has more powerful descrip- tive power than PARLOG. GHC programs consist of guarded clauses such as: H :- G1 , G2, Gm I B1, B2, Bn. where, H, Gi, and Bj are head, guard and body literals, respectively and "I" is called a commit operator. When a goal literal is given each definition clause is invoked and a semap- hore flag shared among these clauses is crea- ted. Unification is attempted between the head literal and the given goal literal and if it succeeds then the guard literals are invoked as the new goal literals. Only the clause whose guard literals succed first can execute its body; i.e. the clause whose guard succeeds performs a test-and-set operation to the shared semaphore flag. If the result of this operati- on is also successful, the clause can execute its bodyj processing of the other clauses is terminated. Thus, one clause is exclusively selected for a given goal from all the clauses whose guards succeeded. There are several implementation schemes to support the guard mechanism in GHC C63: <1) Complete compilation scheme. All the unification directions are analysed in compila- tion time and codes are generated using unidi- rectional unification primitives. In this scheme the compiler is complicated. (2) System number scheme. All the envi- ronments are managed by guard system numbers. A new guard system number is allocated, when a new definition is invoked and is restored to its parent number when the commit operator is executed. The guard numbers are associated with all the variables included in the invoked clauses and the invironment to which each variable belongs is compared with the current environment when unification to the variable is attempted. <3) Pointer coloring scheme. The pointer coloring scheme distinguishes variables belon- ging to the goal literals from those belonging to the current guard by coloring.If unification is attempted between a goal variable and a variable in the invoked clause, the callee's variable is changed to a colored variable, which points to the original variables. If a colored variable is unified with a term, the instance bound to the variable is read before unification. The commit operator restores the colored variables to their original variables. 30 <4) Read-only tagging scheme. This sche- me is an extension of the pointer coloring scheme, in which every variable has a tag specifying its read-only level. The read-only levels of the goal variables are increnented by one before the definitions are invoked, and decremented by one when the commit operator of each invoked clause succeeds. Translation from GHC program into the data flow orach We shall illustate the translation from a given GHC program into the corresponding data flow graph. Let us have a sample program written in GHC (Fig. 2). It is a list-append program which appends a list specified by the second argument of the head literal to the end of the list specified by the first argument. the right side of the "=" operator. The "<<=" operator specifies a procedure "app" invocation macro. The "wait.instance" instruction reads the instance of the first goal argument which is passed along the input path "arg1". If the goal argument is an unbound variable, it is suspended until the variable is instantiated ("uargi"). This operation will need reaote access if the variable cells are distributed over the memory units in the system. Then, the "switch_by_type" instructions switch all the goal arguments according to the first argument "uargi". If the first argument is nil, they put their left operands on their first destina- tions, otherwise (if "uargi" is a list) they put their left operands on the second destina- tions. Thus, one of the subsequent segments is invoked exclusively. The "write_instance" in- struction tries to unify its two operands and ( IBS •9IDBGX PRSOKfltt ) »pnd([],T,Z):-true:ZsY. Fig. 2 GHC source program. ret<< = app(argl,arg2,arg3). begin. ( CLAUSE INDEXING ) uargl:«alt imttnctjargl), _R) = ««Ucl)_by_typ«(uargl,uargl). _R) = «w»tcl)_by_type(arg2,uargi). _R) = ««HtcD_by_typ«(arg3, uargi). (ret _l,ret_2) = i«ltch_by_typ« (ret, uargi). I COMPILED CODE OF THE FIRST CLAUSE ) resl = «rlte_li»«tance(arg3_L,arg2_L). r«turi»(resl,ret_l). ( COMPILED CODE OF THE SECOHD CLAUSE 1 (pl,pS):4tcoap»ie litt(argl_R). p3:cr««t« fl Io b a I va r(a r g1_R ). p 4 = c o n i Ilit(pl,p3). pS:*trit« lnittflci(ar|3_K,p4). p6<< = app(p2,arg2_R,p3). res2:cl)«cH c onslst«ncy(p5,p6). return(res2,ret_2). end. Fig. 3 Compiled code. The resulting list is unified with the third argument. The compiled code is depicted in Fig. 3. The first statement of the compiled code specifies the procedure name "app" and its arguments "arg1", "arg2", and "arg3". The pocedure body is enclosed by "begin" and "end" statements and consists of three segments. The second and the third segment are compiled codes of the first and second source clause, respec- tively. The role of the first segment is to decide which of the subsequent segments should be invoked. Each body statement corresponds to a node in the data flow graph. The left side of the "=" operator specifies destination paths for the results of the instuction specified by if one of them is a variable, it will instanti- ate the variable to another operand. In the second clause of the source program, there is a variable "Z1" in the body which does not appear in the guard. For such a variable, the "crea- te_global_var" instruction creates a new varia- ble cell and initializes it. Two body literals in the body of the second source clause will be executed in parallel. The first is the "write- _instance" instruction and the other is the recursive invocation of the predicate. The "check.consistenoy" instruction tests their re- sults whether they terminated successfully. The data flow program graph representation of the compiled code is shown in Fig. A. 31 argl (waitjnstance) uargl switch \^ by_type J^ &rg2 ( switch by_type arg3 arg3_R I create I global_var ( const_list I consistency I res2 ret 2 ret_l return Invoke *app* ret Fig. 4 Data flow graph representation of the compiled code. Machine architecture Abstract aaohinc arohitaotwra. The machi- ne can exploit OR and AND-parallelism as well as parallelism in unification. In head unifi- cation, if both literals consist of multiple arguments, or if both arguments are structured data, the unification of these arguments or their substructures can be executed in paral- lel. The machine is constructed from multiple processing elements and multiple structure Me- mories interconnected by networks. The ab- stract machine architecture is shown by Fig. 5 CO. Experimental aachina. The experimental machine is constructed from multiple processing element modules (PEs) and multiple structure' memory (nodules (SMs) interconnected through a hierarchial network as shown in Fig. 6 CS,&3. There are several hierarchy levels in the 32 processing element processing element NETWORK structure nenory structure nemory Fig. 5 The abstract machine architecture. interconnection network. Each PE has its local bus. Four PEs and four SMs are interconnected by an inter-module network bus. A set of these modules is called a cluster. Several clusters are futher interconnected by an inter-cluster network bus. The hardware specification for these interconnection busses are the sane, and they are called T-busses (token busses). Actual implenentation of the experimental machine includes two clusters and is currently being expanded to four clusters. Of these clusters, one is specialized, having one SM repleced by a host processor (VAX-11/730) which is used to initialize or monitor the system. Packet foraats. Each PE has several sta- ges in order to implement pipelined or parallel execution. Packets transferred between these stages include result packets and executable instruction packets. A resul packet (a token which is sent along the directed arc in the progran graph) , consists of three fields: (1) The activity identifier (16 bits) specifies the invoked procedure instance name to which the result packet belong. (2) The destination field (24 bits) spe- cifies the address of the destination instruc- tion (a node in the data flow graph) of the result packets. It also includes two bits for additional information} one specifies whether the destined instruction receives one or two operands, and the other specifies whether the operand is a left or right operand. (3) The data field (32 bits) contains the operand data to be send to the instruction. The machine uses a tagging schene, in which each operand has a value field (25 bits) and tag field (7 bits), which specifies the data type of the operand. If the operand is a structured data, the value field has a pointer to the structure aeraory (5-bit module nuaber and a 20-bit local address in iseaory), and tag field is further divided into two subfieldi a data type subfield, which specifies the data type of structure (i.e., list, vector, ...) and a attribute subfield. The attribute subfield contains a non-ground flag, which indicates whether the structure has any siaple variables. The attribute subfield also contains a shared flag, which indicates whether the structure has any shared-type variables (i.e., shared varia- bles, global variables, or read-only varia- bles). The machine recognizes the tag field of the operand and transfers control to the appro- priate firmware routine. An executable instruction packet consists of five fields: (1) The current instruction address (20 bits) indicates the instruction address to be executed and is used to obtain the destination PE...processing element SM...structure nenory NN...network node CLUSTER 8 Fig. 6 Configuration of the experimental machine. 33 address from the destination specifier field as described below. <2> The operation code field (8 bits) specifies the operation to be executed. (3) The left operand (32 bits). <*> The right operand (32 bits). (S) The destination specifier field (48 bits) specifies the destination addresses of the results. There are two modes to specify the destination addresses in the destination specifier field; in the full destination node the specifier field contains up to two destina- tions (each of then is 24-bit length), and in the short destination mode the specifier field contains up to four destinations, where each destination is of 12-bit length and contains the relative addresses from the current in- struction. The relative addresses are added to the current instruction address to obtain the absolute addresses. Processing eleaent nodule. Fig. 7 depicts the configuration of each PE. The stages in a PE include a packed queue unit (PQU) , an instruction control unit (ICU), several atonic processing units (APUs), and a network node (NN). These functional units have their own controllers and are operative in a pipelined manner. Packet transmission via T-bus is con- trolled by a NN, which has nine-to-one arbiter to arbitrate the requests from its lower level units and from its higher level bus. The PE has a local memory unit (LMU), which is used to store local data such as activity manegenent information, and is shared and accessible from APUs. PQU is a FIFO queue memory to store the result packets from the T-bus. ICU receives the result packets from PQU and checks if the destination instructions are executable or not. An instruction is executable if it receives a <( <( 4, T-BUS A P u 1 A P u 2 PQU ICU | I-BUS L M U > y APU,..atonic processing unit PQU,..packet queue unit ICU,..instruction control unit LMU...IOCO.I nenory unit T-BUS...token bus I-BUS...instruction bus Fig. 7 Configuration of a processing element. single operand, or if the partner operand is already in the operand memory (OM) in the ICLJ when it receives two operands. In the later' case, the ICU searches in its Oil whether the partner operand exists or not. If it does, the partner is removed from the memory; otherwise, the result packet is stored in the OH. This searching is performed associatively by hardwa- re hash using the identifier and the destinati- on address as the key field. If the instructi- on is executable, the ICU fetches the instruc- tion code in its instruction memory (III) and constructs an executable instruction packet and sends the packet to the next stage, one of APUs via the instruction bus (I-bus). The APU interprets the instruction packets and sends result packets to the PQU in its PE or other PEs, or sends structure access comand packets to SMs via the token bus. Structure aeaory Module. The SMs are responsible for the structure access conmands, perform structure manipulation operations, and return resul.ts to the destination specified by the commands. Each SM consists of an structure processing unit (SPU) and structure'memory unit (SMU) for storing the structured data (Fig. 8). The SPUs receive the structure manipulation commands from the APUs and interpret them. If the commands need the responses, new result packets are created and sent back to the PEs. Such commands include read commands,' memory allocation commands, and so on. s p u T-BUS 1 S M U y SPU...structure processing unit SMU,..structure nenory unit T-BUS...token bus Fig. 8 Configuration of a structure memory. The specification and typical processing times of the various units are given in Table 1 and Table 2, respectively. Table 1 Specification of the units. FIFO size: 16Kw x 86b (16K tokens) IM size: 96Kw x 59b (96K instr.) OM size: 32Kw x 64b (32K tokens) micro store: 1Kw x 32b ROM 7Kw x 32b RAM memory size: 512Kw x 32b ! unit I +—+ + I IPQUI I IICUI IPEI I I I APUI I I I I ILMUI +—+ + I ISPUI I SMI I I ISMUI I I I +—+ + I NN I + + * w specification* I micro store: 1Kw x 32b ROM 7Kw x 32b RAM memory size: 1Mw x 34b (data, tags) 512Kw x 10b (ref. count) FIFO size: 64w x 86b word b .. bit (64 tokens) I + 34 Table 2 Typical processing tines of the units. 1 unit I item IPQUI packet receive I I delay in queue IICUI single operand instruction I I two operand instruction PEI I (on arrival of 1st operand) I I two operand instruction I 1 (on arrival of 2nd operand) IAPUI "copy" instruction I I packet creation SM I SM-read operation I SM-write operation NN I packet receive I packet send time* I + 2 8 2 5 3 2 2 8 * in machine cycles 5 Concluding paints Striking progress in computer technology has given us single-chip computers whose pro- cessing power far exceeds that of the first-ge- neration computers. There are also various high-level languages, operating systems, and data-base systems. As a result, programs lor almost any kind of application can be written, provided that their algorithms can explicitly be described. This means that computers can replace people in many areas because of their high-speed processing and large memory capabi- lity. However, there remain many application fields with hard-to-solve problems. One such is the knowledge-information processing field, where FGCS are expected to play an important role. A machine to cope with knowledge-informa- tion processing should support extensive stora- ge of data and high-speed inference using the data. Up to now, inference procesing has involved implementing functional and logic pro- gramming languages an conventional sequential computers. However, the need for processing power of new applications in knowledge-informa- tion processing may exceed the capabilities of sequential computers. The architecture of parallel inference mechine makes it a possible candidate for coping with such processing requirements. Com- puter architectures proposed for parallel infe- rence machines include the high-level language machine C103 as well as the data flow machine. Literature C13 P. Bishop, Fifth generation coapu- ters: concepts, implementations and uses, (El- lis Horwood, 1986). C23 K. Furukawa, T. Yokoi, Basic soft- ware system, in: ICOT, ed., Proc. Int'1 Conf. Fifth Gener. Camp. Systems 1984, (North-Hol- land, 1984) 37-57. C33 L. 0. Hertzberger, R. P. Wan Oe Riet, Progress in the fifth generation inferen- ce architectures, Future Generation Computer Systems 1 (2) (1984) 93-102. CO N. Ito, H. Shimizu, II. Kishi, E. Kuno, K. Rokusawa, Data-flow based execution mechanisms of Parallel and Concurrent Prolog, New Generation Computing 3 (3) (1985) 15-41. C53 N. Ito, M. Kishi, E. Kuno, K. Rokusawa, The dataflow-based parallel inference machine to support two basic languages in KL1, in: J.V. Woods, ed., Fifth Gener. Co«p. Architectures, (North-Holland, 1986) 123-145. C63 N. Ito, M. Sato, E. Kuno, K. Rokusawa, The architecture and preliminary eva- luation results of the experimental parallel inference machine PIM-D, Proc. 13th Int'l Symp. Comp. Arch., (IEEE, 1986) 149-156. C73 T. Moto-oka, H. Tanaka, H. Aida, K. Hirata, T. Murayama, The architecture of a parallel inference engine - PIE, in: ICOT, ed., Proc. Int'l Conf. Fifth Gener. Comp. Syst- heos 1984, (North-Holland, 1984) 479-488. C83 K. Murakami, T. Kakuta, R. Onai, Architectures and hardware systems: parallel inference machine and knowledge base machine, in: ICOT, ed., Proc. Int'l Conf. Fifth Gener. Comp. Systems 1984, (North-Holland, 1984) 18-36. C90 K. Murakami, T. Kakuta, R. Onai, N. Ito, Research on parallel machine architec- ture for fifth-generation computer systems, Computer 18 (6) (1985) 76-92. C103 H. Tanaka, A parallel inference aac- hine, Computer 19 (5) (1986) 48-54. P0DATK0VN0 PRETOKOVNI PARALELNI 8TR0J ZA SKLEPAMJE. V Clanku je predstavljen paralelni stroj za sklepanje, ki temeljl na podatkovno pretokovnem izvrsevanju logicnih prograaov. Stroj podpira izvrsevanje logicnih prograaov, zapisanih v OR ali AND-paralelnen Prologu (Pa- rallel Prolog, PARLOG, Concurrent Prolog, GHC). Taksnl programi se prevedejo v podatkovno pre- tokovne programske grafe, ki ustrezajo strojne- mu jeziku. Podan je primer transformaciJe programa, zapisanega v jeziku GHC, v ustrezni podatkovno pretokovni programski graf. Arhi- tektura stroja obsega procesne elemente ter strukturne pomnilnike, ki jlh povezuje hierar- hicna aireza. Procesni element! izvrSujejo dele programskega grata socasno, pri Center si delijo podatke, zapisane v strukturnih pomnilnikih. Podan so tudi prostorske in Casovne zahteve posameznih komponent arhitekture.