THE REVIEW OF SOME DATA FLOW
COMPUTER ARCHITECTURES 
INFORMATICA 1/87 
UDK 681.32.02 
Jurij Šilc and Borut Robič
Jožef Stefan Institute
Department of Computer Science and Informatics 
Ljubljana 
The article reviews some selected data flow computer architectures. All the architectures are designed for VLSI implementation to provide large throughput, low power consumption, and reduced size and weight. While some are in the phase of simulation and VLSI chip floor-plan construction, the others already exhibit real VLSI implementation.
A REVIEW OF SELECTED DATA FLOW COMPUTER ARCHITECTURES - In this article we review selected data flow computer architectures. Some of the architectures are in the phase of simulation or of logic design for VLSI circuits, while others are already implemented in VLSI technology.
Introduction
In spite of the conceptual break with previous computers, the hardware of the fifth generation computers will be based on VLSI semiconductor components. Yet it is to be expected that the hardware of each type of fifth generation computer will be much more closely tailored to the application area than is the case at present.
For a number of reasons, one of the most promising architectural models is data flow architecture. It is flexible and extensible, it has the potential for very high data throughputs, and it reflects, at hardware level, the inherent parallelism of the processing. Thus, the potential realm of use includes problem solving and inference machines and intelligent interface machines, as proposed in the JIPDEC project for fifth generation computer systems.
The presence of some real data flow computers indicates that the state of the art in data flow computing has already passed initial, purely academic discussions.
In the article we review some existing data flow computers from the architectural viewpoint. The presentation is not intended to be thorough. Instead, we concentrate on similarities and differences among the selected architectures.
Manchester data flow computer
The machine organization of the Manchester data flow computer [Gurd-85] is a packet communication organization with token matching [Robič-86] and is shown in Fig.1. The basic structure is a ring of four modules connected to a host system via an I/O switch module. The modules operate independently in a pipelined fashion with packets transferred at a maximum rate of 4.37 M packets/second. Packets destined for the same instruction are paired together in the matching unit. This has limited storage capacity, so that an overflow unit is required for programs with large data sets.
Paired packets, and those destined for unary instructions, fetch the appropriate instruction from the instruction store, which contains the machine code for the executing program. The instruction is forwarded together with its input data to the processing unit, where it is executed. Output packets are eventually produced and transmitted back toward the matching unit to enable subsequent instructions. The return path passes through the I/O switch module, which connects the system to a host processor, and the token queue, which is a FIFO buffer for smoothing out uneven rates of generation and consumption of packets.
Fig.1. Manchester dataflow system structure. (Host links, to and from the I/O switch: 168 Kbytes/second max., 14 Ktokens/second max.)
The I/O switch module gives priority to input from the ring and selects the output route by performing a decode of certain marker bits. It is organized as a simple two-by-two common bus switch.
The token queue comprises three pipeline buffer registers and a circular buffer memory. The latter has a capacity of 32K packets, each being 96 bits wide.
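The sizes quoted for the token queue and for the host link in Fig.1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes that "32K" means 32 x 1024 packets while the link rates use decimal thousands.

```python
# Figures taken from the text and from Fig.1; all names are illustrative.
PACKET_BITS = 96               # width of one token packet
QUEUE_PACKETS = 32 * 1024      # circular buffer capacity (assumes K = 1024)
HOST_BYTES_PER_S = 168_000     # host link, 168 Kbytes/second max.
HOST_TOKENS_PER_S = 14_000     # host link, 14 Ktokens/second max.


def queue_capacity_bytes(packets=QUEUE_PACKETS, bits=PACKET_BITS):
    """Storage needed by the circular buffer, in bytes."""
    return packets * bits // 8


def host_bits_per_token():
    """Bits moved over the host link per token at the quoted maximum rates."""
    return 8 * HOST_BYTES_PER_S // HOST_TOKENS_PER_S


if __name__ == "__main__":
    print(queue_capacity_bytes(), "bytes of token queue storage")
    print(host_bits_per_token(), "bits per token on the host link")
```

Under these assumptions the two host link figures are consistent with one 96-bit token packet per transferred token.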
The matching unit is based on a 1.25 MW pseudoassociative memory with six pipeline registers in the main ring and two buffers interfacing with the overflow unit. The memory is used to store unmatched packets while they await their partners. Its associative operation is achieved by accessing a parallel store using an appropriate hash function. Recall that packets destined for unary instructions do not need to match with partners; instead, they pass straight through the unit. The overflow unit handles packets that cannot be placed in the parallel hash table because they encounter a full hash entry. Overflow packets are stored in linked lists in the overflow unit, which contains a microcoded processor together with data and pointer memories.
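The matching behaviour described above can be sketched in software. This is a minimal model under stated assumptions, not the Manchester hardware: the class name, the one-packet-per-hash-entry table, the use of Python's built-in `hash`, and the per-key Python lists standing in for the linked lists of the overflow unit are all hypothetical simplifications.

```python
class MatchingUnit:
    """Toy model of hash-based pseudoassociative matching with overflow."""

    def __init__(self, slots=8):
        self.slots = slots
        self.table = [None] * slots   # assumption: one waiting packet per entry
        self.overflow = {}            # key -> list of waiting values

    def _index(self, key):
        return hash(key) % self.slots

    def accept(self, key, value, unary=False):
        """Return a tuple of operands when an instruction is enabled, else None."""
        if unary:
            return (value,)                       # passes straight through
        i = self._index(key)
        entry = self.table[i]
        if entry is not None and entry[0] == key:
            self.table[i] = None                  # partner found in the table
            return (entry[1], value)
        waiting = self.overflow.get(key)
        if waiting:
            partner = waiting.pop(0)              # partner found in overflow
            if not waiting:
                del self.overflow[key]
            return (partner, value)
        if entry is None:
            self.table[i] = (key, value)          # first operand waits here
        else:
            # Hash entry is full: the packet goes to the overflow unit.
            self.overflow.setdefault(key, []).append(value)
        return None


if __name__ == "__main__":
    mu = MatchingUnit(slots=4)
    print(mu.accept(1, "a"))    # first operand waits
    print(mu.accept(5, "b"))    # collides with key 1, goes to overflow
    print(mu.accept(5, "c"))    # matches the overflow packet
    print(mu.accept(1, "d"))    # matches the table entry
```

The returned tuple always lists the operand that arrived first before the one that arrived second; a real machine would instead carry an operand position in the packet.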
The instruction store comprises two pipeline buffer registers, a segment lookup table, and a random-access instruction store to hold the program. The segment field of the incoming packet is used to access the instruction from the store. The instruction is 70 bits wide. The instruction is combined with a destination field and the data field of the incoming packet and is sent to the processing unit as a 166-bit executable instruction packet.
The processing unit comprises five pipelined buffer registers, a special purpose preprocessor, and a parallel array of up to 20 homogeneous microcoded function units with local buffer registers and common buses for input and output. A small number of instructions are executed in the preprocessor, but the majority are passed into one of the function units via the distribution bus. Each function unit contains a microcoded bit-slice processor with input and output buffering, 51 internal registers, and 4 KW of writable microcode memory. Instructions are executed independently in their allotted function unit, and the output is merged onto the arbitration bus and thence out of the processing unit toward the I/O switch.
In the Manchester architecture a hardware hashing scheme is used to simulate the associative memory, which turns out to be less expensive. Unfortunately, this scheme does not produce very good results in terms of waiting time. In order to reduce the waiting time, a scheme with multiple matching units is incorporated in EXMAN, the EXtended MANchester data flow computer [Patnaik-86].
MIT data flow computer
The MIT data flow computer is based on a static concept of data flow architecture [Dennis-80], in which the instructions of a machine level program are loaded into specific memory locations in the machine before computation begins, and only one instance of an instruction is active at a time.
Fig.2. The MIT data flow computer.
Instructions are held in the local memories of the processing elements PEs. Each instruction includes an operation code, spaces for operand values, and destination fields that specify where results should be sent. Each PE is equipped to recognise which of the instructions it holds have been enabled for execution by the arrival of the needed operand values. If an enabled instruction calls for a scalar arithmetic operation, the instruction, including its operands, is sent to a functional unit FU capable of performing that operation. The array memory units AM are provided to hold arrays of data making up the data base of the computation, and are accessible through the memory routing network. Instruction execution in an FU or AM yields result packets, each of which consists of a data value and a destination field that specifies the target instruction for the result packet. The result packets are sent to the PEs that hold the target instructions through the distribution routing network. Other instructions, such as those calling for duplicate data values, for boolean operations, and for simple tests, are performed within the PE.
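The enabling rule of the static model (an instruction fires once all of its operand spaces are filled, and its results travel as packets to destination instructions) can be illustrated as follows. This is a sketch under assumptions: the class and function names, the `"out"` pseudo-destination, and the single software queue standing in for the routing networks are hypothetical and not taken from the MIT design.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


class Instruction:
    """One instruction cell: opcode, operand spaces, destination fields."""

    def __init__(self, opcode, n_operands, destinations):
        self.opcode = opcode
        self.operands = [None] * n_operands
        self.destinations = destinations      # list of (target name, operand slot)

    def enabled(self):
        return all(v is not None for v in self.operands)


def run(program, packets):
    """Deliver result packets (target, slot, value) until none are left."""
    pending = list(packets)
    outputs = []
    while pending:
        target, slot, value = pending.pop(0)
        if target == "out":                   # hypothetical sink for final results
            outputs.append(value)
            continue
        instr = program[target]
        instr.operands[slot] = value
        if instr.enabled():
            result = OPS[instr.opcode](*instr.operands)
            # Static model: clear the cell so that only one instance is active.
            instr.operands = [None] * len(instr.operands)
            for dest, dest_slot in instr.destinations:
                pending.append((dest, dest_slot, result))
    return outputs


if __name__ == "__main__":
    # (x + y) * z with x = 2, y = 3, z = 4
    program = {
        "i1": Instruction("add", 2, [("i2", 0)]),
        "i2": Instruction("mul", 2, [("out", 0)]),
    }
    print(run(program, [("i1", 0, 2), ("i1", 1, 3), ("i2", 1, 4)]))
```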
The current status of the MIT data flow project is that hardware for the above computer architecture is under development. For this purpose, a data flow engineering model [Dennis-83], consisting of eight processing units coupled by a packet communication network built of two-by-two routers, is designed for emulating the described architecture.
Data flow computer SIGMA-1
SIGMA-1 is a data flow multiprocessor system for scientific computations [Shimada-86]. The configuration of the system is depicted in Fig.3. Four processing elements PE and four structure elements SE are connected by a local network and called a group. Groups are connected by a global network. The purpose of using a hierarchical network is to execute programs efficiently by utilizing principles of locality.
Fig.3. Global configuration of the SIGMA-1.
The processing element consists of five units, organized as a two-stage pipeline as shown in Fig.4. The PE executes all instructions except those that manipulate structure memory. The buffer unit (8 KW of 40 bits) is an interface between the network and the PE. The length of the incoming packet is 88 bits. It is divided into two parts (top 48-bit and bottom 40-bit) and passes through the network as consecutive parts. The most significant 8 bits are a network address, the next 40 bits are a tag, and the remaining 40 bits are data type and value. When there is no waiting packet in the buffer memory and the next units are not dealing with another packet, the incoming packet bypasses this unit and proceeds to the subsequent units. The fetch unit is a 16 KW, 40-bit-wide program memory. The link number carried by an incoming packet is used to access the address of an instruction to be fetched. The operation field of the fetched instruction indicates an operation code and is sent to the execution unit. The destination field of the fetched instruction gives addresses of destination instructions (waiting for the result) and is sent to the destination unit. The matching flags from the destination field are sent to the matching unit. The matching unit is a 16 KW, 80-bit-wide associative memory used to find a partner packet of an incoming packet. The matching flag indicates whether the operation is unary or binary. When it is unary, the incoming data packet is bypassed to the execution unit. If the instruction is a binary operation, the incoming packet is stored in the associative memory if it is the first arrived packet of the two operands. Otherwise, the matching unit succeeds in finding a partner packet in the matching memory and sends both data of the packet pair to the execution unit. The execution unit consists of an ALU, a shift unit and a floating point arithmetic unit. The word length is 32 bits. It receives an operation code from the fetch unit and data from the matching unit. The destination unit makes output packets by combining the destination addresses and results from the execution unit.
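The 88-bit packet layout given above (8-bit network address, 40-bit tag, 40-bit data type and value, sent as a 48-bit top part and a 40-bit bottom part) can be expressed as simple bit manipulation. The function names are hypothetical, and the internal structure of the tag and data fields is left opaque because the text does not describe it.

```python
ADDR_BITS, TAG_BITS, DATA_BITS = 8, 40, 40     # 88 bits in total
BOTTOM_BITS = 40                               # bottom part sent over the network
TOP_BITS = ADDR_BITS + TAG_BITS                # 48-bit top part


def pack(address, tag, data):
    """Build an 88-bit packet; address occupies the most significant bits."""
    for value, width in ((address, ADDR_BITS), (tag, TAG_BITS), (data, DATA_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError("field does not fit its width")
    return (address << (TAG_BITS + DATA_BITS)) | (tag << DATA_BITS) | data


def split(packet):
    """Split a packet into its consecutive (top, bottom) network parts."""
    return packet >> BOTTOM_BITS, packet & ((1 << BOTTOM_BITS) - 1)


def join(top, bottom):
    """Reassemble a packet from its two parts."""
    return (top << BOTTOM_BITS) | bottom


def unpack(packet):
    """Recover (address, tag, data) from an 88-bit packet."""
    data = packet & ((1 << DATA_BITS) - 1)
    tag = (packet >> DATA_BITS) & ((1 << TAG_BITS) - 1)
    address = packet >> (TAG_BITS + DATA_BITS)
    return address, tag, data


if __name__ == "__main__":
    p = pack(0xAB, 0x123456789A, 0x0FEDCBA987)
    top, bottom = split(p)
    print(hex(top), hex(bottom), unpack(join(top, bottom)))
```

With this layout the top part carries the network address together with the tag, and the bottom part carries the data type and value.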
used for the module at each stage of the global network.
Judging from the performance of 1.35 MIPS of the prototype hardware for the benchmark programs, the performance of the next version of a processor with CMOS LSI technology should be about 1.? MIPS.
µPD7281-based data flow architecture
The µPD7281 is the first VLSI device on silicon using data flow architecture [NEC-85]. The µPD7281 image pipeline processor is designed to be used as a peripheral processor with a mini- or microcomputer serving as the host. Fig.5 shows a general system configuration example in which four µPD7281s are used, connected to the memory in a ring shape, with the entire ring interfacing with the host computer via a standard bus.
For the above architecture, NEC is developing a support chip MAGIC, Memory Access and General bus Interface Chip. It handles all packet flow between the µPD7281s, the image memory, and the host processor.
Fig.4. Structure of the processing element.
Fig.5. µPD7281-based data flow architecture.
The structure element comprises a 64 KW, 35-bit-wide memory to store array data and a control unit to manage free memory words and waiting queues. When an array is declared in a program, a contiguous area corresponding to the array size is allocated in the structure memory. Once the words are allocated, the used bit of each word in the area is turned on. Each word has two other special bits. The presence bit means that data has already been written in the word. The waiting bit indicates that at least one read request packet exists in the waiting queue. When data is written in the word, the data is sent to the instructions indicated by the waiting packets.
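The role of the used, presence and waiting bits can be illustrated with a small model in which reads that arrive before the data are deferred and answered by the later write. Everything not stated in the text is an assumption here: the class and method names, the trivial bump allocator standing in for the control unit's free-word management, and the per-word Python lists standing in for the waiting queues.

```python
class StructureMemory:
    """Toy structure memory with used, presence and waiting bits per word."""

    def __init__(self, size):
        self.value = [None] * size
        self.used = [False] * size
        self.present = [False] * size
        self.waiting = [False] * size
        self.queue = {}          # word -> destinations of deferred read requests
        self.next_free = 0       # assumption: simple bump allocation

    def allocate(self, n):
        """Reserve a contiguous area of n words and return its base address."""
        base = self.next_free
        if base + n > len(self.value):
            raise MemoryError("structure memory exhausted")
        for word in range(base, base + n):
            self.used[word] = True
        self.next_free += n
        return base

    def read(self, word, destination):
        """Return reply packets now, or defer the request until the write."""
        if self.present[word]:
            return [(destination, self.value[word])]
        self.waiting[word] = True
        self.queue.setdefault(word, []).append(destination)
        return []

    def write(self, word, value):
        """Store the value and answer every deferred read of this word."""
        self.value[word] = value
        self.present[word] = True
        replies = [(dest, value) for dest in self.queue.pop(word, [])]
        self.waiting[word] = False
        return replies


if __name__ == "__main__":
    sm = StructureMemory(8)
    base = sm.allocate(4)
    print(sm.read(base + 1, "i7"))     # deferred, nothing returned yet
    print(sm.write(base + 1, 42))      # releases the deferred read
    print(sm.read(base + 1, "i3"))     # answered immediately
```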
A 10 by 10 crossbar is adopted for a local network. This is realized by a bit-slice chip. The global network is organized as a multistage network [Mavrič-86]. The same crossbar chip is
The µPD7281 uses an internal circular pipeline and a powerful instruction set [Šilc-86] to allow high end image processing. A data flow architecture allows the processor to maximize efficiency in a variety of multiprocessing applications. As shown in the block diagram in Fig.6, the µPD7281 is formed by ten functional blocks: the input controller IC, the link table LT, the function table FT, the address generator and flow controller AG&FC, the data memory DM, the queue Q, the processing unit PU, the output queue OQ, the output controller OC, and the refresh controller RC.
Before any processing occurs, the host processor down-loads the object code into the LT and FT by using specially formatted input packets. The contents of the LT and FT are closely related to a data flow graph. The arcs represent the entries in the LT while the nodes represent the entries in the FT.
Fig.6. Block diagram of the µPD7281.
processing elements (PEs). The PEs are allowed to be functionally nonidentical in order to capitalize on the existing high speed architectures for fixed signal processing algorithms. Frequently used operations may be executed in dedicated PEs having the appropriate hardware structure. The I/O functions take place in special PEs called I/O processors. This is convenient in signal processing applications, because signal sources and sinks tend to introduce specialized requirements.
The control section schedules instructions for the PEs using fixed-format messages. Recall that initially all the information about the data flow graph of the application program resides in the local memories of the PEs. Each result packet carries the necessary part of this information to the control section, where it is temporarily stored in the activity store until the destination operation may be scheduled for execution. The execution is performed by sending an operation packet to one of the PEs.
The activity store contains the activity templates of those operations which have received at least one of the operands, but which are not yet scheduled for execution. Conceptually, the activity store contains a representation of the active part of the data flow graph.
The contents of the result packet are used by the update unit for locating the activity template (of the destination operation). It also contains the address of a block in the
When a data packet enters the µPD7281, it fetches from the LT the address of the instruction in the FT that is waiting for the incoming data. After the destination instruction has been fetched, the AG&FC unit determines whether the instruction is unary or binary. If it is unary, the operation packet, consisting of the instruction and the data, is composed and sent via Q to the PU. If it is binary, the AG&FC stores the incoming data in the DM if it is the first arrived operand for the instruction. Otherwise, it fetches the first operand from the DM and sends it together with the incoming packet and the instruction to the PU via Q. The result packet from the PU can either be sent out of the µPD7281 (via LT, Q, OQ, and OC) or be used for further execution of the program graph in the same processor.
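The packet flow just described (LT lookup by arc, FT lookup by node, operand pairing through the DM, and results either recirculating or leaving the processor) can be sketched as below. The table encodings, the class name and the opcode names are hypothetical and chosen only for illustration; they are not the µPD7281 object code format or instruction set.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "neg": operator.neg}


class Processor:
    """Toy model of LT/FT lookup with operand pairing in a data memory."""

    def __init__(self, link_table, function_table):
        self.lt = link_table        # arc id -> (FT index, operand slot)
        self.ft = function_table    # FT index -> (opcode, arity, output arc)
        self.dm = {}                # FT index -> (slot, value) of first operand

    def feed(self, arc, value):
        """Inject one data packet; return the packets that leave the processor."""
        leaving = []
        work = [(arc, value)]
        while work:
            arc, value = work.pop(0)
            if arc not in self.lt:              # no local consumer: send it out
                leaving.append((arc, value))
                continue
            node, slot = self.lt[arc]
            opcode, arity, out_arc = self.ft[node]
            if arity == 1:
                result = OPS[opcode](value)
            elif node not in self.dm:
                self.dm[node] = (slot, value)   # first operand waits in the DM
                continue
            else:
                first_slot, first = self.dm.pop(node)
                args = (first, value) if first_slot < slot else (value, first)
                result = OPS[opcode](*args)
            work.append((out_arc, result))      # result recirculates as a packet
        return leaving


if __name__ == "__main__":
    # r = -(a - b): arcs "a" and "b" feed node 0, arc "s" feeds node 1
    lt = {"a": (0, 0), "b": (0, 1), "s": (1, 0)}
    ft = {0: ("sub", 2, "s"), 1: ("neg", 1, "r")}
    cpu = Processor(lt, ft)
    print(cpu.feed("b", 5))
    print(cpu.feed("a", 2))
```

The operand slot stored with the first operand keeps non-commutative operations correct regardless of arrival order.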
The applications of the µPD7281-based data flow architecture include digital image restoration, data compression and enhancement, pattern recognition, radar and sonar processing, FFTs, digital filtering, speech processing, and numeric processing.
DFSP - a data flow signal processor architecture
A block diagram of the DFSP architecture [Hartimo-86] is shown in Fig.7. A bank of processing elements constitutes the execution unit, which performs the actual digital signal processing computations and I/O operations. Other parts of the architecture form a control section, which is essentially a data flow instruction execution pipeline. In order to increase communication bandwidth, data transfers are separated physically from execution control using a double bus architecture. Signal data is transferred via the shaded buses of the figure. The unshaded buses are used for operation and result packets, which do not contain operand and result values, respectively.
A host computer is required to load the application programs of the DFSP. Programs are coded as separate high level operations, which are copied into the local memories of the
Fig.7. Block diagram of the DFSP architecture.
data storage, where the value of the operand has been stored. If the operand is the first one, the update unit creates a new activity template and stores the result packet into it. Otherwise, the result packet is stored in the located activity template. Finally, it puts a transfer command into the result queue.
After the result transfer unit detects the command from the queue, it transfers the updated activity template. Each activity template contains a TRIGGER field whose value indicates the number of the arrived operands. The result transfer unit decrements the TRIGGER field of the destination template and checks the resulting value. If TRIGGER equals zero, the template address is put into the queue, since the operation is executable.
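The interplay of the update unit, the result queue and the TRIGGER test can be sketched as follows. The sketch assumes that TRIGGER is initialised to the operand count of the operation and counts down, which is one reading of the decrement-to-zero test in the text; the class, method and queue names are hypothetical, and the arity table stands in for information that the real result packets carry.

```python
from collections import deque


class ActivityStore:
    """Toy model of activity templates scheduled by a TRIGGER countdown."""

    def __init__(self, arity):
        self.arity = arity              # template address -> number of operands
        self.templates = {}             # address -> {"trigger", "operands"}
        self.result_queue = deque()     # transfer commands from the update unit
        self.ready_queue = deque()      # addresses of executable templates

    def update(self, address, slot, block):
        """Update unit: locate or create the template and record the operand."""
        template = self.templates.get(address)
        if template is None:
            # Assumption: TRIGGER starts at the number of operands still missing.
            template = {"trigger": self.arity[address], "operands": {}}
            self.templates[address] = template
        template["operands"][slot] = block   # address of the operand data block
        self.result_queue.append(address)

    def transfer(self):
        """Result transfer unit: decrement TRIGGER and schedule at zero."""
        address = self.result_queue.popleft()
        template = self.templates[address]
        template["trigger"] -= 1
        if template["trigger"] == 0:
            self.ready_queue.append(address)


if __name__ == "__main__":
    store = ActivityStore({"fir": 2})
    store.update("fir", 0, 0x100)
    store.transfer()
    print(list(store.ready_queue))      # not yet executable
    store.update("fir", 1, 0x140)
    store.transfer()
    print(list(store.ready_queue))      # now executable
```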
After the fetch unit gets a template address from the queue, it sends the operation packet to an idle PE and puts a data transfer command into the data queue.
Finally, the data transfer unit initiates the transfer of the operand data block from the data store to the PE.
The performance of the DFSP architecture was evaluated on a deterministic discrete event simulator. The update unit has been the major bottleneck in the simulated control section. However, considerably uniform utilization of the (four) processors has generally been achieved. Also, a VLSI implementation of the control section is under development.
HDFM - Hughes data flow multiprocessor
The HDFM is a proposed high performance, fault tolerant, high level language programmable processor targeted for embedded signal and data processing applications [Gaudiot-85]. The HDFM consists of one to hundreds of identical processing elements PEs connected by a global packet-switching network. This network is a three-dimensional bused-cube network as shown in Fig.8. Packet transmission proceeds via a store-and-forward protocol, which allows any PE to transfer data to any other PE.
Fig.8. The HDFM cubic bus interconnection network.
Fig.9. The structure of the PE.
to another always take the same path. This principle, called single path routing, is required to preserve the order of packets in time-ordered data groups such as strings. Each PE consists of four parts: communication chip COM, processing chip PROC, destination memory DM, and template memory TM. The COM receives packets from the routing network and either forwards them on to other PEs or sends them to its attached PROC. When the PROC receives a packet, it determines whether the packet has enabled a template to fire. If so, the template opcode and the operands stored in the TM are fetched, combined with the incoming packet, which enabled the template to fire, and sent to the ALU. Simultaneously, the destination addresses to which the result should be sent are fetched. The ALU performs the operation and the results are matched with their destination addresses to form packets. The packets are sent either back to the same PE or out into the routing network. If the template is not ready to fire, the arriving packet is stored in the TM. Since a template result may need to be sent to multiple destinations, there is additional destination overflow storage in the DM to accommodate lists of destinations for a node.
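Single path routing over the bused cube can be illustrated with the 9-bit (plane, column, row) PE address given in this section. The sketch is an assumption-laden illustration rather than the HDFM routing algorithm: it assumes three bits per coordinate with plane in the most significant bits, and it assumes that the path is made unique by correcting plane, then column, then row, one bus hop each. The function names are hypothetical.

```python
def encode(plane, column, row):
    """Pack a cube position into a 9-bit PE address (assumed field order)."""
    for coordinate in (plane, column, row):
        if not 0 <= coordinate < 8:
            raise ValueError("each coordinate must fit in 3 bits")
    return (plane << 6) | (column << 3) | row


def decode(address):
    """Recover (plane, column, row) from a 9-bit PE address."""
    return (address >> 6) & 7, (address >> 3) & 7, address & 7


def route(source, destination):
    """Return the PE addresses visited, fixing one coordinate per bus hop.

    A fixed correction order (an assumption) gives every source and
    destination pair exactly one path, so packet order is preserved.
    """
    current = list(decode(source))
    target = decode(destination)
    path = [source]
    for axis in range(3):
        if current[axis] != target[axis]:
            current[axis] = target[axis]     # one store-and-forward hop
            path.append(encode(*current))
    return path


if __name__ == "__main__":
    print(route(encode(0, 0, 0), encode(1, 0, 3)))
    print(decode(511))
```

Because every PE on a bus can be reached in one hop, such a path never needs more than three hops.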
Simulation results demonstrate high-performance operation with high-level language programmability. For example, the results of the deterministic simulation of the machine show that a 64 processing element machine may provide real throughput of 6«^ MIPS.
Each PE can execute the instruction set and perform the data flow sequencing and addressing. Each PE has its own local memory for both program and data storage. There is no global memory. The program, which consists of data flow graph nodes, is allocated to the local memories of the PEs at compile time. When nodes are implemented in an architecture they are called templates. A template consists of an opcode, slots for operands, and destination pointers, which indicate the nodes to which the results of the operation should be sent. Each PE has a unique 9-bit address corresponding to its position in the cube (plane, column, row). This allows up to 8 PEs per bus for a maximum configuration of 512 PEs. Within each PE are multiple templates, each of which has a unique template address. To implement the data flow model, the results of one template are sent to another in the form of packets. Each packet consists of a type field, a destination address, and data. The destination address indicates the PE address and the template address of the template to which the data is to be sent. Packets travelling from one PE
PIM-D - the dataflow-based parallel inference machine
The research and development of the parallel inference machine includes the data flow mechanism to rapidly execute inference operations. The data flow model is also similar to the logic programming languages. Execution of logic programs is performed in a goal driven manner: a clause in the program is initiated when a goal is given and returns the results to the goal. The logic programs are compiled into data flow graphs [Bic-84]. PIM-D is an example architecture to support the parallel version KL1 of the kernel language for ICOT's fifth generation computers [Ito-86].
PIM-D is, similarly to SIGMA-1, constructed from multiple processing elements PEs and structure memories SMs interconnected by a network.
Each PE has several stages. The packets transferred between these stages include result packets and executable instruction packets. A result packet consists of three fields: identifier, destination and the data. The identifier specifies the invoked procedure instance to which the result packet belongs. The destination specifies the destination instruction address of the result packet. It also specifies whether the destined instruction receives a single operand or two operands. The data field contains the operand data to be sent to the instruction. Fig.10 depicts the configuration of each PE. The packet queue unit PQU is a FIFO queue memory to store the result packets from the token bus. The instruction control unit ICU receives the result packets from the PQU and checks whether the destination instructions are executable or not. An instruction is executable if it receives a single operand, or, when it receives two operands, if the partner operand is already in the operand memory in the ICU. In the latter case, the ICU searches its operand memory to see whether the partner operand exists or not. If it does, the partner is removed from the memory; otherwise, the result packet is stored in the operand memory. This searching is performed associatively by hardware hash using the identifier and the destination address as the key field. If the instruction is executable, the ICU fetches the instruction code from its instruction memory, constructs an executable instruction packet, and sends the packet to the next stage, one of the atomic processing units APUs, via the instruction bus. The APU interprets the instruction packets and sends result packets to the PQU in its PE or other PEs, or sends structure access command packets to SMs via the token bus. The SMs are responsible for the structure access commands, perform structure manipulation operations, and return results to the destination specified by the commands.
Fig.10. Configuration of the processing element. (PQU: packet queue unit; ICU: instruction control unit; APU: atomic processing unit.)
Actual implementation of the experimental machine is currently being developed. The machine includes fi PEs, 7 SMs, and one host computer used to monitor or debug the system. The APUs and SMs are implemented as microprogram control units using bit-slice microprocessors or special hardware to recognize the data tag. The ICUs are also microprogram controlled to implement hashing hardware. A software simulator for OR-parallel and Concurrent Prolog was developed. Performance evaluation results from the software simulator show that about one million head unifications per second can be achieved by exploiting parallelism.
Conclusions
Since about 1970 there has been a growing and widespread research interest in data flow computer architecture. This interest has culminated in many designs for data-driven computer systems, several of which have been or are in the process of being implemented in hardware.
The major long-term interest in dataflow techniques will be in the construction and performance of multiprocessor systems. It is particularly important to know how dataflow systems should be designed for implementation in VLSI, and to be certain that effective software techniques are available for utilizing the hardware.
References
[Bic-84] L.Bic, A data-driven model for parallel interpretation of logic programs, Proc. Int'l Conf. Fifth Gen. Comp. Systems, ICOT (1984) 517-523.
[Dennis-80] J.B.Dennis, Data flow supercomputers, Computer 13 (11) (1980) 48-56.
[Dennis-83] J.B.Dennis, W.Y.-P.Lim, and W.B.Ackerman, The MIT data flow engineering model, in: R.E.A.Mason, Ed. Information Processing 83, (Elsevier Science Publishers B.V., North-Holland, 1983) 553-560.
[Gaudiot-85] J.-L.Gaudiot, R.M.Vedder, G.K.Tucker, D.Finn, and M.L.Campbell, A distributed VLSI architecture for efficient signal and data processing, IEEE Trans. Comp. 34 (12) (1985) 1072-1087.
[Gurd-85] J.R.Gurd, C.C.Kirkham, and I.Watson, The Manchester prototype dataflow computer, Comm. ACM 28 (1) (1985) 34-52.
[Hartimo-86] I.Hartimo, K.Kronlöf, O.Simula, and J.Skyttä, DFSP: A data flow signal processor, IEEE Trans. Comp. 35 (1) (1986) 23-33.
[Ito-86] N.Ito, M.Kishi, E.Kuno, and K.Rokusawa, The dataflow-based parallel inference machine to support two basic languages in KL1, in: J.V.Woods, Ed. Fifth Generation Computer Architectures, (Elsevier Science Publishers B.V., North-Holland, 1986) 123-145.
[NEC-85] NEC Electronics, Image pipeline processor µPD7281D, Product Description (1985).
[Mavrič-86] S.Mavrič, B.Mihovilovič, and P.Kolbezen, The interconnection network in a multiprocessor system, Informatica 10 (4) (1986) 44-50 (in Slovene).
[Patnaik-86] L.M.Patnaik, R.Govindarajan, and N.S.Ramadoss, Design and performance evaluation of EXMAN: an EXtended MANchester data flow computer, IEEE Trans. Comp. 35 (3) (1986) 229-244.
[Robič-86] B.Robič and J.Šilc, Classification of new generation computer architectures, Informatica 10 (4) (1986) 18-32 (in Slovene).
[Shimada-86] T.Shimada, K.Hiraki, K.Nishida, and S.Sekiguchi, Evaluation of prototype data flow processor of the SIGMA-1 for scientific computation, Proc. 13th Int'l Symp. Comp. Arch., IEEE (1986) 226-234.
[Šilc-86] J.Šilc and B.Robič, Data flow architecture based processor, Informatica 10 (4) (1986) 74-80 (in Slovene).