ANALYSIS OF BUFFERED MULTISTAGE INTERCONNECTION
NETWORK FOR PARALLEL PROCESSORS
INFORMATICA 4/87
UDK 618.3.02
Dusan Fajfar
Institute for Teleinformatics
ISKRA Telematika, Kranj
Matija Lokar
University E. K. Ljubljana, Department of Mathematics
Abstract. This paper gives a method for calculating the distribution law of the number of memory
requests in a finite buffer of a multistage interconnection network for parallel processors. A modified
delta network is used to connect the processors with the memory modules. Memory requests are
generated at each processor randomly and independently. Two cases of traffic flow are discussed:
constant average request rates and time-dependent average request rates.
Povzetek (Summary). This paper analyzes a multistage interconnection network between processors and
memory modules that is built from elements with buffers of limited capacity. A method is given for
calculating the distribution law of the number of memory requests in an individual element of the
network. Each processor generates requests to the memory modules randomly and independently of the
others. Two cases are treated: an average number of requests that is constant in time and one that
changes with time.
Keywords: Buffered network, delta network, multistage interconnection network, buffer length, queueing
theory.
I. Introduction
In recent years many new multiprocessor architectures have
been proposed. The main problem of any multiprocessor
system is its interconnection network. A typical
configuration of such a system is illustrated in Fig. 1.
Many identical processors are connected via an
interconnection network to identical memory modules. Each
processor should have access to each memory module, with
requests generated randomly and independently at each
processor.
There are several possible organizations of such a
processor-memory interconnection. For example, a shared
bus is inexpensive but allows only a very low transfer
rate. On the other hand, the maximum transfer rate is
attained by the full crossbar switch (with $n \cdot m$
switches, where $n$ is the number of processors and $m$
the number of memory modules). However, it is far too
complicated and expensive for practical application. Thus
we have to use interconnection networks with fewer than
$n \cdot m$ switches. One of them is the multistage
interconnection network [1]. Many such networks have been
presented in the literature ([2], [3], [4], [5], [6]). In
this paper we study a delta network with some
modifications in the first and the last stage (the input
process from the processors and the output process to
memory). Section II describes the network, which consists
of three types of switches; for a more detailed
presentation see [6]. In Sections III, IV and V the
distribution law of the buffer length is calculated for
each type of switch. Section VI gives results and
conclusions.
II. The network model and system operation
We discuss a multistage network connecting $N$
processors with $N$ memory modules, where $N = 2^n$. The
model for $N = 16$ is illustrated in Fig. 2.
The network consists of $\log_2 N + 1$ stages. Each stage
has $N$ identical switches. With regard to the number of
input and output links there are three types of switches.
Switches at the first stage have one input and two output
links; switches at the last stage have two input links
and only one output link. Switches at all other stages
have two input and two output links. To achieve a higher
transfer rate and to avoid blocking, there is a finite
buffer at each output link, where the incoming requests
wait to be processed further.
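As a rough illustration of the sizes involved, the following sketch counts the stages and switches of the network described above and compares them with the $N \cdot N$ switches of a full crossbar. The function name `switch_counts` and the dictionary keys are only illustrative and are not taken from the paper.

```python
def switch_counts(N):
    """Count stages and switches for a network of N = 2**n processors."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two")
    n = N.bit_length() - 1          # n = log2(N), exact for powers of two
    stages = n + 1                  # log2(N) + 1 stages
    return {
        "stages": stages,
        "multistage": N * stages,   # N switches in every stage
        "crossbar": N * N,          # n*m switches with n = m = N
    }


print(switch_counts(16))
```

For $N = 16$ this gives 5 stages with 80 switches in total, against 256 switches for the crossbar.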
Since the buffers are finite, we assume that incoming
requests are lost when the buffer is full.
The time unit is the system cycle time. In each cycle
only one request can be transmitted over a link, except
at the first stage, where more than one request can
arrive on the input link from each processor. In one time
unit only the first request in the buffer can travel from
stage $i$ to stage $i + 1$. As can be seen in Fig. 2,
there are no links between switches of the same stage.
The input links of stage $i$ are the output links of
stage $i - 1$. The first stage has input links from the
processors and the last stage has output links to the
memory modules. This regularity makes it possible to
analyze the network stage by stage instead of as a whole.
Section III gives the analysis of the first stage,
Section IV the analysis of stages 2 to $\log_2 N$, and
Section V the analysis of the last stage.
Fig. 2
III. Switches with one input and two output links
Switches with one input and two output links are found
only in the first stage. The input links come from the
processors and the output links are connected to switches
of the second stage. Each processor is connected to only
one switch. There are no links between switches of the
same stage, so we have $N$ identical systems. Each system
consists of a processor that sends requests to the switch
and of two output links, each with a finite buffer.
As the memory requests are time independent ([6]), we use
the Poisson distribution with a given average rate $a$.
The average rate can be different for each processor. Let
$p_k$ denote the probability that $k$ requests are sent
from the processor in one cycle. By the Poisson law we get
$$p_k = \frac{a^k e^{-a}}{k!} \tag{1}$$
We analyze the case where $a$ is constant and the case
where $a$ is time dependent, denoted by $a(t)$.
Each request coming from the processor is switched to one
of the two output buffers. As requests are uniformly
distributed between the memory modules ([6]), an incoming
request joins the first or the second buffer with equal
probability. By $p_k^m$ we denote the probability that
$m$ of $k$ requests enter the first buffer (and $k - m$
requests enter the second buffer). Thus we get
$$p_k^m = \binom{k}{m} (0.5)^k \tag{2}$$
Let $p_m$ denote the probability that $m$ requests enter
the first buffer in one cycle. Then
$$p_m = \sum_{k=m}^{\infty} p_k \, p_k^m = \sum_{k=m}^{\infty} \frac{a^k e^{-a}}{k!} \binom{k}{m} (0.5)^k = \frac{a^m e^{-a/2}}{2^m \, m!} \tag{3}$$
The output process is very simple compared with the input
process. In each cycle only the first request from each
buffer leaves the system.
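The splitting step above, in which each request joins the first buffer with probability 0.5, turns a Poisson stream of rate $a$ into a Poisson stream of rate $a/2$ at each buffer. The following sketch checks this numerically by comparing a truncated version of the infinite sum with the closed form. The function names and the truncation at 80 terms are assumptions made for the illustration only.

```python
from math import comb, exp, factorial


def p_buffer_sum(m, a, terms=80):
    """Truncated sum over k of P(k requests sent) * P(m of k join the first buffer)."""
    total = 0.0
    for k in range(m, terms):
        p_k = a ** k * exp(-a) / factorial(k)     # Poisson law with rate a
        p_k_m = comb(k, m) * 0.5 ** k             # binomial split, probability 0.5
        total += p_k * p_k_m
    return total


def p_buffer_closed(m, a):
    """Closed form a^m e^(-a/2) / (2^m m!), a Poisson law with rate a/2."""
    return a ** m * exp(-a / 2) / (2 ** m * factorial(m))


for m in range(4):
    print(m, p_buffer_sum(m, 1.5), p_buffer_closed(m, 1.5))
```

For moderate rates the two columns agree to within floating-point rounding, so the closed form can be used directly in the calculations that follow.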
Let us first analyze the stationary system with a constant
average rate of the input process. Since the events at
both buffers are equally distributed, it suffices to
analyze one of them. Let $bl$ denote the buffer length (we
assume that all buffers have the same length; buffers of
different lengths cause no difficulty, we just have to
calculate the distribution for each switch separately).
The balance diagram is shown in Fig. 3.
Fig. 3
The nodes in the diagram denote the number of requests in
the buffer. Looking at the incoming and outgoing arcs of
each node, we get the system of balance equations, where
$\Pi(s)$ denotes the probability that there are $s$
requests in the buffer and $P_j = \sum_{m=j}^{\infty} p_m$
denotes the probability that at least $j$ requests arrive
in one cycle.
$$\Pi(0) \sum_{m=1}^{\infty} p_m = \Pi(1)\, p_0$$
$$\Pi(s) \Big[ p_0 + \sum_{m=2}^{\infty} p_m \Big] = \Pi(0)\, p_s + \sum_{k=1}^{s-1} \Pi(k)\, p_{s-k+1} + \Pi(s+1)\, p_0, \qquad s = 1, 2, \ldots, bl - 1 \tag{4}$$
$$\Pi(bl)\, p_0 = \Pi(0)\, P_{bl} + \sum_{k=1}^{bl-1} \Pi(k)\, P_{bl-k+1}$$
If we put $p_m$ expressed by (3) into the left sides of
equations (4), we get the following system of linear
equations.
$$\big(1 - e^{-a/2}\big)\, \Pi(0) = \Pi(1)\, p_0$$
$$\Big[ 1 - \frac{a}{2}\, e^{-a/2} \Big]\, \Pi(s) = \Pi(0)\, p_s + \sum_{k=1}^{s-1} \Pi(k)\, p_{s-k+1} + \Pi(s+1)\, p_0, \qquad s = 1, 2, \ldots, bl - 1 \tag{4'}$$
$$e^{-a/2}\, \Pi(bl) = \Pi(0)\, P_{bl} + \sum_{k=1}^{bl-1} \Pi(k)\, P_{bl-k+1}$$
As the system (4') is trivially solved by setting all
$\Pi(s)$ to 0, we add the normalization equation
$$\sum_{s=0}^{bl} \Pi(s) = 1 \tag{5}$$
Now the system can be solved by any of the well-known
methods for solving systems of linear equations.
For the system where the average rate of the input
process is time dependent we cannot use the same approach.
Because the system works in cycles, we consider discrete
time. Instead of the probabilities $\Pi(s)$ we have
$\Pi(s,t)$, where the second variable denotes the cycle
counter. By $v_i^t$ we denote the probability that $i$
requests arrive on the input line to the buffer at time
$t$; it is expressed by
$$v_i^t = \sum_{k=i}^{\infty} p_k^i \, p_k \tag{6}$$
where $p_k$ is taken from (1) with $a = a(t)$. A simple
calculation gives
$$v_i^t = \frac{a(t)^i \, e^{-a(t)/2}}{2^i \, i!} \tag{7}$$
With a given buffer length $bl$ we get the recurrence
relation for the number of requests in the buffer at time
$t$ (denoted by $q_t$) if at this moment $v_n$ requests
join the buffer:
$$q_{t+1} = \begin{cases} \min\{bl,\; q_t + v_n - 1\}, & q_t > 0 \\ \min\{bl,\; v_n\}, & q_t = 0 \end{cases} \tag{8}$$
So $\Pi(s,t)$ is expressed by
$$\Pi(s,t) = \sum_{i=1}^{s+1} \Pi(i,t-1)\, v^t_{s-i+1} + \Pi(0,t-1)\, v^t_s \tag{9}$$
if $s$ is less than the buffer length $bl$. For $s = bl$
some of the requests are rejected. The equation is almost
the same as (9); we only have to replace $v_i^t$ with
$\bar v_i^t$, where
$$\bar v_i^t = \sum_{j=i}^{\infty} v_j^t \tag{10}$$
is the probability that more than $i - 1$ requests join
the buffer at time $t$:
$$\Pi(bl,t) = \sum_{i=1}^{bl} \Pi(i,t-1)\, \bar v^t_{bl-i+1} + \Pi(0,t-1)\, \bar v^t_{bl} \tag{11}$$
We start with the distribution
$$\Pi(0,0) = 1, \qquad \Pi(s,0) = 0, \quad s = 1, 2, \ldots, bl$$
and repeatedly calculate the probabilities.
The same approach could be used in the case where the
average input rate is constant in time. But as equations
(9) and (10) cannot be simplified, the calculation with
this approach is time consuming.
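The cycle-by-cycle calculation for one first-stage buffer can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes that the number of requests joining the buffer in a cycle is Poisson with rate $a(t)/2$, that one request leaves a non-empty buffer per cycle, and that requests arriving at a full buffer are lost. The names `arrivals`, `step`, `evolve` and `rate` are hypothetical.

```python
from math import exp, factorial


def arrivals(a, bl):
    """v[i], i = 0..bl: probability that i requests join one buffer (rate a/2)."""
    return [(a / 2) ** i * exp(-a / 2) / factorial(i) for i in range(bl + 1)]


def step(prev, a):
    """Advance the buffer-occupancy distribution by one cycle; bl = len(prev) - 1."""
    bl = len(prev) - 1
    v = arrivals(a, bl)
    # tail[i] = probability that at least i requests arrive in the cycle
    tail = [1.0 - sum(v[:i]) for i in range(bl + 1)]
    cur = [0.0] * (bl + 1)
    for s in range(bl):
        # states below bl: an empty buffer receives s, a buffer with i receives s - i + 1
        cur[s] = prev[0] * v[s] + sum(prev[i] * v[s - i + 1] for i in range(1, s + 2))
    # full buffer: surplus requests are rejected, so tail probabilities are used
    cur[bl] = prev[0] * tail[bl] + sum(
        prev[i] * tail[bl - i + 1] for i in range(1, bl + 1)
    )
    return cur


def evolve(rate, bl, cycles):
    """Start from an empty buffer and iterate; rate(t) is the average rate a(t)."""
    pi = [1.0] + [0.0] * bl
    for t in range(1, cycles + 1):
        pi = step(pi, rate(t))
    return pi


pi = evolve(lambda t: 0.5, bl=10, cycles=500)
print([round(x, 5) for x in pi[:3]])
```

With a constant rate the iteration settles to the stationary distribution; for $a = 0.5$ and $bl = 10$ the first two probabilities come out close to 0.75 and 0.213. Any function of $t$ can be passed as `rate` to model a time-dependent load.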
IV. Switches with two input and two output links
Switches with two input and two output links are found
in stages 2 to $\log_2 N$. The two input links are
connected to two switches of the previous stage and the
output links are connected to two switches of the next
stage.
The input process now depends on the output process of
the previous stage. If the buffers connected to the input
links of the switch are not empty, we get a request from
each of them. As there are only two input links, we cannot
get more than two requests in one cycle. The output
process is the same as described in Section III.
Again we first analyze the stationary case with a
constant average rate. As seen in the previous section,
and as shown in this one, the probabilities that the
buffers are not empty (in which case we get a request) do
not change with time. Let $\Pi_1$ and $\Pi_2$ be the
probabilities that the buffers from which the input links
come are empty. Then the number $k$ of incoming requests
has the probabilities
$$p_k = \begin{cases} \Pi_1 \Pi_2, & k = 0 \\ \Pi_1 (1 - \Pi_2) + \Pi_2 (1 - \Pi_1), & k = 1 \\ (1 - \Pi_1)(1 - \Pi_2), & k = 2 \end{cases} \tag{12}$$
As the path from the processor to the memory module is
completely random, requests join each of the two buffers
with equal probability. So we get
$$p_0^0 = 1, \qquad p_1^0 = p_1^1 = p_2^1 = 0.5, \qquad p_2^0 = p_2^2 = 0.25 \tag{13}$$
For the probability of $m$ requests coming into a buffer
in one cycle we get
$$p_m = \sum_{k=m}^{2} p_k \, p_k^m, \qquad m = 0, 1, 2 \tag{14}$$
where the $p_k$ on the right-hand side are taken from
(12). From here on $p_0$, $p_1$ and $p_2$ denote the
per-buffer probabilities (14).
The balance diagram for this type of switch is shown in
Fig. 4.
Fig. 4
The balance equations obtained from the diagram are
$$\Pi(0)\,(p_1 + p_2) = \Pi(1)\, p_0$$
$$\big(\Pi(0) + \Pi(1)\big)\, p_2 = \Pi(2)\, p_0 \tag{15}$$
$$\Pi(s)\, p_2 = \Pi(s+1)\, p_0, \qquad s = 2, 3, \ldots, bl - 1$$
Now the system is already linear. We just have to add
the normalization equation (5) and solve it.
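A minimal sketch of this stationary calculation for a two-input, two-output switch, assuming a buffer length of at least 2. Because each balance equation links only neighbouring states, the unnormalized solution can be built state by state and then scaled so that the probabilities sum to 1. The helper names are illustrative and not from the paper.

```python
def arrivals_per_switch(pi1, pi2):
    """Probabilities of 0, 1, 2 requests reaching the switch.

    pi1, pi2: probabilities that the two feeding buffers are empty.
    """
    return [pi1 * pi2,
            pi1 * (1 - pi2) + pi2 * (1 - pi1),
            (1 - pi1) * (1 - pi2)]


# SPLIT[k][m]: probability that m of k requests join a given output buffer
SPLIT = {0: [1.0], 1: [0.5, 0.5], 2: [0.25, 0.5, 0.25]}


def arrivals_per_buffer(pk):
    """Probabilities of 0, 1, 2 requests joining one output buffer."""
    return [sum(pk[k] * SPLIT[k][m] for k in range(m, 3)) for m in range(3)]


def stationary(p, bl):
    """Solve the balance equations for buffer length bl >= 2, then normalize."""
    p0, p1, p2 = p
    w = [1.0, (p1 + p2) / p0]              # unnormalized Pi(0), Pi(1)
    w.append((w[0] + w[1]) * p2 / p0)      # Pi(2)
    for s in range(2, bl):
        w.append(w[s] * p2 / p0)           # Pi(s+1) from Pi(s)
    total = sum(w)
    return [x / total for x in w]


p = arrivals_per_buffer(arrivals_per_switch(0.5, 0.5))
pi = stationary(p, bl=10)
print([round(x, 5) for x in pi[:3]])
```

With both feeding buffers empty with probability 0.5 and $bl = 10$, the sketch returns approximately 0.50000, 0.38889 and 0.09877 for the first three states.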
For the time-dependent system we use a similar procedure
as in the previous section. Suppose we analyze stage $i$.
The first request can reach a switch on stage $i$ only
after $i$ cycles, so we have
$$\Pi(0,t) = 1, \qquad \Pi(s,t) = 0, \quad s = 1, \ldots, bl, \qquad t = 0, \ldots, i - 1 \tag{16}$$
on stage $i$. The number of requests in the buffer can
grow by at most one request per cycle, because at most two
requests arrive and one leaves (except when the buffer was
empty at the previous moment: then there can be a jump
from 0 to 2, and after that the length grows by at most
one). So we get the following system
$$\Pi(0,t) = \big[\Pi(0,t-1) + \Pi(1,t-1)\big]\, p_0$$
$$\Pi(1,t) = \big[\Pi(0,t-1) + \Pi(1,t-1)\big]\, p_1 + \Pi(2,t-1)\, p_0$$
$$\Pi(2,t) = \big[\Pi(0,t-1) + \Pi(1,t-1)\big]\, p_2 + \Pi(2,t-1)\, p_1 + \Pi(3,t-1)\, p_0 \tag{17}$$
$$\Pi(s,t) = \sum_{k=s-1}^{s+1} \Pi(k,t-1)\, p_{s+1-k}, \qquad s = 3, \ldots, bl - 1$$
$$\Pi(bl,t) = \Pi(bl-1,t-1)\, p_2 + \Pi(bl,t-1)\,(p_1 + p_2)$$
Equations (17) are the same whether the average input
rate is time dependent or not.
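The cycle-by-cycle update for a buffer fed by at most two requests per cycle can be sketched as below. The sketch assumes a buffer length of at least 3, one departure per cycle from a non-empty buffer, and loss of a request that arrives at a full buffer; the last line of `step` follows from these assumptions. The names `step`, `evolve` and `p_of_t` are hypothetical, and in a full calculation the arrival probabilities for each cycle would be derived from the state of the previous stage.

```python
def step(prev, p):
    """One cycle for a buffer with arrival probabilities p = (p0, p1, p2)."""
    p0, p1, p2 = p
    bl = len(prev) - 1                      # buffer length, assumed >= 3
    low = prev[0] + prev[1]                 # states 0 and 1 behave alike
    cur = [0.0] * (bl + 1)
    cur[0] = low * p0
    cur[1] = low * p1 + prev[2] * p0
    cur[2] = low * p2 + prev[2] * p1 + prev[3] * p0
    for s in range(3, bl):
        cur[s] = prev[s - 1] * p2 + prev[s] * p1 + prev[s + 1] * p0
    # full buffer: reached from bl - 1 with two arrivals, kept with one or two
    cur[bl] = prev[bl - 1] * p2 + prev[bl] * (p1 + p2)
    return cur


def evolve(p_of_t, bl, stage, cycles):
    """Buffer stays empty for t < stage, then the update is applied each cycle."""
    pi = [1.0] + [0.0] * bl
    for t in range(stage, cycles + 1):
        pi = step(pi, p_of_t(t))
    return pi


# constant arrival probabilities, used here only to exercise the update
p_const = (0.5625, 0.375, 0.0625)
pi = evolve(lambda t: p_const, bl=10, stage=3, cycles=2000)
print([round(x, 5) for x in pi[:3]])
```

With constant arrival probabilities the iteration converges to the stationary solution of the balance equations, which gives a simple check of the update rule.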
V. Switches with two input and one output link
Switches of this kind are found at the last stage. The
input links come from the previous stage and the output
link of each switch is connected to a memory module. From
each switch we can reach only one memory module. The
situation is almost the same as described in Section IV.
The only difference is that there is just one buffer, so
all requests go into the same buffer. We therefore have to
change the coefficients $p_m$ defined by (12) and (14).
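One way to read this change, stated here as an assumption, is that every request arriving at the switch joins the single buffer, so the per-buffer arrival probabilities are simply the probabilities of 0, 1 or 2 incoming requests. The sketch below applies that reading with the same neighbour-state balance equations as for the middle stages; the function name `last_stage_stationary` is illustrative.

```python
def last_stage_stationary(pi1, pi2, bl):
    """Stationary buffer distribution for a two-input, one-output switch.

    pi1, pi2: probabilities that the two feeding buffers are empty.
    Assumes all arriving requests join the single buffer and bl >= 2.
    """
    p0 = pi1 * pi2                              # no request arrives
    p1 = pi1 * (1 - pi2) + pi2 * (1 - pi1)      # exactly one request arrives
    p2 = (1 - pi1) * (1 - pi2)                  # two requests arrive
    w = [1.0, (p1 + p2) / p0]                   # unnormalized Pi(0), Pi(1)
    w.append((w[0] + w[1]) * p2 / p0)           # Pi(2)
    for s in range(2, bl):
        w.append(w[s] * p2 / p0)                # Pi(s+1) from Pi(s)
    total = sum(w)
    return [x / total for x in w]


pi = last_stage_stationary(0.5, 0.5, bl=10)
print([round(x, 5) for x in pi])
```

With both feeding buffers empty with probability 0.5 the arrival rate equals the service rate of one request per cycle, and for $bl = 10$ the sketch gives 0.025, 0.075 and then 0.1 for every state from 2 to 10.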
VI. Summary and conclusion
Table 1 gives the probabilities of the number of requests
$s$ in the buffer of one switch at the first stage for
different average input rates.
                  average rate
  s      0.5       1.0       1.5       1.9
  0    0.75000   0.50000   0.25085   0.07687
  1    0.21302   0.32136   0.28020   0.12189
  2    0.03277   0.12260   0.19189   0.12635
  3    0.00385   0.03779   0.11706   0.11699
  4    0.00037   0.01091   0.06787   0.10591
  5    0.00001   0.00311   0.03915   0.09569
  6    0.00000   0.00088   0.02258   0.08613
  7    0.00000   0.00025   0.01302   0.07807
  8    0.00000   0.00007   0.00751   0.07052
  9    0.00000   0.00002   0.00133   0.06370
 10    0.00000   0.00001   0.00250   0.05753
Table 1
                  average rate
  s      0.5       1.0       1.5       1.9
  0    0.75000   0.50000   0.25086   0.08057
  1    0.22959   0.38889   0.39017   0.19731
  2    0.01999   0.09876   0.23001   0.20122
  3    0.00011   0.01097   0.08251   0.15007
  4    0.00001   0.00122   0.08251   0.11028
  5    0.00000   0.00011   0.02960   0.08101
  6    0.00000   0.00002   0.01062   0.05955
  7    0.00000   0.00000   0.00381   0.01376
  8    0.00000   0.00000   0.00137   0.03216
  9    0.00000   0.00000   0.00019   0.02363
 10    0.00000   0.00000   0.00006   0.01737
Table 2
Table 3 gives the probabilities for the switches at the
last stage. All requests are generated with equal average
input rate.
                  average rate
  s      0.5       1.0       1.5       1.9
  0    0.50000   0.02500   0.00000   0.00000
  1    0.38889   0.07500   0.00000   0.00000
  2    0.09877   0.10000   0.00000   0.00000
  3    0.01097   0.10000   0.00000   0.00000
  4    0.00122   0.10000   0.00000   0.00000
  5    0.00011   0.10000   0.00000   0.00000
  6    0.00002   0.10000   0.00011   0.00000
  7    0.00000   0.10000   0.00125   0.00000
  8    0.00000   0.10000   0.01117   0.00008
  9    0.00000   0.10000   0.09958   0.00897
 10    0.00000   0.10000   0.88781   0.99091
Table 3
As the tables show, when the average input rate grows
toward the maximum output rate, which is 2, the buffers
become more and more full, and the number of rejected
requests grows as well.
Acknowledgement
This work was partly supported by Iskra Delta Computers.
References
[1] Rettberg R., Thomas R., Contention is no obstacle to
shared-memory multiprocessing, CACM, 29(1986), 12.
[2] Patel J.H., Performance of Processor-Memory
Interconnections for Multiprocessors, IEEE Trans. on
Comp., Vol. C-30 (1981), 10.
[3] Kruskal C.P., Snir M., The Performance of Multistage
Interconnection Networks for Multiprocessors, IEEE Trans.
on Comp., Vol. C-32 (1983), 12.
[4] Dias D.M., Jump J.R., Analysis and Simulation of
Buffered Delta Networks, IEEE Trans. on Comp., Vol. C-30
(1981), 1.
[5] Thanawastien S., Nelson V.P., Interference Analysis
of Shuffle/Exchange Networks, IEEE Trans. on Comp., Vol.
C-30 (1981), 8.
[6] Brajak P., Designing a reconfigurable intelligent
memory module (RIMM) for Performance Enhancement to Large
Scale, General Purpose Parallel Processor, Informatica
11(1987), 1.
[7] Gelenbe E., Mitrani I., Analysis and Synthesis of
Computer Systems, Academic Press, London, 1980.