A SELECTED SURVEY OF PARALLEL COMPUTER SYSTEMS

INFORMATICA 4/87, UDK 681.3.02

Sasa Presern, Iskra Delta and Jozef Stefan Institute, Ljubljana

ABSTRACT. This paper is a selected survey of parallel computer systems. A classification of parallel computers is given and some of the most attractive architectures are discussed. Special attention is paid to massively parallel processors. The organization and interconnection structure of multiprocessor systems is given. By analysing the trend of research in parallel computer systems over the last 10 years, some predictions are made about individual features which will probably have a great influence on future parallel computer systems. An extensive survey of references on parallel computer systems is given.

IZBOR IN PREGLED PARALELNIH RACUNALNISKIH SISTEMOV. The paper gives a selected survey of parallel computer systems. A classification of parallel computers is made and some of the most interesting architectures are described. The organization of multiprocessors is given, and the various interconnection structures between processors and memories in individual systems are described. An analysis of the trend of research in parallel computer systems in the last decade allows individual features to be singled out which will presumably strongly influence the development of future parallel computer systems. The bibliography contains an extensive survey of references on parallel computer systems.

1. INTRODUCTION - EVERYBODY MAKES IT PARALLEL

A few years ago all highly developed countries in the world started projects to develop parallel computer systems. All these projects were financially supported by governments. Many companies and research institutes also started research projects on parallel systems. The falling price of microcomputers and the VLSI facilities at universities have encouraged many universities to design and build parallel computer architectures based on linking many microprocessors, or specially designed VLSI chips, together to work on one job.
Development of a parallel computer is an extremely difficult task which includes:

- development of a new concept of parallel computer architecture,
- design of an operating system that supports the parallel architecture,
- transformation of traditional sequential application programs into parallel programs, either by a preprocessor or by a parallel programming language.

We see that by switching from SISD (single instruction, single data) machines to MIMD (multiple instruction, multiple data) machines one cannot simply upgrade an existing SISD computer system; one is faced with problems which are conceptually new. Research and development of a parallel computer system requires a very strong research effort which often includes:

- more than 100 specialists,
- financial support of a billion dollars,
- a research and development phase which lasts several years.

Government financial support is only a fraction of the total finances devoted to projects in parallel computing. Strategy makers in most companies are familiar with market research studies which predict that parallel processing machines will take about 50 percent of the market in high-performance computers by 1990.

2. CLASSIFICATION OF PARALLEL SYSTEMS

Parallel computers are usually divided into three architectural configurations:

- SIMD pipelined computers
  * early vector processors,
  * attached processors,
  * recent vector processors,
  * other vector processors,
- SIMD array processors,
- MIMD parallel processors
  * massively parallel processors,
  * small scale parallel systems.

Other groupings are possible, for example classification according to the distribution of local and global memory into tightly and loosely coupled parallel systems, or classification according to application possibilities into general purpose or special purpose computers. Many existing computers now use several parallel approaches.

Parallelism in pipeline computers is performed by overlapping computations and is therefore temporal parallelism.
Parallelism in array processors is performed by multiple synchronized ALUs and is therefore spatial parallelism. Parallelism in multiprocessor systems is performed by a set of processors with shared resources which work in asynchronous mode.

The list of projects in parallel computing is getting longer every day. By comparing the architectural approaches in different projects we see that the computer scene in parallel computer systems is particularly varied. It is difficult to classify parallel computers, but doing so is helpful in order to concentrate on similarities and differences between the computer architectures. Because parallel computers use several different architectural principles, one might argue with any proposed classification. Some of the described computers are "paper machines" that have been studied theoretically and by simulation, but have not been built. Many of these projects were funded by government agencies, but some of them are industry projects (IBM, Burroughs, CDC, ...).

There follows an alphabetic list of the parallel computer systems or projects, each with the name of the chief architect and host institution. A list of references dealing with each project is also given. The most interesting architectures are briefly described. The list of parallel computers is grouped according to the above classification.

SIMD PIPELINE COMPUTERS

EARLY VECTOR PROCESSORS

BVM (Boolean Vector Machine), Robert A. Wagner, Duke University, North Carolina. This is a collection of 1-bit processing elements connected as a hypercube with rings at each corner, using the cube-connected-cycles topology.

STAR-100, Control Data Corporation. The design of the Star started in 1965 and the machine was delivered in 1973. This is a processor with two nonhomogeneous arithmetic pipelines. (HWA85, LIN82, PUR74).

TI ASC (Texas Instruments Advanced Scientific Computer), Texas Instruments. This machine uses 1 to 4 homogeneous pipelines and was delivered in 1972. (HWA85, KOG81).

ATTACHED PIPELINE PROCESSORS

CSPI MAXIM/64, CSP Inc., Billerica, Massachusetts. The MAXIM/64 in a minimal configuration includes a 16-slot chassis, a 64-bit floating point array processor, 16 Mbytes of data memory and a MicroVAX-II CPU. The machine is designed for research, scientific and engineering users and costs about $170,000. (NAN86).

FPS AP-120, Floating Point Systems, Beaverton, Oregon, USA. This company also produces newer attached pipeline processors, the FPS-164 and FPS-264, which are used in a configuration named LCAP (Loosely Coupled Array of Processors). More than 1500 machines had been sold and were used mostly for signal processing. They are quite cost effective in comparison to Cray or Cyber computers. (HOC81, HWA85, WIL82).

IBM 3838. The IBM 3838 is a multiple pipeline scientific processor specially designed to attach to IBM mainframes, like the System/370, to enhance the vector-processing capability of the host machine. It is a microprogrammed pipeline processor which can be supplied with custom-ordered instruction sets for specific vector applications.

RECENT VECTOR PROCESSORS

Cray-1, Cray Research Inc., Chippewa Falls, Wisconsin, USA. This is the first successful vector computer. More than 40 computers have been sold and installed, the first in 1976. It comprises 12 special-purpose pipelines for the different arithmetic operations. It is very expensive. (HWA85, JOR82, RUS78). An upgrade of this computer is the Cray-2 (HOL85/1).

Cyber-205. This computer is an example of pipelined architecture and is highly competitive with the Cray-1. It is based on the CDC STAR-100. It uses one, two or four pipelined general-purpose units working always to and from main memory. It is an expensive machine, designed initially for weapons calculations and weather simulation. (HOC81, HWA85, VON84).

CDC/NASF, Control Data Corporation Numerical Aerodynamic Simulation Facility. This is a supercomputer to be used in the 1990s for aerospace vehicle or superjet designs.
The speed requirement was set at a minimum of 1000 Mflops, and the purpose is to calculate the viscous Navier-Stokes fluid equations for three-dimensional modeling of wind tunnel experiments. (HWA85, HOC81).

VP-200, Fujitsu. This system has a scalar and a vector processor which can operate concurrently, and it can be used as a loosely coupled back-end system. (HWA85, LLU84, UCH85).

OTHER VECTOR PROCESSORS

Amdahl 1200. This computer is a European version of Fujitsu's recent vector processor VP-200. A similar version of the VP-100 is known in Europe as the Amdahl 1100 computer. (KOC85).

Siemens VP200. This is another European version of Fujitsu's vector processor VP-200. Fujitsu's VP-100 is known as a Siemens product under the name Siemens VP100. (KOC85).

YH-1. This is China's first supercomputer, also known as "Galaxy". The development started in 1978 at the University of Defense Science and Technology in Changsha. The machine looks like a Cray computer. (NEW85).

SIMD ARRAY COMPUTERS

BSP (Burroughs Scientific Processor), Burroughs. (HOC81, HWA85, KUC82). The BSP has been largely based on the experience that Burroughs gained as major contractors on the ILLIAC IV project. The design principles of the BSP were to provide a machine using a standard technology, which would be programmed in a high level language and sustain a continuous 20-40 Mflops/s.

ICL DAP. An array of processing elements controlled by a single instruction stream processed in a central control unit.

Fig. 1: The connectivity between 64 processing elements in ILLIAC IV (HWA85).

MPP (Massively Parallel Processor). This processor was developed for processing satellite imagery at the NASA Goddard Space Flight Center and has 128x128 = 16,384 microprocessors that can be used in parallel. Each processor is associated with a 1024-bit RAM.

MIMD PARALLEL PROCESSORS

Fig. 22: Linear speedup in a multistage switched network with 256 processors for the Butterfly parallel system.

4. APPLICATION OF PARALLEL SYSTEMS

Uniprocessor architectures are approaching theoretical limits in processing speed. In high speed or real time processing, tightly coupled computer systems have to be used. Most parallel computers nowadays are designed for numerical work with floating point numbers, and are built for the solution of large problems in physics, chemistry and engineering. Large computer capabilities are particularly necessary in:

- complex graphic images,
- structural analysis,
- aerodynamics,
- meteorology,
- medical diagnostics,
- research in oil exploration,
- research in fusion physics,
- industrial automation,
- processing of sensing signals,
- genetic engineering,
- molecular dynamics,
- quantum mechanical problems,
- socioeconomic models, etc.

Mathematical problems which are solved by parallel systems are:

- Monte Carlo simulation,
- the Hartree-Fock equation in the electron gas,
- finite element methods, etc.

Many multiprocessors are almost general purpose, as for example:

ALLIANT, BUTTERFLY, CEDAR, C.mmp, Cm*, CONVEX, COSMIC CUBE, CRAY X-MP, CRAY-3, CYBERPLUS, DCA, DPP, EGPA, ELXSI 6400, FLEX/32, FMP, HEP, IBM RP3, IBM LCAP, IBM GF11, MINERVA, ONERA, PLURIBUS, PRINGLE, SUPRENUM, S-1, TRAC, ULTRA.

These multiprocessors are designed for large scientific and engineering problems, for CAD automation, real time voice data multiplexing and other computationally involved problems. Some multiprocessors are more limited in application and are considered special purpose parallel computers, designed for one-bit logic operations, image processing, knowledge based expert systems, or other special applications in artificial intelligence. Special purpose multiprocessors are:

CHIP, DADO, FEM, MANIP, MEIKO, PASM, PUMPS, VFPP.

5. WILL PARALLEL PROCESSING WIN?

Many ambitious projects in parallel processing have failed in the past.
For example, ILLIAC IV cost four times the original contract figure and did not come even within a factor of 10 of its originally proposed performance. However, its influence was profound: ILLIAC IV was the first to pioneer the new and faster emitter-coupled logic rather than the established transistor-transistor logic. ILLIAC IV also pioneered the use of 15-layer circuit boards and computer aided layout methods.

Other parallel computer systems of the 70s were also not very successful. For example, C.mmp and Cm* had problems with hot memories because their interconnection structure, which was based on a crossbar switch, was not intelligent and could not overcome this problem. Nowadays solutions to this problem are known. The BSP and NASF had a bottleneck in the central control processor, and no efficient synchronization mechanisms were known at that time. On the other hand, the ICL DAP pioneered an important feature of engineering design: the processing element logic is mounted on the same printed circuit board as the memory to which it belongs. VLSI technology can now include a processing element and its memory on the same chip.

Now the technology has advanced sufficiently to make parallel architectures practicable. That is why we see such great interest in parallel processing.

6. CONCLUSION

It is not possible to predict which of these varied computer architectures will prove the most successful on the market in the future. By analyzing the performance of multiprocessors, which is primarily dependent on the interconnection structure, one might get an insight into the development of parallel computer systems and try to predict future trends in parallel computing. It seems that in the next decade the greatest influence on parallel computing will come from the projects in massively parallel processing:

- NYU Ultracomputer, whose principles are applied in the IBM RP3 parallel system,
- Butterfly, produced by the company BBN (Bolt, Beranek & Newman),
- Cedar, a multiprocessor supercomputer of the University of Illinois.
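The NYU Ultracomputer family listed above is built around the fetch-and-add synchronization primitive: a memory operation that atomically returns the old value of a cell and adds an increment to it. The following minimal sketch illustrates the semantics only; it is not code from any of these systems, and a software lock stands in for the combining hardware of the real machines.

```python
import threading

class FetchAndAddCell:
    """Illustrative software model of a fetch-and-add memory cell."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for combining hardware

    def fetch_and_add(self, increment):
        """Atomically return the old value and add `increment`."""
        with self._lock:
            old = self._value
            self._value += increment
            return old

# Typical use: many processors claim distinct loop indices without
# a software critical section around the index variable.
counter = FetchAndAddCell()
claimed = []

def worker(n_iters):
    for _ in range(n_iters):
        claimed.append(counter.fetch_and_add(1))  # unique index per call

threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Across 4 workers x 100 calls, the returned old values are exactly 0..399,
# each claimed exactly once.
assert sorted(claimed) == list(range(400))
```

In the Ultracomputer and RP3 the addition is performed by combining requests inside the multistage interconnection network, so concurrent fetch-and-adds to the same cell complete in a single network transit instead of serializing as they do on the lock above.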
Fig. 18: A mesh network with 16 processing elements connected in a lattice (a) and connected as a torus (b).

Cube networks have either a hypercube architecture or a cube-connected-cycles network architecture. A cube-connected-cycles network is a cube where each node of the hypercube is replaced by a ring (or cycle) of processing elements. In a d-dimensional binary hypercube there are d connections to each processing element, and the number of processors therefore equals N = 2^d. We see that the number of processing elements cannot be increased without also increasing the number of connections to each processing element. For example, a six-dimensional hypercube, which has 64 nodes, is topologically the same as a 4x4x4 three-dimensional mesh with triply periodic boundary conditions.

Fig. 19: A cube network with 4, 8 and 16 processing elements.

The hierarchical class of multiprocessor systems is realized as a tree network, a hierarchy of pyramids, or clusters of clusters.

Fig. 20: A hierarchical network realized as a tree network.

Fig. 21: The original switch lattice in the CHiP parallel computer, configured as a mesh and as a binary tree.

Let us compare the three most widely used topologies: common bus, crossbar and multistage network. Seven features are going to be compared:

1 - cost
2 - complexity
3 - max. throughput
4 - interconnect bandwidth
5 - # of signal paths
6 - efficiency
7 - max. # of CPUs

We see (Table 1) that the cost of a parallel system is the lowest in a common bus topology, but efficiency drops with an increasing number of processors. A crossbar switch is very powerful in connecting a few processors, but the price and complexity of the system are very high. A multistage network is a good topology to interconnect a large number of processors at a medium cost.
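The hypercube counting argument above can be checked with a short sketch (the function name here is illustrative, not taken from any of the systems surveyed): in a d-dimensional binary hypercube each processing element is addressed by a d-bit number, its d neighbours are found by flipping one address bit, and the node count is N = 2^d.

```python
def hypercube_neighbours(node, d):
    """Neighbours of `node` in a d-dimensional binary hypercube:
    flip each of the d address bits in turn."""
    return [node ^ (1 << bit) for bit in range(d)]

d = 6            # six-dimensional hypercube, as in the example above
N = 2 ** d       # 64 processing elements

for node in range(N):
    nbrs = hypercube_neighbours(node, d)
    assert len(nbrs) == d                   # d links per processing element
    assert all(0 <= n < N for n in nbrs)    # all links stay inside the cube

# A cube-connected-cycles network replaces each hypercube node with a ring
# of d processing elements, so the degree per element stays fixed at 3
# while the machine grows to N * d processors in total.
```

This makes the scaling problem of the pure hypercube concrete: doubling the machine adds one more link to every processing element, which is exactly what the cube-connected-cycles construction avoids.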
Reconfigurable networks include all cases in which the interconnection pattern between processing elements can be changed. This is usually achieved by interspersing switching elements between the processing elements, which may be controlled by a user program.

FEATURE                     BUS                   CROSSBAR                 MULTISTAGE NET.
1 cost                      low                   high (# CPU^2)           medium (n log n)
2 complexity                low                   high (# CPU^2)           medium
3 max. throughput           high                  no limit                 high
4 interconnect bandwidth    fixed by cycle time   proportional to # CPU    proportional to # CPU
5 # of signal paths         large                 medium                   medium
6 efficiency                drops                 linear                   linear
7 max. # of CPU             up to 30              up to 10                 up to 1000

Table 1: Comparison of different features for a common bus, a crossbar and a multistage parallel computer system.

New synchronization mechanisms for multistage switched networks nowadays enable almost linear speedup for systems with up to 256 processors (Fig. 22).

The reason for the success of these projects seems to be the fact that they are devoted to the development of a GENERAL PURPOSE MIMD parallel computer. Excellent performance results are reached particularly because they use:

- A MULTISTAGE INTERCONNECTION NETWORK: a near linear speedup is reached with 256 processors, using a multistage interconnection network (Omega network) as the interconnection structure between the memories and processors;

- INTERLEAVING: that is, data are spread uniformly throughout the common memory modules in order to avoid contention for any one memory module;

- FETCH-AND-ADD: a very effective interprocessor synchronization operation.

The operating system seems to be a parallel version of a UNIX-like operating system. It is possible to achieve high performance by connecting a large number of processing elements, even with "off-the-shelf" standard processors. It seems also that some architectural features, as for example the size of local memory or the size of cache memory at every processor, are of secondary importance for the high performance of a parallel system.

7. REFERENCES

(ABU84) Abu-Sufah W., A.
Kwok, Performance Prediction Tools for Cedar: a Multiprocessor Supercomputer, IEEE Conf. on Comp. Architecture, 1984, p. 406-413

(ABU86) Abu-Sufah W., H. Husmann, D. Kuck, On I/O Speedup in Tightly Coupled Multiprocessors, IEEE Trans. on Computers, June 1986, p. 520-530

(ADE85) Adelantado M., D. Comte, P. Siron, Ph. Berger, Expression of Concurrency and Parallelism in an MIMD Environment, Computer Physics Communications 37, 1985, p. 63-67, North Holland

(BAR68) Barnes G. et al., The Illiac IV Computer, IEEE Trans. on Comp., August 68, p. 746-756

(BAT77) Batcher K., The Multidimensional Access Memory in STARAN, IEEE Trans. on Comp., 1977, p. 174-177

(BAT80) Batcher K. E., Design of a Massively Parallel Processor, IEEE Trans. on Comp., Sept. 80, p. 836-844

(BAT82) Batcher K. E., Bit Serial Parallel Processing Systems, IEEE Trans. on Comp., May 82, p. 377-384

(BEE85) Beetem John et al., The GF11 Supercomputer, IEEE, pp. 108-115, 1985

(BOU72) Bouknight et al., The Illiac IV System, Proc. IEEE, April 1972, p. 369-388

(CLE84) Clementi E. et al., Parallelism in Computations in Quantum and Statistical Mechanics, Proceedings of the 2nd International Conf. on Vector and Parallel Processors in Comp. Sci., Oxford, August 84, p. 287-294

(CHA86) Chamberlain Richard, Experiences with the Intel iPSC Hypercube, Supercomputer, p. 24-29, 1986

(DAV69) Davis R., The Illiac IV Processing Element, IEEE Trans. on Comp., Sept. 69, p. 800-816

(EDL85) Edler J., A. Gottlieb et al., Issues Related to MIMD Shared-memory Computers: the NYU Ultracomputer Approach, IEEE Conf. on Comp. Architecture, 1985, p. 126-135

(EMM85) Emmen Ad, Intel's iPSC: a family of parallel computers based on microprocessors, SUPERCOMPUTER News, May 1985

(EMM86/1) Emmen Ad, Hypercube - toy or tool?, SUPERCOMPUTER News, July/September 1986

(EMM86/2) Emmen Ad, Vector extension for the iPSC, SUPERCOMPUTER News, July/September 1986

(ERH86) Erhel J., Parallel programming and applications on Cray X-MP, Supercomputer, Sept. 86, p. 53-60

(FIN77) Finnila Charles A., H. Love, The Associative Linear Array Processor, IEEE Trans. on Comp., Feb. 77, p. 112-129

(GIL86) Giloi W. K., H. Muhlenbein, Rationale and Concepts of the Suprenum Supercomputer Architecture, MIPRO 86, Opatija, 1st Yugoslav Conf. on New Generation of Computers, p. 3.1-3.17

(GOT82) Gottlieb A. et al., The NYU Ultracomputer - Designing a MIMD Shared Memory Parallel Computer, IEEE Conf. on Comp. Architecture, 1982, p. 27-42

(GOT83) Gottlieb A. et al., The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer, IEEE Trans. on Comp., 1983

(HAN85) Handler W. et al., A tightly coupled and hierarchical multiprocessor architecture, Computer Physics Comm. 37, 1985, p. 87-93

(HAR86) Hars N., New Systems Offer Near-supercomputer Performance, IEEE, March 86, p. 104-107

(HOC81) Hockney R. W. and C. R. Jesshope, Parallel Computers, Adam Hilger Ltd, Bristol, p. 126-143, 1981

(HOL85/1) Hollenberg Jaap, The Cray-2 Computer System, SUPERCOMPUTER 8/9, September 1985

(HOL85/2) Hollenberg J., The Butterfly Parallel Processor Computer System, Supercomputer, Sept. 85, p. 23-27

(HOL85/3) Hollenberg J., The C-1: A Minisuper Supercomputer, Supercomputer, March 85, p. 7-8

(HOP86) Hoppe H. C., H. Muhlenbein, Parallel adaptive full-multigrid methods on message-based multiprocessors, Parallel Computing, Oct. 86, p. 269-289

(HWA85) Hwang K. and F. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill Book Company, p. 237-241, 1985

(JEN81) Jenevein R., D. Degroot, G. Lipovski, A Hardware Support Mechanism for Scheduling Resources in a Parallel Machine Environment, IEEE Conf. on Comp. Architecture, 1981, p. 57-65

(JEN82) Jenevein R., J. Brown, A Control Processor for a Reconfigurable Array Computer, IEEE Conf. on Comp. Architecture, 1982, p. 81-89

(JON80) Jones A., P. Schwarz, Experience Using Multiprocessor Systems: A Status Report, ACM Computing Surveys, June 80, p. 121-167

(JOR82) Jordan T. L., A Guide to Parallel Computation and Some Cray-1 Experiences, Parallel Computations, AP, 1982

(KAP84) Kapauan A., J. Field, D. Gannon, L. Snyder, The PRINGLE Parallel Computer, IEEE Conf. on Comp. Architecture, 1984, p. 12-20

(KAR82) Kartashev S., S. Kartashev, Designing and Programming Modern Computers and Systems, vol. 1, chapter II, Prentice-Hall, p. 143-154, 1982

(KOC85) Koch Wilhelm, First European installation of Siemens VP-200, SUPERCOMPUTER 7, May 1985

(KOG81) Kogge Peter M., The Architecture of Pipelined Computers, McGraw-Hill Book Company, p. 159-162, 1981

(KRA87) Kramer O. and Muhlenbein H., Mapping Strategies in Message Based Multiprocessor Systems (to be published)

(KUC82) Kuck David J. and Richard A. Stokes, The Burroughs Scientific Processor (BSP), IEEE Transactions on Computers, vol. C-31, No. 5, May 1982

(LEC86) Leca P., The ONERA experimental MIMD system, Supercomputer, Sept. 86, p. 91-96

(LIN82) Lincoln Neil R., Technology and Design Tradeoffs in the Creation of a Modern Supercomputer, IEEE Transactions on Computers, vol. C-31, No. 5, May 1982

(LIN85) Lineback R., Parallel Processing: Why a Shakeout Nears, Electronics, Oct. 85, p. 32-34

(LIP77) Lipovski J., On a Varistructured Array of Microprocessors, IEEE Trans. on Computers, Feb. 1977, p. 125-138

(LLU84) Llurba Rossend, VP-200: Fujitsu's Supercomputer, SUPERCOMPUTER 2, July 1984

(LLU86) Llurba R., The Alliant FX/8 entry level supercomputer, SUPERCOMPUTER, March 86, p. 7-11

(MAN85) Manuel Tom, Parallel Machine Expands Indefinitely, Electronics Week, May 85, p. 49-53

(MAS82) Mashburn Henry, The C.mmp/Hydra project: An Architectural Overview, in Computer Structures: Readings and Examples, ed. D. Siewiorek, Bell, Newell, p. 350-370, McGraw-Hill, 1982

(NEW85) News, China's first supercomputer, SUPERCOMPUTER 6, March 1985

(OED86) Oed W., O. Lang, Modeling, measurement and simulation of memory interference in the Cray X-MP, Parallel Computing, Oct. 86, p. 343-359

(OSL82) Ostlund, P. Hibbard, R. Whiteside, A Case Study in the Application of a Tightly Coupled Multiprocessor to Scientific Computation, in Parallel Computations, ed. G. Rodrigue, p. 315-364, Academic Press, 1982

(PFI86) Pfister G. F., Parallel processor project to link 512 32-bit micros, IEEE Computer, Jan. 86, p. 98-99

(PRE82) Premkumar U., J. Browne, Resource Allocation in Rectangular SW Banyans, IEEE Conf. on Comp. Architecture, 1982, p. 326-333

(PUR74) Purcell Charles J., The Control Data STAR-100 - Performance Measurements, NCC 74

(RUD72) Rudolph J., A production implementation of an associative array processor - STARAN, Fall Joint Computer Conference, 1972, p. 229-241

(RUS78) Russell Richard M., The CRAY-1 Computer System, Comm. ACM, vol. 21, No. 1, January 1978

(SCH80) Schwartz J. T., Ultracomputers, ACM Trans. on Programming Languages and Systems, Oct. 1980, p. 484-521

(SCH86) Schwederski Thomas and Siegel Howard Jay, Adaptable Software for Supercomputers, IEEE Computer, pp. 40-48, February 1986

(SEJ80) Sejnowski et al., Overview of the Texas Reconfigurable Array Computer, AFIPS National Computer Conference, 1980, p. 631-642

(SEI85) Seitz Charles L., The Cosmic Cube, Communications of the ACM, vol. 28, No. 1, January 1985

(SIE81) Siegel H., PASM: A Partitionable SIMD/MIMD System for Image Processing and Pattern Recognition, IEEE Trans. on Computers, Dec. 1981, p. 934-947

(SIE86) Siewiorek D., New Trends in Computer Architecture, MIPRO Conference, Opatija, 1986

(SIP84) Sips H., The DPP81 - an exercise in parallel processing, Supercomputer, Nov. 84, p. 31-37

(SNE85) Snelling D., HEP Applications: real time flight simulation, Computer Physics Comm. 37, 1985, p. 261-271

(SNY81/1) Snyder L., Programming Processor Interconnection Structures, Technical Report CDS-TR-381, Purdue University, 1981

(SNY81/2) Snyder L., Introduction to the Configurable, Highly Parallel Computer, IEEE Computer, Jan. 1981, p. 47-56

(SWA77) Swan, Fuller, Siewiorek, Cm* - A Modular Multi-microprocessor, AFIPS National Computer Conference, 1977, p. 637-644

(UCH85) Uchida Keiichiro and Mikio Itoh, High Speed Vector Processors in Japan, Computer Physics Communications 37 (1985), p. 7-13, North Holland, Amsterdam

(VON84) Vons Peter, Cyber 205 vector features used by vectorizers, SUPERCOMPUTER 3, September 1984

(WIL82) Wilson Kenneth G., Experiences with a Floating Point Systems Array Processor, Parallel Computations, AP, 1982

(YAU77) Yau S. S., H. S. Fung, Associative Processor Architecture - A Survey, ACM Computing Surveys, March 77, p. 3-27

(ZSO86) Zsohar Leslie et al., Bus Hierarchy Facilitates Parallel Processing in 32-bit Multicomputer, Computer Technology Review, Summer 1986, pp. 51-59