SHIFTER DESIGNS FOR ASICs

Dalibor Grgec* and Željko Butkovič
Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

* Now with Institute for Theoretical Electrical Engineering and Microelectronics, University of Bremen, Germany

Keywords: computer science, shifters, phase shifters, logarithmic shifters, data buses, data paths, arithmetic operations, decoders, ALU, Arithmetic-Logic Unit, FPU, Fast Processing Units, ASIC, Application Specific Integrated Circuits, VLSI circuits, Very Large Scale of Integration circuits, CAD, Computer Aided Design, logical simulations, electrical simulations, MAGIC computer tools, IRSIM computer tools, SPICE computer tools, Simulation Program with Integrated Circuit Emphasis computer tools, transmission gates, restoring buffers, propagation delay, power dissipation

Abstract: This paper presents four versions of 32-bit shifter designs that can be used in ASICs: a barrel shifter and a logarithmic shifter, each implemented with pass transistors and with transmission gates. The circuits are designed in the standard MOSIS scalable CMOS n-well technology for a fabrication process with 0.8 µm minimal feature size. The design procedure is explained in detail. The designs are logically and electrically simulated and compared in terms of functionality, wafer area, number of transistors, delay and power dissipation. Guidelines for usage and optimization are given.

Design of Shifter Subcircuits for Application-Specific Integrated Circuits

Keywords: computer science, shifters, phase shifters, logarithmic shifters, data buses, arithmetic operations, decoders, ALU arithmetic-logic unit, FPU fast processing units, ASIC application-specific integrated circuits, VLSI very-large-scale integration circuits, CAD computer-aided design, logic simulations, electrical simulations, MAGIC computer tools, IRSIM computer tools, SPICE computer tools, transmission gates, restoring buffers, propagation delay, power dissipation

Abstract: In this paper we present four types of 32-bit shifter circuits that can be used in the design of application-specific integrated circuits: the barrel (matrix) and the logarithmic shifter, each of which can be implemented with pass transistors or with transmission gates. The circuits are designed in the standard MOSIS n-well CMOS technology with 0.8 µm minimal drawn dimensions. We explain the design procedure in detail. We performed logic and electrical simulations of the circuits and compared them with respect to functionality, chip area, number of transistors, speed and power consumption. We give guidelines for the use and optimization of the circuits.

1. INTRODUCTION

Shifting of binary numbers is an arithmetic operation required in many operations such as multiplication, division and bit manipulation. Shifting is performed in specially designed circuits. Shifters are part of every contemporary datapath, usually located at the output of the Arithmetic-Logic Unit (ALU).

Shift operations can be classified into left or right, logical, arithmetic, and circular (rotation) shifts. Usually shifting is implemented in only one direction, i.e., right. For circular shifts, a left shift by m bits is realized as a right shift by n − m bits in an n-bit machine /1/. During a logical shift, the vacated bit positions take the value of a predefined input (usually 0 or 1, or a bit stream from an external source). In an arithmetic shift, the MSB, which represents the sign of the binary number, is preserved. A circular shift moves the bits shifted out at one end back in at the other end: in a right rotation the LSB moves into the MSB position, and in a left rotation the MSB moves into the LSB position. All sorts of shift operations are required in modern processing units /2, 3/.

According to implementation, shifters can be classified into shift-register (sequential logic) and flow-through (combinational logic) types. In the shift-register type, a shift by one bit requires at least one machine cycle. In the flow-through type, the time required for shifting depends only on the combinational delay of the circuit, and it is usually shorter than one machine cycle. The dominant shifter implementation in modern datapaths is the flow-through type. Flow-through shifters can be further classified into /4, 6/: binary shifter, crossbar switch, barrel shifter, logarithmic shifter, and other shifter implementations.

The common requirements set upon shifter implementations in modern datapaths are:
1. bus width of 32 or 64 bits (n bits in general),
2. performing an n×n shift in one clock cycle,
3. performing many types of shift operations according to control signals (left-right, logical, arithmetic or circular shift, masking, etc.),
4. separate control signals, usually perpendicular to the direction of data flow,
5. coded control signals,
6. low propagation delay and no degradation of the electrical characteristics of the output signal,
7. compatibility with the rest of the datapath.

Barrel and logarithmic shifter implementations satisfy these requirements best, and are thus the two most frequently used shifters /4, 5, 6/. The binary shifter only performs a one-place left or right shift, and the crossbar switch is a universal circuit and the basis for the barrel shifter design. In this paper, designs of barrel and logarithmic shifters in the most common logic styles, and comparisons of their performance, are presented.
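The shift types listed above can be pinned down with a short behavioral model. The sketch below is for illustration only: it represents n-bit words as non-negative Python integers, and the function names, the default width of 32 bits and the `fill` parameter are assumptions, not part of the designs in this paper (which implement only the circular right shift in hardware).

```python
"""Behavioral models of the shift operations on n-bit words (illustrative sketch)."""

N_BITS = 32  # assumed word width n


def word_mask(n: int = N_BITS) -> int:
    return (1 << n) - 1


def logical_shift_right(x: int, m: int, fill: int = 0, n: int = N_BITS) -> int:
    """Right shift by m; the m vacated MSB positions take the value `fill` (0 or 1)."""
    x &= word_mask(n)
    if m >= n:
        return word_mask(n) if fill else 0
    fill_bits = (word_mask(m) << (n - m)) if fill else 0
    return (x >> m) | fill_bits


def arithmetic_shift_right(x: int, m: int, n: int = N_BITS) -> int:
    """Right shift by m that replicates the sign bit (MSB) into the vacated positions."""
    sign = (x >> (n - 1)) & 1
    return logical_shift_right(x, m, fill=sign, n=n)


def rotate_right(x: int, m: int, n: int = N_BITS) -> int:
    """Circular right shift: bits leaving at the LSB end re-enter at the MSB end."""
    x &= word_mask(n)
    m %= n
    return ((x >> m) | (x << (n - m))) & word_mask(n)


def rotate_left(x: int, m: int, n: int = N_BITS) -> int:
    """Circular left shift by m, realized as a circular right shift by n - m."""
    return rotate_right(x, (n - m) % n, n)
```

For example, `rotate_right(0x00000001, 1)` returns `0x80000000`, and `arithmetic_shift_right(0x80000000, 4)` returns `0xF8000000`, while the logical shift of the same word gives `0x08000000`. `rotate_left` makes the left-by-m equals right-by-(n − m) identity explicit; note that it holds only for circular shifts.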
The general block schematic of the designed shifters is shown in Fig. 1.

Fig. 1. General shifter block.

The design requirements set upon the shifter designs are those listed above, except for requirement no. 3: for the purpose of comparison, only the circular right shift, as the most common shift operation, is implemented. Other shift operations can easily be implemented by adding some peripheral circuitry /2, 3/ and by modifying the control signals. Furthermore, the shifter layouts are designed in the standard MOSIS /7/ scalable CMOS (SCMOS) n-well technology with two metal layers, in such a way that they satisfy both the standard SCMOS and the submicron SCMOS (SCMOS_SUBM) sets of design rules. They are therefore realizable, without any modification of the circuit or layout design, in a whole range of fabrication processes offered by the MOSIS Service, from the 2 µm minimal feature process down to the more recent 0.35 µm minimal feature process.

Given the above requirements, the shifters are designed with the goal of minimal layout area on the chip. Circuit and layout designs are modified and optimized in order to satisfy these requirements and ensure correct functionality or, in some cases, to lower the circuit power dissipation to an acceptable level. No attempt is made to optimize the circuits according to other criteria (e.g., delay, current drive), in order to ensure an impartial comparison of shifter performance.

Next, a detailed overview of the shifter circuit and layout design is given. The shifter performance is evaluated through simulations, which are described in the following section. At the end of the paper, the performance of the shifter designs is compared and some usage and optimization guidelines are presented.

2. SHIFTER DESIGN

2.1. Circuit design

Barrel shifters

All versions of barrel shifters are based on the crossbar switch.
A version of the crossbar switch with pass transistors is shown in Fig. 2.

Fig. 2. Crossbar switch with pass transistors.

In theory, the crossbar switch can realize any connection between its inputs and its outputs. By careful design of the wiring, one readily obtains the main field of the barrel shifter shown in Fig. 3a. The control inputs to the pass transistors are designated S_i, and only one of them is high at a time, defining the shift value. If the pass transistors in Fig. 3a are replaced with transmission gates, the main field of the barrel shifter with transmission gates, shown in Fig. 3b, is obtained. As is usual in circuits with transmission gates, this shifter requires both inverted and non-inverted control signals, designated S_i' and S_i, respectively. Examination of both schematics shows that they perform the required function: circular right shift.

Fig. 3a. Main field of the barrel shifter with pass transistors.
Fig. 3b. Main field of the barrel shifter with transmission gates.

To comply with the requirements set upon the shifter designs, and in particular to have coded control input signals, a decoder must be added to the circuit. To minimize the layout area, a NOR decoder in dynamic logic with p-type precharge transistors is chosen /4, 6/. The schematic of this decoder is shown in Fig. 4.

Fig. 4. NOR decoder in dynamic logic (4-bit example).

During the precharge phase the clock φ is low, and during the evaluation phase the clock is high. The outputs of the decoder are the control inputs S_i (i = 0, …, n−1) of the barrel shifter main field. To ensure the correct logic values of the control inputs S_i and S_i' during the precharge phase /4/, specially devised interface clocked buffers are used. The circuitry is shown in Fig. 5.
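A behavioral sketch may help to show how the decoder and the main field cooperate. It is a hypothetical model, not the authors' netlist: bits are held in Python lists with index 0 as the LSB, each decoder output is modeled as a NOR of the address literals that disagree with its index, and switch (j, i) of the main field is modeled as connecting input I_((j+i) mod n) to output O_j while S_i is high. The precharge/evaluate timing, the interface buffers and the electrical behavior of the pass devices are not modeled, and all function names are assumptions.

```python
"""Behavioral sketch: NOR decoder + crossbar main field of a barrel shifter (circular right shift)."""

from typing import List


def nor_decode(code: List[int]) -> List[int]:
    """Decode a coded shift value (bit list, LSB first) into one-hot select lines S_i.

    Output S_i is modeled as a NOR of the address literals that disagree with the
    binary pattern of i, so it is high only when the code equals i.
    """
    k = len(code)
    selects = []
    for i in range(1 << k):
        mismatch_literals = [code[b] != ((i >> b) & 1) for b in range(k)]
        selects.append(0 if any(mismatch_literals) else 1)
    return selects


def barrel_main_field(inputs: List[int], selects: List[int]) -> List[int]:
    """Switch (j, i) connects input I[(j + i) mod n] to output O[j] when S_i is high."""
    n = len(inputs)
    if len(selects) != n or sum(selects) != 1:
        raise ValueError("exactly one of the n select lines S_i must be high")
    outputs = [0] * n
    for i, s in enumerate(selects):
        if s:  # only this set of switches conducts
            for j in range(n):
                outputs[j] = inputs[(j + i) % n]
    return outputs


def barrel_shift(inputs: List[int], shift: int) -> List[int]:
    """Circular right shift of a bit list (index 0 = LSB) by `shift` positions."""
    n = len(inputs)
    k = n.bit_length() - 1  # number of coded control bits; assumes n is a power of two
    code = [(shift >> b) & 1 for b in range(k)]
    return barrel_main_field(inputs, nor_decode(code))
```

For example, `barrel_shift([1, 0, 0, 0, 0, 0, 0, 0], 1)` returns `[0, 0, 0, 0, 0, 0, 0, 1]`: the LSB re-enters at the MSB position. The one-hot check in `barrel_main_field` mirrors the statement above that only one S_i is high at a time.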
The buffer exists in two versions, inverting and non-inverting, depending on which output is used. Both versions are used in the decoder for the barrel shifter with transmission gates and can be seen in Fig. 4. Only the non-inverting version is used in the decoder for the barrel shifter with pass transistors. Both decoders require an inverted clock φ'. It should be noted that the shifters are fully functional even without the interface buffers /8/, but their power dissipation is then too large, due to the leakage current during the evaluation phase.

Fig. 5. Interface clocked buffer.

Fig. 11. The output level-restoring buffer layout.
Fig. 12a. Barrel shifter layouts: barrel shifter with pass transistors.
Fig. 12b. Barrel shifter layouts: barrel shifter with transmission gates.
Fig. 13a. Logarithmic shifter basic cell layouts: multiplexer with pass transistors.

Logarithmic shifters

The layout of a logarithmic shifter consists of two parts: columns of multiplexers and the switch fields between the columns. The layouts of the logarithmic shifter basic cells are shown in Fig. 13. Vertical buffers, twice the minimal size, are added in the path of the control signals at the top of each multiplexer column; this is necessary because of the large capacitance of the polysilicon multiplexer control lines. These buffers are shown together with the multiplexer cells in Fig. 8. The switch field, which is typical of logarithmic shifters, connects the output of cell m in stage k with the input of cell (m − 2^k) modulo n in stage k+1. It is made in two metal layers. The minimal spacing between the vertical metal lines determines the size of the switch field: its width grows in proportion to the stage shift value 2^k, i.e., exponentially with the stage index k /4/. The complete layouts of the logarithmic shifters are shown in Fig. 14.
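The multiplexer columns and switch fields can likewise be modeled behaviorally. In this hypothetical sketch, bits are held in a Python list with index 0 as the LSB and n is assumed to be a power of two. Stage k either passes the word straight through or rotates it by 2^k, according to bit k of the coded shift value; the expression `word[(m + step) % n]` is the switch-field connection from cell m + 2^k to cell m, which is the same as connecting cell m to cell (m − 2^k) mod n in the next column.

```python
"""Behavioral sketch of a logarithmic circular right shifter (2:1 mux columns + switch fields)."""

from typing import List


def log_shift(inputs: List[int], shift: int) -> List[int]:
    """Circular right shift of a bit list (index 0 = LSB) through log2(n) mux stages."""
    n = len(inputs)
    stages = n.bit_length() - 1  # log2(n); assumes n is a power of two
    word = list(inputs)
    for k in range(stages):
        step = 1 << k  # stage shift value 2**k
        control = (shift >> k) & 1  # bit k of the coded shift value drives column k
        # Each 2:1 mux selects either its straight input (control = 0) or the input
        # routed through the switch field from cell (m + 2**k) mod n (control = 1).
        word = [word[(m + step) % n] if control else word[m] for m in range(n)]
    return word


def mux_cells(n: int) -> int:
    """n multiplexer cells per column, log2(n) columns."""
    return n * (n.bit_length() - 1)
```

Because each stage adds its own 2^k to the total rotation, the coded shift value drives the columns directly and no decoder is needed. For n = 32, `mux_cells(32)` gives 160 multiplexer cells in five columns.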
The difference in layout aspect ratio is also visible in this case.

Fig. 13b. Logarithmic shifter basic cell layouts: multiplexer with transmission gates.
Fig. 14a. Logarithmic shifter layouts: logarithmic shifter with pass transistors.

Finally, the layout complexity of the designed circuits is presented in Table II, which compares the number of transistors, the linear dimensions in units of λ, and the chip area for implementation in the HP CMOS26G 0.8 µm fabrication process.

Table II. Layout parameters of designed circuits.

Shifter circuit    Number of transistors    Linear size, height × width    Chip area,
layout             nMOS   pMOS   Total      (in λ, λ = 0.4 µm)             mm²
Barrel 1           1364    180    1544        848 × 989                    0.134
Barrel 2           1428   1204    2632       1522 × 894                    0.218
Logarithmic 1       522    362     884        645 × 1080                   0.111
Logarithmic 2       522    522    1024       1635 × 829                    0.217

1 corresponds to the circuit version with nMOS pass transistors.
2 corresponds to the circuit version with transmission gates.

3. SIMULATIONS

In order to evaluate circuit performance, logical and electrical simulations are used. The simulations are performed with circuit models extracted from the layout using MAGIC and the additional tools that come with the software /9/. All circuit models are extracted with the parameters of the HP CMOS26G 0.8 µm n-well fabrication process.

For logic validation, design debugging and initial electrical simulation, the event-driven electrical simulator IRSIM /11/ is used. It enables easy handling of wide data buses and logical states. The results obtained with this program include logical values, propagation delays and power dissipation /8/. However, because of its simple resistance-based MOS transistor model, this program can only give approximate values of the electrical parameters, i.e., the range of values and their relationships. Simulations in IRSIM showed that all designed shifters were fully functional at the chosen clock frequency of 50 MHz. The results of one typical simulation are shown in Fig. 15.
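As a check on Table II, the chip area follows directly from the layout dimensions in units of λ, with λ = 0.4 µm. The sketch also gives a rough count of the switching transistors alone, under the assumption that the barrel main field contains n × n switches and each logarithmic 2:1 multiplexer contains two switches (one transistor per switch for pass transistors, two for transmission gates). Decoders, buffers and level restorers are not included, so these counts are only lower bounds, not a breakdown of Table II; the function names are hypothetical.

```python
"""Recompute Table II chip areas from layout size in lambda (sketch, lambda = 0.4 um)."""

LAMBDA_UM = 0.4  # lambda for the HP CMOS26G process, as stated in Table II

# (height, width) in units of lambda, copied from Table II
LAYOUTS = {
    "Barrel 1": (848, 989),
    "Barrel 2": (1522, 894),
    "Logarithmic 1": (645, 1080),
    "Logarithmic 2": (1635, 829),
}


def chip_area_mm2(height_lambda: int, width_lambda: int, lambda_um: float = LAMBDA_UM) -> float:
    """Convert a layout bounding box in lambda units to an area in mm^2."""
    height_mm = height_lambda * lambda_um / 1000.0
    width_mm = width_lambda * lambda_um / 1000.0
    return height_mm * width_mm


def switch_transistors(n: int, transistors_per_switch: int) -> tuple:
    """Switching devices only: n*n for the barrel main field,
    2 switches per 2:1 mux * n muxes * log2(n) columns for the logarithmic shifter."""
    stages = n.bit_length() - 1  # log2(n); assumes n is a power of two
    barrel = n * n * transistors_per_switch
    logarithmic = 2 * n * stages * transistors_per_switch
    return barrel, logarithmic


if __name__ == "__main__":
    for name, (h, w) in LAYOUTS.items():
        print(f"{name:14s} {chip_area_mm2(h, w):.3f} mm^2")
    print("pass transistors:", switch_transistors(32, 1))
    print("transmission gates:", switch_transistors(32, 2))
```

The computed areas round to 0.134, 0.218, 0.111 and 0.217 mm², matching Table II. `switch_transistors(32, 1)` gives (1024, 320), which illustrates why the logarithmic shifter needs far fewer transistors: n² grows much faster than 2·n·log2 n.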
IRSIM simulations were also used to determine the combinations of input and control signals for which the propagation delay is the largest. After the initial simulations in IRSIM, additional electrical simulations for the chosen input and control signal combinations are performed with SPICE /12/ in order to determine the electrical parameters of the circuits: voltage levels, propagation delay and power dissipation. The MOS transistors are modeled with the SPICE Level 3 MOSFET model obtained from MOSIS /10/. This model offers faster convergence than the more sophisticated BSIM (Level 4 and 5) SPICE MOSFET models also available from MOSIS, and its accuracy is satisfactory for the minimal feature size of 0.8 µm used here.

Fig. 14b. Logarithmic shifter layouts: logarithmic shifter with transmission gates.

Summarized results of the SPICE simulations are given in Table III. Two types of propagation delay are determined from the SPICE simulations: the delay from control input to output (S→O, t_dSO) and the delay from data input to output (I→O, t_dIO). For the barrel shifters, which have a decoder in dynamic logic and require clock signals, the control-input-to-output delay is measured from the rising edge of the clock signal φ, i.e., from the beginning of the circuit evaluation phase. Only the longest delay t_dSO is shown in Table III, regardless of the transition direction. The input-to-output delays are evaluated for both high-to-low (HL) and low-to-high (LH) transitions of the output signal for various input and output signals, and the longest delays are given separately for HL and LH transitions. Fig. 16 shows the simulated electrical signals from a typical SPICE simulation.

Table III. Summarized results of SPICE simulations.

Simulated          Propagation delay, ns         Average power      Specific power
shifter circuit    t_dSO   t_dIO HL   t_dIO LH   dissipation, mW    dissipation, W/cm²
Barrel 1           7.1     1.7        1.1        6.4                4.8
Barrel 2           1.4     0.6        0.5        6.0                2.8
Logarithmic 1      2.6     1.8        1.6        5.8                5.2
Logarithmic 2      2.4     1.2        1.1        5.2                2.4

Fig. 15. Signals simulated by IRSIM (logic validation for the barrel shifter with pass transistors).

Note the difference in the propagation delays for HL and LH transitions: the high-to-low transition time is always longer. This is caused by the reduced high-level output voltage of the nMOS pass transistor and by the lower current drive of the pMOS transistor in the transmission gate. Both effects slow down the rising edge of the signal at the output of the pass logic, which is then inverted in the output buffer. The level-restoring buffer, shown in Fig. 6, can only partially mitigate this problem. The difference in delay times is, however, smaller for the circuits with transmission gates, which are therefore more often used in present pass logic circuits /4, 5/.

The total power dissipation of the circuits is evaluated at a clock frequency of 50 MHz (with a corresponding input signal frequency of 25 MHz, see Fig. 15) for a selected signal pattern, by simulation over a longer time period. The input and control signal patterns were chosen to maximize the dynamic power dissipation by changing the logic state in every clock cycle at as many circuit nodes as possible. The final signal pattern was determined by test simulations. The calculated specific power dissipation, i.e., the ratio of dissipated power to chip area, is given in Table III. This parameter is an indicator of the heat flux in VLSI chips and limits both the integration density and the clock frequency of a circuit implemented in the chosen fabrication process.

[Plot: barrel shifter 1 delay test, I→O; voltage versus time (ns).]
Fig. 16. Signals simulated by SPICE (propagation delay t_dIO for the barrel shifter with pass transistors).

4. COMPARISONS AND CONCLUSIONS

The designed circuits can be compared according to many criteria: number of transistors, chip area, speed and power dissipation. All designed circuits offer the same functionality: they perform circular right shift at a clock frequency of at least 50 MHz and satisfy all the requirements listed in the introductory section. One should also note, from the delay times shown in Table III, that most of the circuits could support significantly higher clock frequencies.

The main design difference is the dynamic logic decoder in the barrel shifters, and therefore the need for an independently generated non-overlapping clock pair φ and φ' in these shifters. This requirement should not be considered a deficiency, since a clock would be present anyway in the datapath in which the shifter would be integrated. Using the decoder in dynamic logic minimizes the chip area and transistor count. Dynamic logic circuits also usually have higher speed due to their lower capacitive load /4/. Where the decoded control signals are already available on the chip, one would implement only the main field of the barrel shifter, which would lead to significant circuit simplification, decreased chip area and increased speed.

The logarithmic shifters presented in this work are designed in combinational pass and static logic and do not require clock signals. When integrating these shifters in a datapath, one can add latches at the input and output of the shifter. Pipelined versions of logarithmic shifters with multiplexers in dynamic logic and with latches have appeared in the literature and are used mostly in high-performance chips /3/.

The comparison of the main shifter parameters is given in Table IV. The comparison is performed according to three criteria: complexity, delay and power dissipation. According to Table IV, no shifter is a clear winner.
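The entries of Table IV are consistent with the values of Tables II and III divided by the best (smallest) value in each column, so that 1.000 marks the best circuit for each criterion. The sketch below reproduces them under that assumption; the column and variable names are hypothetical.

```python
"""Reproduce Table IV by normalizing each column of Tables II and III to its minimum."""

COLUMNS = ("transistors", "area_mm2", "t_dSO_ns", "t_dIO_HL_ns", "t_dIO_LH_ns",
           "power_mW", "spec_power_W_cm2")

# Values copied from Tables II and III
MEASURED = {
    "Barrel 1": (1544, 0.134, 7.1, 1.7, 1.1, 6.4, 4.8),
    "Barrel 2": (2632, 0.218, 1.4, 0.6, 0.5, 6.0, 2.8),
    "Logarithmic 1": (884, 0.111, 2.6, 1.8, 1.6, 5.8, 5.2),
    "Logarithmic 2": (1024, 0.217, 2.4, 1.2, 1.1, 5.2, 2.4),
}


def normalize(table: dict) -> dict:
    """Divide every entry by the smallest value in its column (1.000 = best)."""
    best = [min(column) for column in zip(*table.values())]
    return {name: tuple(value / b for value, b in zip(row, best))
            for name, row in table.items()}


if __name__ == "__main__":
    print(f"{'':14s}" + " ".join(f"{c:>16s}" for c in COLUMNS))
    for name, row in normalize(MEASURED).items():
        print(f"{name:14s}" + " ".join(f"{r:16.3f}" for r in row))
```

Normalizing to the best value per column keeps every criterion on the same "smaller is better" scale, which is what makes the absence of a clear winner visible at a glance: each circuit has a 1.000 in at least one column.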
The choice of a particular shifter will depend on the requirements set upon the chip as a whole. The shifters with nMOS pass transistors generally have lower complexity, and the logarithmic shifter has lower complexity than the barrel shifter in the equivalent logic style. In particular, the logarithmic shifter with pass transistors has the lowest transistor count and the smallest chip area. The lowest propagation delay is achieved in the barrel shifter with transmission gates. The logarithmic shifter with transmission gates has the smallest power dissipation; however, the total power dissipated in the other circuits is not substantially larger. The differences in specific power dissipation are much larger and are determined by the differences in chip area. Specific power is not a deciding design criterion except in some special cases (e.g., low-power electronics with limited cooling).

The barrel shifter with transmission gates, owing to its high speed and low power demands, seems to be the best choice for implementation in general-purpose ASIC designs if the chip area is not limited. In applications where a circuit with low power dissipation is needed, the best choice is the logarithmic shifter with transmission gates. The shifters with pass transistors should be preferred if the chip area is limited; among these, the barrel shifter has the disadvantage of larger power dissipation and, especially, a substantially larger control signal delay t_dSO.

Further improvements of the shifters are possible. For example, the interface buffers or decoders can be enlarged or replaced. With little or no increase in chip area, one can also enlarge the otherwise minimal channel width of the pass transistors. Such a modification could equalize the LH and HL propagation delays in the shifters with transmission gates, or lower the propagation delay in the shifters with pass transistors.
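The specific power dissipation in Table III is the average power divided by the chip area from Table II, converted to W/cm². The short sketch below (function and variable names hypothetical) recomputes it and makes the point above concrete: the total powers differ by at most about 23 %, while the chip areas differ by almost a factor of two.

```python
"""Specific power dissipation = average power / chip area, in W/cm^2 (sketch)."""


def specific_power_w_per_cm2(power_mw: float, area_mm2: float) -> float:
    power_w = power_mw / 1000.0  # mW -> W
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return power_w / area_cm2


# (average power in mW from Table III, chip area in mm^2 from Table II)
CIRCUITS = {
    "Barrel 1": (6.4, 0.134),
    "Barrel 2": (6.0, 0.218),
    "Logarithmic 1": (5.8, 0.111),
    "Logarithmic 2": (5.2, 0.217),
}

if __name__ == "__main__":
    for name, (p, a) in CIRCUITS.items():
        print(f"{name:14s} {specific_power_w_per_cm2(p, a):.1f} W/cm^2")
```

The printed values, 4.8, 2.8, 5.2 and 2.4 W/cm², match the last column of Table III.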
By adding peripheral and control circuitry, one can extend the shifter functionality to support more shift operations. Finally, it is worth noting that the shifter design procedures presented in this work can serve as application guidelines for integrated circuits in general.

Table IV. The comparison of shifter designs and performance.

Designed           Complexity          Delay                        Power
shifter circuit    Tot. #    Chip      t_dSO   t_dIO   t_dIO        Tot.     Spec.
                   tran.     area              HL      LH           power    power
Barrel 1           1.747     1.207     5.071   2.833   2.200        1.231    2.000
Barrel 2           2.977     1.964     1.000   1.000   1.000        1.154    1.167
Logarithmic 1      1.000     1.000     1.857   3.000   3.200        1.115    2.167
Logarithmic 2      1.158     1.955     1.714   2.000   2.200        1.000    1.000

Acknowledgments

This paper is supported by the Ministry of Science and Technology of the Republic of Croatia within the scientific project "036001 Research on VLSI/ULSI semiconductor structures". We are grateful to Dr. Adrijan Baric for many useful suggestions regarding the paper.

Informacije MIDEM 31(2001)1, pp. 10-20

REFERENCES

/1/ R.S. Lim, "A Barrel Switch Design", Computer Design, August 1972, pp. 76-79.
/2/ S.M. Kang, "Domino-CMOS Barrel Switch for 32-Bit VLSI Processors", IEEE Circuits and Devices Magazine, May 1987, pp. 3-8.
/3/ R. Pereira, J.A. Michell, J.M. Solana, "Fully Pipelined TSPC Barrel Shifter for High-Speed Applications", IEEE Journal of Solid-State Circuits, Vol. 30, No. 6, June 1995, pp. 686-690.
/4/ J.M. Rabaey, "Digital Integrated Circuits - A Design Perspective", Prentice Hall, 1996.
/5/ N.H.E. Weste, K. Eshraghian, "Principles of CMOS VLSI Design", Addison Wesley, 1993.
/6/ E.D. Fabricius, "Introduction to VLSI Design", McGraw-Hill, 1990.
/7/ The MOSIS Service, "MOSIS Scalable CMOS (SCMOS) Design Rules", Revision 7.2, 1996.
/8/ D. Grgec, Ž. Butkovič, "Barrel Shifter Design and Simulations", Proceedings of MIPRO '99 Conference, Opatija, Croatia, pp. 62-65, 1999.
/9/ Digital Western Research Laboratory, "WRL Research Report 90/7 - 1990 DECWRL/Livermore Magic Release", September 1990; "Magic Addendum: Version 6.5 differences", 1994.
/10/ MOSIS Service Web Site: http://www.mosis.org
/11/ University of California at Berkeley, IRSIM ver. 9.4, "IRSIM User's Manual", 1993.
/12/ MicroSim Corporation, "MicroSim PSpice A/D & Basics+ Circuit Analysis Software User's Guide", Version 6.3, April 1996.

Dalibor Grgec
Institute for Electromagnetic Theory and Microelectronics
University of Bremen
Kufsteiner Str., Postfach 33 04 40
28334 Bremen, GERMANY
Phone: +49 421 218 2204
Fax: +49 421 218 4434
E-mail: grgec@item.uni-bremen.de

Željko Butkovič
Department of Electronics, Microelectronics, Computer and Intelligent Systems
Faculty of Electrical Engineering and Computing
Unska 3, HR-10000 Zagreb, CROATIA
Phone: +385 1 6129 924
Fax: +385 1 6129 653
E-mail: Zeljko.Butkovic@fer.hr

Arrived: 17.07.00
Accepted: 22.11.00