TESTABILITY, A VITAL INGREDIENT FOR MCM TECHNOLOGY

Nihal Sinnadurai
Principal Consultant, TWI, Cambridge

Keywords: microelectronics, IC, integrated circuits, complex electronics, testability, ULSI circuits, ultra large scale integration circuits, MCM, multi-chip modules, product cost, test costs, BScan, boundary scan, BIST, built-in self-test, impacts on quality, impacts on reliability, SM, surface mounting, KGD, known good dice, CAMELOT, computer aided measure of logic testability, ATPG, automated test pattern generation, LSSD, level sensitive scan design, ATG, automatic test generation, DLBI technology, die level burn-in technology, SiBIS, silicon burn-in substrates, JTAG, joint test action group, AFM inspection system

Abstract: MCMs use bare-chip complex ICs which are wafer-probe tested and also sometimes, hazardously, temporarily packaged for full parametric test and burn-in, depending on the specific known-good-die procedure, prior to assembly. Wafer-probe testing today can be developed hierarchically to provide parametric and diagnostic testing. The ideal solution is to achieve maximum fault coverage of every die and of the assembled MCM circuit so that the total MCM technology is viable. Such a design policy must invoke IEEE P1149.1 embedded Boundary Scan or test access in all ICs and in the full MCM circuit, thus enabling actual or virtual probing of all nodes and interconnects for circuit diagnostics, and hence known good dice and assembly. The addition of a little more internal "tester" logic provides built-in self-test (BIST), which simplifies testing during wafer probing and delivers more authentic information on "Known Good Dice". Indeed this takes wafer probing beyond the hierarchical and towards a more intelligent approach, for instance enabling selective probing of only some 30% of the connection pads of the ICs in order to exercise the whole IC, stimulate self-test and retrieve test data.
A design for testability policy which is applied early in the development cycle has maximum impact on life cycle quality and reliability, and is therefore also more cost-effective than the temporary package test approach, which is applied later in the production cycle.

Testability, an Important Factor for MCM Technology

Keywords: microelectronics, integrated circuits (IC), complex electronics, testability, ULSI circuits (ultra large scale integration), MCM multi-chip modules, product costs, test costs, BScan boundary scan, BIST built-in self-test, impacts on quality, impacts on reliability, SM surface mounting, KGD known good dice, CAMELOT computer aided measure of logic testability, ATPG automated test pattern generation, LSSD level sensitive scan design, ATG automatic test generation, DLBI die level burn-in technology, SiBIS silicon burn-in substrates, JTAG joint test action group, AFM atomic force microscopy inspection system

Abstract: MCM modules incorporate complex integrated circuits in the form of bare dice, which are first tested at wafer level. Occasionally, before assembly into the MCM module, they are temporarily mounted in packages so that they can be fully parametrically tested and subjected to the other necessary burn-in tests, depending on the procedure required to establish a "known good die" (KGD). Wafer-level testing can today be developed hierarchically to provide parametric and diagnostic testing. Ideally, this would achieve the highest possible fault coverage at the level of the die and of the complete MCM module. This approach requires embedded IEEE P1149.1 Boundary Scan test, or test access to all the incorporated integrated circuits and to the MCM module as a whole. This enables actual and virtual probing of all terminals and interconnections, and hence the identification of good dice and of the complete tested circuit.
With the addition of internal "tester" logic, built-in self-test (BIST) becomes possible. This simplifies wafer-level testing and enables more reliable identification of KGD. In this way the hierarchical testing mentioned above is surpassed by a more intelligent approach. For example, selective probing of only 30% of the connection pads stimulates self-test and enables testing of, and retrieval of test data from, the whole integrated circuit. A design-for-test policy applied early in the development phase of the integrated circuit has the greatest impact on its life-cycle quality and reliability, and is therefore also more cost-effective than temporary packaging and subsequent testing, which has to be carried out later in the production phase.

1. THE NEED FOR TESTABILITY?

MCMs use bare-chip complex ICs which, if not fully tested prior to assembly in the module, will lead immediately to a poor yield of the MCM, and subsequently also pose reliability hazards because of their unknown performance and ageing characteristics. Conventional wafer-probe testing achieves rudimentary fault coverage. The consequence of inadequate testing is passed on to the assembled MCM. MCM technology is essentially a hybrid solution to semiconductor integration. Yet some test solutions are more akin to archaic board level approaches. Some intermediate solutions are very complicated and add a number of process steps and potential damage to the KGD, for instance requiring temporary bonding into carrier packages for testing and burn-in, and then de-mounting and rebonding into the final packaging or interconnection location /1/. Such extra handling and processing add extra hazards and cost and do not fully represent eventual die performance. Such steps also contradict the aims of semiconductor foundries, which iteratively test and improve their manufacturing practices to eliminate the need for burn-in.
Therefore embedded generic solutions are required which commence at the source of the IC dice.

2. THE MAGNITUDE OF THE PROBLEM

The burden of cost borne by design for test and actual testing in production has progressively increased (Fig 1), as a consequence of the increasing complexity of monolithic ULSI ICs and the recent complicated approaches to KGD and MCM testing. The expectation is that the present inexorable trend towards increased costs will be eased as more intelligent and embedded approaches to testability are adopted. The rate of rise of test cost is already slowing, because the problem of design for testability is being addressed and embedded solutions are being delivered by many, but not all, manufacturers. However, the ideal of fully embedded solutions is not yet available, and therefore interim solutions are necessary. As circuits become more complex and are no longer assemblies of simple discrete ICs, they may no longer be probed to achieve in-circuit testing and diagnostics. Even today, in advance of the expected rapid advance of MCMs early in the next millennium, the problems of ULSI testability to achieve high fault coverage are quite severe and are solved either by partitioning, adding extra test pins, long test sequences or by reconfiguring the circuits to create sequentially testable paths.

                     1970    1980    1990    2000
Complexity           SSI     LSI     ULSI    MCM
Gate Count           10      5k      200k    2000k
Memories             256     16k     16Mb    10Gb
Transistors          10^     10^     10^
Speed (Hz)           100k    10M     100M    500M
Pins                 14      44      356     1000
Test/Total Cost %    5%      20%     60%     60%

Fig. 1. Technology and cost trends

3. EARLIER SOLUTIONS FOR TESTABILITY

ICs do not lend themselves to mechanical in-circuit testing. Nevertheless, weaknesses in design or realisation have to be diagnosed and solved. As ICs became more complex, electronic access to internal nodes became difficult, and required corporate test strategies to ensure the design incorporated one or more testability features, such as partitioning, test access connections, breaking of loops, and Level Sensitive Scan Design (LSSD) /2/. LSSD is a rigorous technique whereby every register in a circuit resides on, and can be configured into, a scan path separating complex combinatorial circuits into smaller blocks for ease of testing. The registers are configured into the scan path for testing together with the use of a scan input and a scan output, enabling a stimulus (Controllability) pattern to be clocked onto the scan path and the response (Observability) shifted out. The benefits of LSSD are revealed by the dramatic improvement in automatic test generation (ATG): for, say, a 2000 gate IC, 99.9% fault coverage can be achieved within about 14 minutes, contrasted with the laborious fault grading techniques which took over 50 hours to achieve about 60% fault coverage. Indeed today there exist commercial CAE tools which combine logic simulation and testability measures to achieve ATG.

Meanwhile, high density micropackage assemblies may be probed for in-circuit testing by special fixtures /3/ with "pogo probes", or by fixtureless robotic controlled probes, to gain access to small test access pads. Today probe resolution is typically as small as 6 microns and pad separation typically 100 microns. The contact pad options range from the generous use of extended test pads through to probing the leads of the micropackages (e.g. Fig 2). The presence of components on the same side as the probes adds complication to the design of the test assembly, requiring precise guide holes and the use of a physical stop to halt the travel of the probe bed. The problem is more severe with leadless packages and where designs incorporate buried vias. Therefore, an alternative philosophy for in-circuit testing of high density assemblies is needed /4/ and has been delivered.

4. EMERGING PHYSICAL ACCESS SOLUTIONS FOR KGD

4.1 Temporary connections for KGD Assessment

A useful comparison of die level burn-in (DLBI) carrier technologies was produced by Vasquez and Lindsey /5/. The trade-offs are that permanent micropackages offer fully tested dice but take up more area on the substrate, whereas semi-permanent attachments require either force or energy (heat) to detach the tested chip, thereby delivering a stressed contact surface into the subsequent permanent joints, with potential impairment of their reliability.

An example of such a temporary connection, reported by Kim et al /1/, uses gold ball-wedge bonding onto a burn-in PCB, from which the chips are excised after burn-in by cutting the wires above the balls. The balls then serve as the bonding pads for subsequent ball bonding, i.e. ball-on-ball (Figure 2). Thus the underlying ball is subjected twice to bonding stress, a condition that accelerates ageing of the joint with the IC bond pad.

Fig. 2. Ball-on-ball wire bonding

An alternative promising development is nChip's silicon burn-in substrate (SiBIS), which incorporates compliant solder bumps for wafer-probe testing. The solder bumps on a silicon probe card are softened and distorted into ohmic contact with the IC bond pads on the wafer. The solder bump shapes are re-established by reflow to achieve up to 1000 reuses. The bump distortion is illustrated in Figure 3, indicating that a corresponding force must have occurred on the wafer surface. Also, this force would progressively increase as the solder bumps become less amorphous with each reflow. Despite this critical comment, there is promise from this technology, which may yet be developed into a manifold reuse technique and be translated into a production technology.

Fig. 3. Solder ball after bonding

A third option is the use of temporary carriers (Figure 4) /5/, which also add the hazard of introducing stress at the IC bond pads in order to ensure good ohmic contact for measurement and burn-in.
The contact is initiated by deforming the bond pad to break through the native oxide layer, requiring more force per contact than the normal wirebond. Another form is piercing contacts, which employ an irregular, sharp-edged surface texture. A third form establishes ohmic contact by scrubbing, which also causes damage to the bond pad surfaces.

[Figure 4 label: force delivery mechanism]

5. COMBINED PHYSICAL AND ELECTRONICS SOLUTION TO IN-CIRCUIT TESTING

The initiatives of the Joint Test Action Group (JTAG) in Europe produced and defined the concept of "Boundary Scan", which allows "virtual probing" of internal nodes beyond the I/O buffer, and which is now IEEE Standard 1149.1 /6/. Major IC manufacturers are delivering VLSI and ULSI conforming to the testability standard. The alternative CrossCheck /7/ technique for high density logic ASICs involves embedding a test point array in the design, ensuring that each node is located at an intersection of an x-y line, thereby providing access to all nodes. CrossCheck can deal with synchronous and asynchronous logic and covers bridging and stuck-at faults and transistor defects. Despite the promise, it has not emerged as the preferred option because of the added interconnection complexity that has to be embedded in the IC and thereafter in the system. Boundary Scan, on the other hand, can be used hierarchically and is now in widespread use. It is an established standard and is already being used to develop a testability hierarchy up to system level. This recognises the ongoing activity on IEEE standards 1149.x for equipment design (e.g. P1149.5 for communicating maintenance messages between field replaceable units in a system).
5.1 An Integrated Approach

Modern design-to-test policies require the inclusion of appropriate test features and testing at the chip level, as the key building blocks of board level testability, and also rigorous IC and bare-board (substrate) inspection and testing to ensure that the subsequent MCM testability is not compromised. Prototype testing of wafers can make use of scanning electron microscope (SEM) electron beam probing of IC chips to verify the design and diagnose faults /8/. Such DFT methods were instrumental in the 80386 programme in enabling early delivery of a fully functional device and quick testing of production parts. Integrated test strategies are employed by a number of IC manufacturers and original equipment manufacturers of MCMs who no longer find the temporary packaging option acceptable for reasons of cost, quality and reliability.

Electronics design automation (EDA) tools incorporate many of the tools needed for electronics design for testability, with some dedicated tools in place for some time, as indicated above. EDA assisted design-to-test begins at simulation to verify functionality and to develop and verify test stimuli, facilitating LSSD, built-in self-test (BIST), test ROMs, design partitioning and the addition of consequent extra pins and packaging.

Fig. 5. Boundary scan for an IC (labels: boundary-scan path; test access port (TAP) with TDI, TMS, TCK, TDO)

5.2 Boundary Scan for ICs

The incorporation of Design for Testability (DFT) of systems starts with the critical components, namely the VLSI and ULSI. Boundary Scan requires the incorporation of a 4-wire serial test bus comprising 'Test Data In' (TDI), 'Test Data Out' (TDO), 'Test Mode Select' (TMS) and 'Test Clock' (TCK), and a boundary scan register (Fig 5) /9/. The additional test circuitry consumes about 10% of the chip area.
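As an illustration of how TMS and TCK steer the test logic over this four-wire bus, the following is a minimal sketch of the 16-state TAP controller state machine defined by IEEE 1149.1. The state names follow the standard; the table name `TAP_NEXT` and the helper `walk` are hypothetical names introduced here, and the model deliberately ignores the TDI/TDO data path and the optional TRST pin.

```python
# Next-state table of the IEEE 1149.1 TAP controller.
# Each entry maps a state to (next state if TMS=0, next state if TMS=1),
# sampled on the rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply one TMS value per TCK rising edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# From reset, TMS = 0, 1, 0, 0 reaches the data-register shift state,
# in which TCK moves bits from TDI towards TDO.
shift_state = walk("Test-Logic-Reset", [0, 1, 0, 0])

# Holding TMS high for five clocks resets the controller from any state.
reset_from_anywhere = all(
    walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT
)
```

Keeping the controller as a pure lookup table makes the point that a single mode-select wire is enough to choose between instruction and data scans, which is why the bus needs only four signals.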
The technique takes full advantage of Scan Path design, described earlier, which today is no longer an interesting option but an essential solution to the need to deliver tested and testable chips; it is the key to reducing overall product cost through significant increases in quality and reductions in production costs.

Fig. 6. Boundary scan extended to circuit boards and MCMs (labels: system diagnostic bus; board-level maintenance controller)

The IC test functions are controlled by a Test Access Port (TAP) controller, which is a state machine communicating over the test bus. TDI and TMS are kept at logic-high unless otherwise driven. TDO is normally high impedance, and can acquire one of 3 states according to the data shifted through the IC. TMS initiates the state of the TAP, which selects the test mode. TCK clocks data into the IC through TDI and out through TDO. Either test instructions or test data can be scanned through the IC.

The addition of a little more internal "tester" logic to the IC provides built-in self-test (BIST). Such logic typically comprises a linear feedback shift register and a pseudorandom test pattern generator, thus minimising the stimulus and response vectors to be stored in the BIST circuitry. Because BIST uses the same types of transistors as the rest of the circuit, the tests can run at the maximum clock rate.

The availability of Boundary Scan and BIST greatly improves the information gained, and simplifies the testing during wafer probing. By probing just a few of the IC pads and the four TAP terminals instead of probing all pads, and initiating the test routine, a significant portion of the logic can be exercised. For example, Vertex Semiconductors (San Jose, California) has achieved 99% fault coverage by accessing just 20 pads per chip, thereby avoiding the cost (up to $10,000) and alignment problems of a 300 pin probe card.
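The BIST principle described above, a linear feedback shift register generating pseudorandom stimuli and a second register compacting the responses into a short signature, can be sketched as follows. This is a toy illustration under stated assumptions: the 4-bit register with feedback polynomial x^4 + x^3 + 1, the function names, and the small "circuit under test" with its stuck-at fault are all hypothetical and are not taken from any device discussed in this paper.

```python
WIDTH = 4
TAPS = (4, 3)  # assumed feedback polynomial x^4 + x^3 + 1 (maximal length)


def lfsr_step(state, serial_in=0):
    """One clock of a Fibonacci LFSR; serial_in is XORed into the feedback."""
    feedback = serial_in
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << WIDTH) - 1)


def generate_patterns(seed=1, count=15):
    """Pseudorandom stimuli: successive LFSR states starting from a seed."""
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def signature(responses):
    """Compact a stream of 1-bit responses into a WIDTH-bit signature."""
    sig = 0
    for bit in responses:
        sig = lfsr_step(sig, serial_in=bit)
    return sig


def good_logic(p):
    """Hypothetical fault-free circuit: (a AND b) XOR (c OR d)."""
    a, b, c, d = (p >> 3) & 1, (p >> 2) & 1, (p >> 1) & 1, p & 1
    return (a & b) ^ (c | d)


def faulty_logic(p):
    """Same circuit with the AND gate output stuck at 0."""
    c, d = (p >> 1) & 1, p & 1
    return c | d


stimuli = generate_patterns()
sig_good = signature(good_logic(p) for p in stimuli)
sig_faulty = signature(faulty_logic(p) for p in stimuli)
fault_detected = sig_good != sig_faulty
```

Only the seed and the expected signature need to be stored, rather than every stimulus and response vector, which is the saving the text refers to. A short signature can in general alias, so a real design would choose the register width to make that probability acceptably small.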
5.3 Modelling Approach to Testing Analog and Mixed Signal ICs

The problem of testability is different when testing analog circuits, because access is not the problem; duration is. Testing mixed signal devices, in particular analog-to-digital converters (ADC), can be very time consuming: testing all possible output codes of a 13 bit ADC requires 2^13 (8192) different values of input voltage. An effective alternative /10/ is to solve the 13 independent equations from 13 sets of information obtained by measurements made at each binary exponent, thereby obtaining the data to calculate, rather than measure, all 8192 values. Hence the skills of the test engineer may now be directed at defining the variables and the reduced set of test points and setting up the calculation program, in order to fully characterise the ADC. Commercially developed aids such as QR Factorisation (QRF) (factoring a right (R) triangular matrix and an orthogonal (Q) matrix) are available to make machine solutions less subject to computer rounding errors. Where mixed signal circuits are more complex, partitioning into individual testable segments and extra test access terminals are necessary to permit board level diagnostics, as described later.

Fig. 7. Design to test (labels: Late Nineteen Nineties; High-Level Integration & Design; System Configuration and Test Generation; Structured Design Testability Implementation; Simulation and Analysis; Prototype; Production)

6. EXTENSION OF TESTABILITY TECHNIQUES TO CIRCUIT BOARDS AND MCMs

6.1 Board Level Testability

Ideally, the circuit design can extend the test bus throughout the PCB or the MCM (Fig 6) /11/. Testing is then effected through a board-level maintenance controller to facilitate production testing of the assembled PCB or MCM.
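Extending the test bus across several ICs can be pictured with a toy model of a daisy-chained scan path. The sketch assumes each IC's boundary register behaves as a plain shift register, that all devices share one TCK, and that the TDO of each device feeds the TDI of the next; the class `ScanDevice`, the function `shift_chain` and the register lengths are illustrative assumptions rather than details of any particular board.

```python
from collections import deque


class ScanDevice:
    """Hypothetical model of one IC's boundary register as a shift register."""

    def __init__(self, name, length):
        self.name = name
        self.cells = deque([0] * length)  # cells[0] is the cell nearest TDI

    def clock(self, tdi):
        """Shift one bit in at TDI and return the bit leaving at TDO."""
        tdo = self.cells.pop()        # the last cell drives TDO
        self.cells.appendleft(tdi)    # the new bit enters next to TDI
        return tdo


def shift_chain(devices, bits):
    """Clock bits through devices wired TDO -> TDI; return the final TDO stream."""
    out = []
    for bit in bits:
        for dev in devices:
            # Each device receives the bit its predecessor held before this clock,
            # which matches all devices shifting on the same TCK edge.
            bit = dev.clock(bit)
        out.append(bit)
    return out


chain = [ScanDevice("IC1", 3), ScanDevice("IC2", 2), ScanDevice("IC3", 4)]
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1]

first_pass = shift_chain(chain, pattern)           # initial register contents emerge
loaded = [list(dev.cells) for dev in chain]        # the pattern now sits across the ICs
second_pass = shift_chain(chain, [0] * len(pattern))  # the pattern is read back out
```

In this model the chain behaves as one long register whose length is the sum of the individual registers, so the bits destined for the last IC in the chain must be shifted in first.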
Boundary scan at this level works by passing test data or instructions from TDO of one IC to TDI of the next, allowing test information to be scanned through the interconnected ICs and enabling each IC to be accessed and interrogated in turn. If the ICs were previously tested, then the same static vectors can be reused for testing the IC when it has been assembled on the PCB or MCM. The versatility of the technique is such that the testability can then be hierarchically escalated to the equipment level, to enable the PCBs to be tested and diagnosed at the system level in the field by means of a system diagnostic bus. This would avoid the return from the field of some 60% of boards that are subsequently found to be fault-free, saving considerably in logistics and cost. Standards are already emerging in this area, for instance the IEEE proposals:
- P896 Futurebus+ for modules, backplanes and chassis,
- P1396 COMBUS for a standard backplane interface bus protocol for high speed communications access to Synchronous Optical Network (SONET) and Synchronous Digital Hierarchy (SDH) based networks,
- P1149.x series for equipment design for testability.

Proprietary systems already in use include IBM's Diagnostic Expert for Testing (DEFT) /12/ and BT's Line Card diagnostics system. Both systems make use of field experience and expertise to build the rules for knowledge bases covering the range of faults identified. Thereafter, because the rules are the same for all users, and the users are guided through the diagnoses, their skills are reinforced with use and are fed back into the knowledge base.

ATE is already available from many suppliers to test JTAG ICs and PCBs. In parallel, the emphasis of Computer Aided Engineering (CAE) software enables the involvement of test designers in the earliest stages of design and HDL methodology /13/.
Already there are novel simulation tools such as the TI SimuBoard, which provides software "breadboard" simulation of a DSP application by combining ASICs, standard device models, JTAG DFT functions and board timing delays.

6.2 LSSD and Boundary Scan Solutions for MCMs

Motorola provides both IC and systems solutions, and has developed expertise in volume production of MCMs, including the essential test methodology /14/. The individual chips incorporate special features for their testability in production as monolithic ICs. The solution consists of LSSD, giving random logic fault coverage greater than 95% for individual chips, and boundary scan corresponding to P1149.1, together with some non-standard P1149.1 instructions via the TAP. Thus the need for additional special sort die /15/ to invoke special test instructions is eliminated. In the example, high MCM yield is achieved by a combination of a Main Control Chip and a smaller support chip, which has high fault coverage (>99%) at wafer probe and may be tested at high speed (66 MHz). The yield is then limited by the Control chip, which has the lower fault coverage. MCM testing is effected by using the (multiplexed) LSSD test control pins to set the combined die function pins (databus and address) into a high impedance tri-state condition and to de-select chips not being tested, while using the original pattern structure to test the selected chip with LSSD and functional patterns. This fulfils the requirement for cost-effective MCM production that there are no extra overheads in space, pin-out or test procedures.

Complex MCMs incorporate many costly VLSI chips built on costly multilayer substrates. Therefore, they must be designed with diagnostic testability to enable rework. Not all available VLSI incorporate P1149.1 testability features. In these circumstances either the substrate can be designed to facilitate testability or additional circuitry is necessary.
A complex DRAM MCM developed by Blood and Flint /16/ incorporates P1149.1 featured buffer circuits on all Address and Control inputs and on the bidirectional input/output (I/O) data lines. Thus all MCM I/O terminals have P1149.1 registers connected in a scan path for board/MCM assembly and diagnostics. Although the RAM chips do not have P1149.1 features, those on the buffer internal ports provide for diagnosis of MCM assembly faults (wirebonds, tracks) via an MCM P1149.1 TAP. The outcome comprises nine P1149.1 buffer chips to verify I/O and interconnect, six extra pin-out nodes connected to the DRAM RAS and CAS signals for DRAM diagnostics, plus fifty-two test probe points for MCM internal diagnostics.

A similar exploitation of partial population of an MCM with P1149.1 ICs is reported by Posse /17/ for an MCM comprising four P1149.1 featured ICs and two non-conforming ICs, in which diagnosis was achieved by disabling the non-conforming ICs from terminations on the MCM.

Today, complex MCMs are built onto active silicon substrates incorporating simple switching functions. Current testability developments, for example by NS, are exploiting this opportunity to build scan rings into such active substrates in order to access chips which do not have P1149.1 capability. Clearly, practical solutions are therefore optimised by using P1149.1 boundary scan chips where possible, with additional partitioning and test point access to provide diagnostic access into the assembled MCM.

7. AT WHAT STAGE OF THE PRODUCT SHOULD TESTABILITY BE ADDRESSED?

7.1 Design to Test

Design engineers may no longer "Design for Test" and pass their designs "Over the Wall" to Production Test engineers. Instead, it is clear that design and production engineers have learned that they are part of the same team, that the wall has to be removed (Fig 7), and that they now have to "Design to Test".
7.2 Cost Benefit of Design-to-Test

A new analysis for this paper, revisiting an earlier analysis by Mitre /18/, has taken account of product lifecycles in today's marketplace (Fig 8). The new analysis shortens the lifecycle to around 10 years from the previous 20 years and finds that there is a consequential marginal increase in the impact of design on subsequent lifecycle costs: 72% of operation and maintenance costs are determined at the design stage of a product, and a further 13% are influenced at the subsystem development stage. In other words, only 15% of operation and maintenance costs can be controlled by those responsible for operation and maintenance! Therefore Design-to-Test is crucial in controlling logistics and the consequent costs of all systems operations.

Fig. 8. Causes of life-cycle costs (axes: Life Cycle Cost (% of Total); Impact of Life Cycle Phases on Operation & Maintenance Costs (%); phase labels include Feasibility and Development)

REFERENCES

/1/ I U Kim, S H Lee, I H Hyun, K J Lee, J M Park, "A New Approach to Produce Cost-Effective Known Good Die", Proc. ISHM International Conference on MultiChip Modules, ISBN 0-930815-39-4, 13-15 April 1994.
/2/ E B Eichelberger and T B Williams, "A Logical Design Structure for LSI Testability", Journal of Design Automation and Fault Tolerant Computing, Vol 2, No 2, pp 165-178, May 1978.
/3/ For example: R N Barnes, "Fixturing for Surface Mounted Components", Proc. IEEE International Test Conference, pp 72-76, 1983.
/4/ C Maunder, D Roberts, N Sinnadurai, "Chip Carrier Based Systems and Their Testability", Hybrid Circuits, No 5, pp 29-36, 1984.
/5/ B Vasquez and S Lindsey, "The Promise of Known Good Die Technologies", Proc. ISHM International Conference on MultiChip Modules, ISBN 0-930815-39-4, 13-15 April 1994.
/6/ "IEEE Standard Test Access Port and Boundary-Scan Architecture", The Institute of Electrical and Electronics Engineers Inc., 15 February 1990.
/7/ CrossCheck Technology Inc, San Jose, California.
/8/ For example: O C Woolard, "Voltage Contrast Electron Beam Tester", Hybrid Circuit Technology, February 1991.
/9/ C M Maunder, "Status of IC Design for Testability", Br Telecom Technol J, Vol 7, No 1, pp 44-49, January 1989.
/10/ T Michael Souders and Gerard N Stenbal