UDK 621.3:(53+54+621+66)(05)(497.1)=00    ISSN 0352-9045

INFORMACIJE MIDEM, LETNIK 37, ŠT. 4(124), LJUBLJANA, DECEMBER 2007
INFORMACIJE MIDEM, VOLUME 37, NO. 4(124), LJUBLJANA, DECEMBER 2007

Strokovna revija za mikroelektroniko, elektronske sestavne dele in materiale
Journal of Microelectronics, Electronic Components and Materials

43rd INTERNATIONAL CONFERENCE ON MICROELECTRONICS, DEVICES AND MATERIALS and the WORKSHOP ON ELECTRONIC TESTING, September 12 - September 14, 2007, Hotel Astoria at Bled, Slovenia

Revija izhaja trimesečno (marec, junij, september, december). Izdaja jo Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale - MIDEM.
Published quarterly (March, June, September, December) by the Society for Microelectronics, Electronic Components and Materials - MIDEM.

Glavni in odgovorni urednik / Editor in Chief:
Dr. Iztok Šorli, univ. dipl. inž. fiz., MIKROIKS, d.o.o., Ljubljana

Tehnični urednik / Executive Editor:
Dr. Iztok Šorli, univ. dipl. inž. fiz., MIKROIKS, d.o.o., Ljubljana

Uredniški odbor / Editorial Board:
Dr. Barbara Malič, univ. dipl. inž. kem., Institut "Jožef Stefan", Ljubljana
Prof. dr. Slavko Amon, univ. dipl. inž. el., Fakulteta za elektrotehniko, Ljubljana
Prof. dr. Marko Topič, univ. dipl. inž. el., Fakulteta za elektrotehniko, Ljubljana
Prof. dr. Rudi Babič, univ. dipl. inž. el., Fakulteta za elektrotehniko, računalništvo in informatiko, Maribor
Dr. Marko Hrovat, univ. dipl. inž. kem., Institut "Jožef Stefan", Ljubljana
Dr. Wolfgang Pribyl, Austria Mikro Systeme Intl. AG, Unterpremstaetten

Časopisni svet / International Advisory Board:
Prof. dr. Janez Trontelj, univ. dipl. inž. el., Fakulteta za elektrotehniko, Ljubljana - PREDSEDNIK / PRESIDENT
Prof. dr. Cor Claeys, IMEC, Leuven
Dr. Jean-Marie Haussonne, EIC-LUSAC, Octeville
Darko Belavič, univ. dipl. inž. el., Institut "Jožef Stefan", Ljubljana
Prof. dr. Zvonko Fazarinc, univ. dipl. inž., CIS, Stanford University, Stanford
Prof. dr. Giorgio Pignatel, University of Padova
Prof. dr. Stane Pejovnik, univ. dipl. inž., Fakulteta za kemijo in kemijsko tehnologijo, Ljubljana
Dr. Giovanni Soncini, University of Trento, Trento
Prof. dr. Anton Zalar, univ. dipl. inž. met., Institut "Jožef Stefan", Ljubljana
Dr. Peter Weissglas, Swedish Institute of Microelectronics, Stockholm
Prof. dr. Leszek J. Golonka, Technical University Wroclaw

Naslov uredništva / Headquarters:
Uredništvo Informacije MIDEM, MIDEM pri MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenija
tel.: +386 (0)1 51 33 768, faks: +386 (0)1 51 33 771
e-pošta: Iztok.Sorli@guest.arnes.si
http://www.midem-drustvo.si/

Letna naročnina je 100 EUR, cena posamezne številke pa 25 EUR. Člani in sponzorji MIDEM prejemajo Informacije MIDEM brezplačno.
Annual subscription rate is EUR 100, separate issue is EUR 25. MIDEM members and Society sponsors receive Informacije MIDEM for free.

Znanstveni svet za tehnične vede je podal pozitivno mnenje o reviji kot znanstveno-strokovni reviji za mikroelektroniko, elektronske sestavne dele in materiale. Izdajo revije sofinancirajo ARRS in sponzorji društva.
The Scientific Council for Technical Sciences of the Slovene Research Agency has recognized Informacije MIDEM as a scientific journal for microelectronics, electronic components and materials. Publishing of the journal is financed by the Slovene Research Agency and by Society sponsors.
Znanstveno-strokovne prispevke, objavljene v Informacijah MIDEM, zajemamo v podatkovni bazi COBISS in INSPEC. Prispevke iz revije zajema ISI® v naslednje svoje produkte: Sci Search®, Research Alert® in Materials Science Citation Index™.
Scientific and professional papers published in Informacije MIDEM are indexed in the COBISS and INSPEC databases. The journal is indexed by ISI® for Sci Search®, Research Alert® and Materials Science Citation Index™.

According to the opinion of the Ministry of Information No. 23/300-92, the journal Informacije MIDEM is classified as a product of an informative character.

Grafična priprava in tisk / Printed by: BIRO M, Ljubljana
Naklada / Circulation: 1000 izvodov / 1000 issues
Poštnina plačana pri pošti 1102 Ljubljana, Slovenia / Taxe Perçue

VSEBINA / CONTENT

ZNANSTVENO STROKOVNI PRISPEVKI / PROFESSIONAL SCIENTIFIC PAPERS

D. Strle, V. Kempe: Inercijski sistemi na osnovi MEMS tehnologije / MEMS-Based Inertial Systems — 199
F. Novak: Sodobni izzivi na področju testiranja elektronskih vezij in sistemov / Challenging Issues in Electronic Testing — 210
S. Hellebrand, C. G. Zoellin, H.-J. Wunderlich, S. Ludwig, T. Coym, B. Straube: Testiranje nanosistemov - izzivi in strategije zagotavljanja kakovosti / Testing and Monitoring Nanoscale Systems - Challenges and Strategies for Advanced Quality Assurance — 212
Z. Peng, Z. He, P. Eles: Izzivi in rešitve pri testiranju sistemov na čipu / Challenges and Solutions for Thermal-Aware SoC Testing — 220
P. Cauvet, S. Bernard, M. Renovell: Načrtovanje in testiranje sistemov v enem ohišju / Design & Test of System-in-Package — 228
H.-J. Wunderlich, M. Elm, S. Holst: Diagnoza in odkrivanje napak: obvladovanje življenjske dobe nanosistemov na čipu / Debug and Diagnosis: Mastering the Life Cycle of Nano-Scale Systems on Chip — 235
Konferenca MIDEM 2007 - poročilo / MIDEM 2007 Conference - Report — 244
Vsebina letnika 2007 / Volume 2007 Content — 246
MIDEM prijavnica / MIDEM Registration Form — 251

Slika na naslovnici: Konferenca MIDEM 2007 je potekala na Bledu.
Front page: The MIDEM 2007 Conference took place at Bled.

Renewal of membership in the MIDEM society and the resulting benefits and obligations

Dear member,

Over the several decades of its existence the society has striven to be attractive and useful to all its members. You, too, came into contact with the society's activities and decided to join. Life paths, employment and professional interests change over the years; various events, challenges and decisions may have taken you to entirely different fields, and your interest in the society's work or in membership may have changed considerably over time, or perhaps disappeared. Nevertheless, the society's activities may still interest you, if only as a memory of the pleasant times we spent together. Addresses and ways of communicating have changed as well. Since the membership list has grown long, and it is evident that many former members are no longer interested in taking part in the society, the Executive Board has decided to bring the membership records up to date, and therefore asks you to fill in and return the form enclosed at the end of this issue. Let us remind you once more of the benefits arising from your membership.
As a member of the society you receive the journal "Informacije MIDEM", and you are invited to its conferences, where you can present your research and development achievements or meet old acquaintances and new, invited lecturers from the field that interests you. You can report on your achievements and problems in a professional journal with a respectable impact factor, and with your proposals you can help steer the activities of the society. Your obligation is an annual membership fee of 25 EUR, payable to the society's account at A-banka: 051008010631192. Do not forget to state your name with the payment! We hope that you are still interested in the society's activities and will renew your membership. Regrettably, we will have to remove from the membership list those members who do not renew their membership by the end of 2007. Please send the forms to: MIDEM pri MIKROIKS, Stegne 11, 1521 Ljubljana.

Ljubljana, December 2007
The Executive Board of the Society

MEMS-BASED INERTIAL SYSTEMS

Drago Strle*, Volker Kempe**
*University of Ljubljana, Faculty of Electrical Engineering, Slovenia
**SensorDynamics AG, Austria

INVITED PAPER
MIDEM 2007 CONFERENCE, 12.09.2007 - 14.09.2007, Bled, Slovenia

Key words: MEMS sensors, inertial systems, systems in package, gyro sensor, acceleration sensor, MEMS technology, fail-safe electronic systems

Abstract: MEMS-based inertial systems are among the fastest growing application segments of the micro-sensor market. Applications range from consumer devices to personal navigation and automotive systems, where they are used for rollover detection and stability control. This paper presents an overview of MEMS-based inertial systems composed of acceleration and gyro sensors, an ASIC for driving the sensors and sensing the signals, packages to house the sensors and ASICs, and software for control and fail-safe operation of the system. To produce such systems, several technologies are needed: IC and MEMS technologies, packaging, calibration, testing, etc. The technology driver is the automotive industry, which demands the smallest possible volume and power consumption and the best possible performance at the lowest possible price. Following this rule, a number of inertial systems have been developed for automotive applications, with ranges from 75°/s to 300°/s, a programmable bandwidth from 10 to 200 Hz, a noise floor of 0.02 (°/s)/√Hz, a single 5 V supply, linearity better than 0.1%, operation over the automotive temperature range, and a probability of undetected errors smaller than 10⁻⁹/h.

Inercijski sistemi na osnovi MEMS tehnologije

Ključne besede: MEMS senzorji, inercijski sistemi, sistemi v ohišju, žiroskop, senzor pospeškov, MEMS tehnologija, varni elektronski sistemi

Izvleček: Inercijski sistemi, ki bazirajo na MEMS tehnologijah, so najhitreje rastoče področje trga senzorskih sistemov. Uporaba sega od potrošniških in osebnih navigacijskih do avtomobilskih sistemov, kjer se uporabljajo za zaznavo stabilnosti in prevračanja ter pri merjenju pospeškov. Članek podaja pregled integriranih inercijskih sistemov, ki so sestavljeni iz senzorja pospeška, senzorja vrtenja, ASIC vezja, ki skrbi za krmiljenje senzorjev in detekcijo šibkih signalov, mikrokontrolerja in programov za krmiljenje in upravljanje varnega sistema ter ohišja. Za učinkovito gradnjo takšnega sistema potrebujemo različne tehnologije, kot so: tehnologija integriranih vezij, MEMS tehnologije, pakiranje, kalibracija, testiranje itd.
Glavni razlog za razvoj novih tehnologij je avtomobilska industrija, ki zahteva inercijski sistem z majhnim volumnom in težo, majhno porabo moči, avtomobilskimi specifikacijami in čim nižjo ceno. V preteklih letih je bilo razvitih nekaj takšnih sistemov, ki delujejo v avtomobilskem področju od 75°/s do 300°/s, s programabilno pasovno širino od 10 do 200 Hz, s šumom, manjšim od 0.02 (°/s)/√Hz, in linearnostjo, boljšo kot 0.1%, pri napajalni napetosti 5 V, ter delujejo v avtomobilskem temperaturnem področju. Poleg tega avtomobilski standardi zahtevajo veliko zanesljivost delovanja, kjer mora biti verjetnost za neodkrito napako manjša kot 10⁻⁹/h.

1 Introduction

MEMS inertial systems, consisting of MEMS accelerometers and gyro sensors, an ASIC that drives the sensors and senses the signals, a package, and embedded software, are very important parts of silicon-based smart sensor systems. The technology driver is the automotive industry, because it requires more and more "smart-sensor systems" to be built into vehicles. Applications of automotive inertial systems include stability control, rollover detection, acceleration detection for airbags, and add-ons for GPS navigation systems. Other applications are biomedical, sport, industrial, robotic, military and consumer, such as picture stabilization in cameras, 3D mice and virtual-reality devices. Common requirements for all these applications are appropriate performance, low price, low volume and low power consumption. These factors are the most important drivers for the further technological development of MEMS inertial systems, composed of sensors, analogue and digital signal-processing hardware and embedded algorithms.

The paper is organized as follows. Section 2 covers the basic principles of operation of both sensors; Section 3 presents the historical background, while Section 4 summarizes the most important automotive requirements. Section 5 describes a MEMS inertial measurement system used in automotive applications. Section 6 introduces MEMS technology, and Section 7 describes the most important sensing and actuation principles (electrostatics in MEMS). In Section 8, possible implementations of the MEMS accelerometer and gyro sensors are presented, together with an electro-mechanical model of the gyro sensor. Section 9 explains possible implementations of the electronic systems that drive the sensors and sense the weak signals coming from them.

2 Principles of operation

2.1 Accelerometer

Generally, an accelerometer consists of a proof mass suspended by springs connected to a fixed frame. Figure 1 shows a general accelerometer sensor structure and its simplified model.

Fig. 1: Accelerometer sensor and its simplified model (rest position and position under acceleration)

When the frame accelerates, according to Newton's second law the proof mass moves from its rest position and the displacement can be sensed. The acceleration sensor has a proof mass M, an effective spring constant K and a damping factor β, and can be approximately described by the second-order differential equation (2.1):

$M\ddot{x} + \beta\dot{x} + Kx = Ma$ (2.1)

At low frequencies the displacement is approximately proportional to the acceleration, $x \approx Ma/K = a/\omega_0^2$. The linear mechanical transfer function of the device can be described by the second-order system (2.2):

$H(s) = \dfrac{1}{s^2 + (\omega_0/Q)\,s + \omega_0^2}$ (2.2)

where the resonance frequency of the system is $\omega_0 = \sqrt{K/M}$ and the quality factor of the system at resonance is $Q = \sqrt{KM}/\beta$.
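To give these relations a feel for scale, here is a minimal numeric sketch; the parameter values are assumed, order-of-magnitude MEMS figures and are not taken from the paper.

```python
import math

# Illustrative accelerometer parameters (assumed order-of-magnitude values)
M = 1e-9        # proof mass [kg] (~1 microgram)
K = 4.0         # effective spring constant [N/m]
beta = 2.5e-6   # damping factor [N*s/m]

w0 = math.sqrt(K / M)            # resonance frequency [rad/s], see eq. (2.2)
Q = math.sqrt(K * M) / beta      # quality factor at resonance
x_per_g = 9.81 / w0**2           # static displacement per 1 g, from x ~ a/w0^2

print(f"f0 = {w0 / (2 * math.pi) / 1e3:.1f} kHz")        # ~10.1 kHz
print(f"Q  = {Q:.1f}")                                   # ~25
print(f"displacement per g = {x_per_g * 1e9:.2f} nm")    # ~2.5 nm
```

The nanometre-scale displacement per g already hints at why the readout electronics discussed later must resolve extremely small capacitance changes.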
If the proof mass is very small, as in MEMS, Brownian motion of the surrounding air disturbs the operation of the sensor, creating a random force $F_B = \sqrt{4k_BT\beta}\ [\mathrm{N}/\sqrt{\mathrm{Hz}}]$, which sets the lower resolution limit of the acceleration sensor (2.3):

$a_n = \sqrt{\dfrac{4k_BT\omega_0}{MQ}}\ \left[\dfrac{\mathrm{m/s^2}}{\sqrt{\mathrm{Hz}}}\right]$ (2.3)

To reduce the noise level, the sensor can be placed in a vacuum. In that case, the lower limit is defined by the noise generated in the electronic sensing circuitry.

2.2 Gyroscope

A gyroscope is a device that senses the rotation of an object in space: it measures the angular velocity of a system with respect to the inertial reference frame. Different physical mechanisms can be used for sensing rotation; the Coriolis effect is the most commonly used. The Coriolis force is an apparent force that arises in a rotating reference frame. If an observer sitting on the x-axis watches an object moving with velocity vector v (Figure 2), and the coordinate system rotates together with the observer with angular velocity Ω, then to the observer the particle appears to change its path in the direction of the x-axis with acceleration $a = 2v \times \Omega$. This is the Coriolis effect, named after the French scientist G. G. de Coriolis.

Fig. 2: Coriolis effect: a particle travelling with velocity v in a frame rotating with angular velocity Ω appears to experience the acceleration $a_{cor} = 2v \times \Omega$ and the corresponding force $F_{cor} = 2m\,v \times \Omega$

Table 1: Typical characteristics of accelerometers (automotive, navigation and micro-gravity applications)

Parameter | Automotive | Navigation | Micro-gravity
Resolution | 50-200 mg | 1-2 mg | ~1 µg
Off-axis sensitivity | <3% | <5% | <0.1%
Non-linearity | <2% | <2% | <0.1%
Shock in 1 ms | >2000 g | >2000 g | >10 g
Temp. range [°C] | -40 to 125 | -40 to 125 | -40 to 80
TC of offset | <50 µg/°C | <5 µg/°C | <0.5 µg/°C
TC of sensitivity | <900 ppm/°C | <100 ppm/°C | <5 ppm/°C

Different applications require different characteristics regarding sensitivity, range, frequency response, resolution, non-linearity, off-axis sensitivity, shock survivability, etc. The most demanding are, of course, the accelerometers used in micro-gravity measurements. The requirements of automotive applications as regards resolution, sensitivity, stability, etc. are less demanding than those of navigation and micro-gravity, and the range is smaller than in ballistic and impact-sensing applications; for automotive applications, however, the price, volume and power consumption must all be as low as possible. This can be achieved only if MEMS integration technology is used.

Table 2 shows typical characteristics of gyro sensors for different applications. Many different implementations are possible, each covering its own application area. Automotive requirements (rollover detection and stability control) are less demanding than navigation requirements regarding stability, noise and range; however, the price, volume and power consumption must be much lower, which can be achieved only by integration. Additional automotive requirements, equally important for the proper operation of inertial measurement systems, are:

- allowed mechanical shock during operation: <1500 g
- allowed mechanical shock un-powered: <2000 g
- mechanical vibrations: A = 0.75 mm, 10 g, 55 Hz to 2000 Hz, 24 h
- EMI immunity requirements: 0.1 MHz to 400 MHz
- EMI emission: 0.15 ... 200 MHz, E < 30 dBµV/m
- lifetime: 15000 hours in 17 years
- probability of an undetected error: <10⁻⁹/h
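As a plausibility check on the vibration item above (an illustration, not from the paper): vibration profiles of this kind are usually specified as a constant displacement amplitude at low frequency that crosses over to a constant acceleration at higher frequency, and the crossover where a 0.75 mm amplitude reaches 10 g can be computed directly.

```python
import math

A = 0.75e-3          # displacement amplitude [m]
a_max = 10 * 9.81    # acceleration limit [m/s^2] (10 g)

# For sinusoidal vibration x(t) = A*sin(2*pi*f*t), peak acceleration is (2*pi*f)^2 * A.
f_cross = math.sqrt(a_max / A) / (2 * math.pi)
print(f"crossover frequency = {f_cross:.0f} Hz")   # ~58 Hz, close to the 55 Hz in the spec
```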
Table 2: Typical characteristics of gyro sensors

Parameter | Rate grade (automotive; MEMS) | Tactical grade (fibre-optic; MEMS) | Inertial grade (ring-laser, optical)
Full-scale range [°/s] | 50-1000 | >500 | >400
Noise density [(°/s)/√Hz] | 0.01-0.03 | 0.001 | 0.0001
Scale-factor accuracy [%] | 0.1 to 2 | 0.01 to 0.1 | <0.001
Bias error [°/s] | <2 | 0.1-1 | <0.01
Angle random walk [°/√h] | unimportant | 0.05 to 0.5 | <0.001
Bandwidth [Hz] | 10-25-70 | 100 | 10
Max. shock in 1 ms [g] | 10³ | 10⁴-10⁵ | 10⁵
Temp. range [°C] | -40 to 125 | -40 to 125 | -40 to 125
Power consumption [W] | <0.1 | >10 | >10
Price | <10 EUR | high | very high

MEMS gyro sensors are mechanical sensors, so they are sensitive to mechanical shocks and vibrations. Robustness against mechanical shocks is in conflict with sensitivity and scale factor; a special design of the sensor and the control electronics is therefore needed to reconcile these requirements. In addition, MEMS accelerometers and gyro sensors are usually capacitive sensors with very small capacitances: they have very high impedance, so they are very sensitive to external electromagnetic fields; appropriate packaging, shielding and fully differential electronics can solve these problems. The hardest requirement is the reliability target for the probability of an undetected error, which can be met only by running a so-called fail-safe system that measures the important parameters of the system in real time.

5 MEMS inertial measurement unit

An automotive application of a MEMS inertial measurement unit (IMU) with six degrees of freedom is shown in Figure 6. It is composed of x, y and z acceleration sensors, to measure longitudinal, lateral and vertical acceleration, and three gyro sensors capable of measuring rotation about each axis: yaw rate about the z-axis, pitch rate about the y-axis and roll rate about the x-axis. The MEMS IMU includes the sensors and the electronics for driving all the sensors and processing all their signals. The required ranges of all sensors for rollover detection and stability control (ESC) are shown in Figure 6.

Fig. 6: MEMS inertial measurement unit (required sensor ranges for rollover detection and ESC)

Currently it is not possible to integrate all six elements into the same ASIC; this is not, however, a distant dream. At present it is possible to integrate two acceleration sensors and two gyro sensors on top of the ASICs, which include all the necessary electronics, to obtain a System in Package (SiP). An integrated z-axis gyro is still awaiting implementation. An IMS (inertial measurement system) is in reality a system in which the sensors are only one part. Other, equally important parts are: low-noise analogue signal-processing hardware, digital signal-processing algorithms and hardware, a microcontroller with embedded software running fail-safe algorithms, the package, calibration, and built-in self-test. Integration of all these elements in a single chip or as a SiP provides an opportunity to reduce the price, increase reliability and robustness, and reduce power consumption.

6 MEMS technologies

To reduce fabrication cost and increase reliability, the sensors must be scaled down by borrowing fabrication technologies from IC technology. In this way the dimensions are reduced (from the millimetre range down to micrometres), and batch processing of thousands of devices in parallel becomes possible, reducing the price considerably. To be able to scale mechanical structures, special micro-fabrication technologies have been developed /6/. The related field is called Microsystems technology, and the systems built in this way are called Micro-Electro-Mechanical Systems (MEMS). Micromachining, in combination with standard IC processing steps (doping, deposition, photolithography and etching), forms the technological base for the implementation of today's micro-electromechanical sensors and systems.
Usually a MEMS sensor is just one of the elements of an intelligent sensor system, which also includes the ASIC for driving and processing the signals from the sensors, the package, software, calibration, testing, etc. To further reduce the number of external passive or active components, different techniques and technologies have been developed: SoC (System on Chip), SSoC (Sensing System on Chip) and SSiP (Sensing System in Package). Their implementation depends on the available technology, the physical requirements and the target price.

Micromachining is used to produce three-dimensional mechanical structures (cantilevers, bridges, membranes, springs, etc.) /7/ (Figure 7), using different etching techniques: bulk micromachining is used to etch the silicon substrate, while surface micromachining (Figure 8) is used to release a cantilever, beam or plate deposited on top of a sacrificial thin-film layer. The critical technology step is etching; the most promising technique is anisotropic deep reactive ion etching (DRIE), which can produce the very high aspect ratio microstructures (Figure 9) needed for inertial sensors.

Fig. 7: Bulk micromachining (isotropic and anisotropic etching)
Fig. 8: Surface micromachining (polysilicon structural layers over sacrificial oxide, with nitride and metal layers)
Fig. 9: DRIE example

If MEMS structures are to be integrated in the same silicon substrate as the ASICs, a number of constraints are imposed on the micromachining steps so as not to degrade the performance of the electronics. The most important limiting factors are the temperature of the post-processing steps and/or contamination of the ASIC substrate and surface. Companies have developed several integration strategies: pre-CMOS micromachining, intra-CMOS micromachining and post-CMOS micromachining /9/.

Figure 10 compares the dimensions of MEMS elements with those of various physical structures.

Fig. 10: MEMS technology element sizes (a scale from atoms, 0.1 nm, through molecules, ~5 nm, viruses, ~80 nm, cells, ~5 µm, hair, ~75 µm, ants, ~3 mm, and man, 1.8 m, up to the Himalaya, ~8 km; MEMS and microtechnology occupy the micrometre-to-millimetre range)

Despite miniaturization, the classical physical laws remain valid for most MEMS sensors. The reduction in size, however, has the following important consequences: the surface/volume ratio is drastically increased, so surface properties dominate and surface charges become very important; friction is greater than inertia; heat dissipation is greater than heat storage; electrostatic force is greater than magnetic force; and, finally, new phenomena negligible at the macro scale can be observed.

To design and produce reliable MEMS sensors, it is essential to model the device and the technology as accurately as possible and to control the MEMS process as tightly as possible. Thorough testing is also part of successful production. During the production of MEMS structures, several critical issues may degrade the performance of the sensor. Two of the most damaging are stiction (Figure 11) and surface stress (Figure 12).

Fig. 11: Critical issue: stiction
Fig. 12: Critical issue: surface stress

Stiction is caused by different mechanisms. The consequences for accelerometer and gyro sensors are disastrous, since sensors with short circuits do not operate correctly. It is therefore important to remove the possibility of stiction by appropriate etching and/or release techniques, by anti-stiction coating of the structure, and/or by pulsed-laser post-processing.
Another critical problem is surface stress, which causes bending of mechanical structures in an unwanted direction. The consequence is again disastrous for the operation of MEMS inertial sensors; it is especially dangerous for gyro sensors, where surface stress may cause a prohibitively high quad-bias. It may become several orders of magnitude bigger than the bias signal and may saturate the measurement channel. It is very important to prevent surface stress during processing by the choice of appropriate materials, appropriate device design, annealing after process steps, and customized deposition and etching processes.

7 Electrostatics in MEMS

In MEMS, the electrostatic force is bigger than the magnetic force. In addition, sensing capacitance can be more accurate than sensing magnetic field or resistance because of noise. If we place inertial sensors in a vacuum, the Brownian motion of the air or gas no longer defines the resolution limit of the sensor: the noise generated in the first stages of the analogue signal-processing electronics dictates the resolution limit. The electrostatic force can be used for actuation, and capacitance as a measure of mechanical displacement. Two different shapes of capacitance are common in MEMS: the plate capacitor (Figure 13) and the comb capacitor (Figure 14).

Fig. 13: Plate capacitor

Neglecting the fringe capacitance, the plate capacitance as a function of the displacement x is (6.1):

$C(x) = \dfrac{\varepsilon_r \varepsilon_0 A}{d - x} = C_0 \dfrac{d}{d - x}$ (6.1)

For small movements the relation is approximately linear. If one plate is anchored, the capacitance is a measure of the displacement, assuming negligible fringe capacitance and no bending. If a voltage V is applied between the plates, the attractive force between them is (6.2):

$F(x) = \dfrac{\varepsilon_r \varepsilon_0 A}{2(d - x)^2} V^2$ (6.2)

The force is quadratic in the applied voltage and inversely quadratic in the displacement x. Only an attractive force is possible. In addition, applying a sine-wave signal to produce oscillatory movements causes a second-harmonic component, which must be carefully considered. The electrostatic force can be used for driving and for electrostatic trimming.

Fig. 14: Comb capacitor

For comb capacitors (Figure 14), the capacitance is proportional to the position x of the inner plate even for large displacements. Assuming that the fringe capacitances are much smaller than the comb capacitances, the capacitance as a function of x is (6.3), where h is the structure height, d the gap and L the overlap:

$C(x) = \dfrac{2\varepsilon_0 h}{d}(L - x)$ (6.3)

Applying a voltage V to the structure gives a force proportional to the square of the applied voltage (6.4); in this case the force is inversely proportional to the gap d and independent of the overlap position:

$F = \dfrac{\varepsilon_0 h}{d} V^2$ (6.4)

Again, we tacitly assumed no bending and no fringing field. In reality this is not the case: fringing fields cause a non-linear relation between position and capacitance, unwanted coupling between neighbouring structures, and additional forces not covered by the equations above, such as levitation effects. These effects are not covered in detail here.

Figure 15 shows the effect of the ground-plate capacitance: any voltage difference between the moving structures and the ground plates causes bending owing to the electrostatic force, which changes the resonance frequency of the structure in the z direction.

Fig. 15: Levitation effect owing to parasitic capacitance towards the substrate
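To give these formulas scale, here is a minimal numeric sketch; the geometry (a 100 µm × 100 µm plate with a 2 µm gap, and a comb of 2 µm height with a 2 µm gap and 50 µm overlap) is assumed for illustration and is not taken from the paper.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity [F/m]

# Plate capacitor, eqs. (6.1)-(6.2); assumed geometry, eps_r = 1 (air)
A, d, V = 100e-6 * 100e-6, 2e-6, 5.0
x = 0.1e-6                                    # small displacement [m]
C_plate = EPS0 * A / (d - x)                  # eq. (6.1)
F_plate = EPS0 * A * V**2 / (2 * (d - x)**2)  # eq. (6.2), attractive only

# Comb capacitor, eqs. (6.3)-(6.4); h = structure height, gap = d, L = overlap
h, gap, L = 2e-6, 2e-6, 50e-6
C_comb = 2 * EPS0 * h / gap * (L - x)         # eq. (6.3), linear in x
F_comb = EPS0 * h / gap * V**2                # eq. (6.4), independent of overlap

print(f"C_plate = {C_plate * 1e15:.1f} fF, F_plate = {F_plate * 1e9:.0f} nN")
print(f"C_comb  = {C_comb * 1e15:.2f} fF, F_comb  = {F_comb * 1e9:.2f} nN")
```

The contrast in the printed numbers illustrates the design choice discussed above: the plate capacitor gives a large but strongly non-linear force, while the comb gives a small, position-independent force, which is why combs are preferred for driving.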
8 MEMS acceleration and gyro sensors

8.1 MEMS acceleration sensors

Many transduction mechanisms can be used for acceleration and gyro sensors, and many have been tested in the past: piezoresistive, electromagnetic, piezoelectric, ferroelectric, optical, tunneling and capacitive. An extensive reference list is given in /2/. Capacitive accelerometers are popular because it is relatively easy to measure very small capacitance changes. In addition, for carefully processed MEMS capacitors the actuation voltage is low, which is an additional advantage. A lateral capacitive MEMS accelerometer is presented in Figure 16. In the presence of an external acceleration, the proof mass moves from its rest position relative to the support frame, and the plate capacitance between the proof mass and the support frame changes. The sensor has a good DC response, good noise performance, low drift, low temperature sensitivity, low power dissipation and a relatively simple structure. Capacitive sensors are, however, sensitive to mechanical shock and vibration and to electromagnetic fields, so appropriate design and shielding are necessary.

Fig. 16: Simplified view of a lateral acceleration sensor under acceleration

The challenge is finding appropriate interface circuitry: low-noise, low-drift readout/control circuitry with high sensitivity and a large dynamic range is needed. The electronic readout/control requirements for acceleration and gyro sensor systems are very similar; both are therefore treated in subsection 9.1.

8.2 MEMS gyro sensor and its model

Almost all reported MEMS gyro sensors are vibratory-type devices: no rotating elements are needed, so miniaturization and batch fabrication are possible. The most important specifications are:

- Resolution: defined by the random noise of the sensor and electronics, given in [(°/s)/√Hz]. In the absence of rotation the output signal is random, composed of white noise and a slowly varying function; the white-noise part defines the resolution.
- Drift: the peak-to-peak value of the slowly varying noise component; it defines the short- and long-term stability and is expressed in [°/h].
- Scale factor: defined as the amount of change in the output per unit change of rotation rate, expressed in [V/(°/s)].
- Bias: the zero-rate output, i.e. the output when there is no rotation, given in [°/s]. It can be digitally compensated if the changes due to ageing and temperature drift are small.
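Two of these specifications are commonly quoted in different units, and the conversion is worth making explicit; the helper below is an illustration (not from the paper), using the 0.02 (°/s)/√Hz noise floor quoted in the abstract and an assumed 50 Hz measurement bandwidth.

```python
import math

def arw_from_density(nd_dps_rthz: float) -> float:
    """Convert a rate-noise density [(°/s)/√Hz] to angle random walk [°/√h]."""
    return nd_dps_rthz * 60.0          # sqrt(3600 s/h) = 60

def rms_rate(nd_dps_rthz: float, bw_hz: float) -> float:
    """RMS rate noise [°/s] of a white-noise floor integrated over bw_hz."""
    return nd_dps_rthz * math.sqrt(bw_hz)

print(f"ARW      = {arw_from_density(0.02):.1f} °/√h")   # 1.2 °/√h
print(f"rms rate = {rms_rate(0.02, 50):.3f} °/s")        # ~0.14 °/s in a 50 Hz band
```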
A number of vibratory MEMS gyro sensors have been developed in the past: tuning-fork, piezoelectric, gimbal-vibratory, etc. Usually, micromachined gyro sensors rely on the coupling of an excited vibration mode into a secondary mode owing to the Coriolis acceleration. The magnitude of the secondary movement is proportional to the input angular velocity and the driving speed, and is perpendicular to the primary motion, as suggested by the tuning-fork gyro example shown in Figure 17. Driving can be electrostatic or electromagnetic, and sensing can be electrostatic or piezoelectric. One such example is explained in /13/.

Fig. 17: Tuning-fork gyro principle (an input rotation couples the driven vibration into a Coriolis-induced secondary vibration)

Many other implementations have been demonstrated, and their explanations are collected in /2/; examples include one from Samsung /14/ and one from Bosch /16/. The principle of operation of the gyro sensor presented in /14/ is suggested in the left part of Figure 18. If the mass M, suspended within the inner frame, vibrates in the vertical direction as suggested, then, because of the Coriolis acceleration, the inner frame vibrates perpendicular to the direction of vibration of the mass M; this vibration is sensed by comb capacitors (sense fingers). Because of imperfections, part of the driving vibration is transferred to the sense direction and produces an output even without Coriolis acceleration (zero-rate output, ZRO). This transferred movement is in quadrature to the Coriolis movement, and part of it can be removed by electrostatic trimming. Unfortunately, the remaining quad-bias increases the demand on the already high dynamic range of the measurement channel and produces cross-talk from the driving to the sensing mode. In addition, mechanical shocks and vibrations can seriously corrupt the operation of such sensors and reduce their usefulness, especially in safety-critical systems such as automotive. Companies have improved robustness by using fully differential sensors, which reduce the influence of vibrations and shocks by an order of magnitude.

Fig. 18: Schematic presentation of the Samsung gyro (resonating mass, springs, inner frame, outer frame, sense fingers) and its photomicrograph

Further improvements are possible by mechanical decoupling of the drive and sense vibrating modes, implemented in the gyro sensors produced by Bosch /16/ (Figure 19) and further improved in other implementations (Figure 20).

Fig. 19: x-axis vibratory gyroscope by Bosch
Fig. 20: Vibratory gyro with decoupled primary and secondary motion

The operation of the sensor presented in Figure 20 is as follows. A driving electrostatic force applied to the drive comb capacitors (blue) makes the sensor oscillate around the z-axis. The sense comb capacitors (red) sense the rotation of the sensor around the z-axis. When no Coriolis acceleration is present, the sensor ideally vibrates around the z-axis with no displacement in the z direction. Coriolis acceleration around the y-axis causes the ring to start moving vertically; these movements are transferred to the plate capacitors through springs that are stiff in the z and y directions. The plate-capacitance change is a measure of the Coriolis acceleration around the y-axis. The driving frequency must be precisely controlled using an appropriately tuned PLL, while the amplitude of oscillation at resonance is controlled by an AGC: the scale factor is proportional to the velocity of the drive mode and therefore to the amplitude; it is constrained by the material, the gaps between the combs, reliability of operation, a good S/N ratio, etc. The coupling from the drive mode to the sense mode owing to the Coriolis force is weak and must be amplified mechanically by operating at resonance if possible. The drive and sense modes can be approximately described by second-order transfer functions modelling a mass-damper-spring system, where the dominant damping comes from movement in the air. If the proof mass vibrates in a vacuum, a very high Q can be realized (several 10000), and if the driving and sensing frequencies could be matched, the coupling would be amplified by the Q factor.
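The Q-amplification and its cost in bandwidth can be made concrete with a small sketch (illustrative numbers, not from the paper): the mechanical gain of the sense mode at the drive frequency, relative to its DC response, as a function of the drive/sense frequency split.

```python
import math

def mode_gain(f_drive: float, f_sense: float, Q: float) -> float:
    """Amplification of a 2nd-order sense mode at the drive frequency,
    relative to DC: |H| = ws^2 / |ws^2 - wd^2 + j*wd*ws/Q|."""
    wd, ws = 2 * math.pi * f_drive, 2 * math.pi * f_sense
    return ws**2 / abs(complex(ws**2 - wd**2, wd * ws / Q))

fd, Q = 10e3, 10000                       # assumed 10 kHz drive, vacuum-level Q
for split in (0.0, 10.0, 100.0, 500.0):   # drive/sense frequency split [Hz]
    print(f"df = {split:5.0f} Hz -> gain = {mode_gain(fd, fd + split, Q):8.1f}")
```

With matched modes the gain equals Q (10000 here), but it collapses to ~500 for a 10 Hz split and to ~50 for 100 Hz, which is exactly the narrow-band behaviour and sensitivity/bandwidth trade-off discussed next.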
Unfortunately, the drive and sense resonance frequencies are different, and it is almost impossible to match them precisely, even using electrostatic trimming and electronic tuning, as suggested in the experimental implementation /17/. There, the reported performances are considerably better than those of any other MEMS gyro sensor reported so far, as far as bias stability and noise floor are concerned. Mode matching gives higher sensitivity and a greater scale factor, but the sensor becomes very narrow-band and sensitive to vibrations and shocks. A compromise between sensitivity, bandwidth and robustness is always necessary. Another concern is the quad-bias signal: it arises because part of the driving movement is transferred to the sensing mode owing to imperfections in the mechanical structure, and it corrupts the sensing channel. Fortunately, it is in quadrature with the bias signal and can be almost completely removed by appropriate signal processing. The quad-bias signal can be orders of magnitude bigger than the bias signal, so the linearity and SNR requirements of the measurement channel are very demanding.

9 Electronic system concepts and system integration

Electronic system concepts for accelerometers and gyro sensors are very similar. The gyro sensor, however, requires electronics with much better performance because of the weaker signals. Most high-performance MEMS gyro sensors operate in a vacuum to reduce the influence of the Brownian motion of the air molecules and thus reduce the noise floor of the sensor. Consequently, the noise floor of such a device is limited by the noise floor of the sensing channel of the ASIC and by the analogue and digital processing of the driving and sensing signals. Simulation for efficient design of an inertial system requires accurate modelling of the sensor and the electronics.

Fig. 21: Simplified model of a gyro sensor (coupled mechanical and charge domains, with drive voltages U_dcw/U_dccw and sensing voltages U_m and U_pp/U_pn)

Figure 21 shows a simplified coupled mechanical-electrical model of a gyro sensor. The sensor is driven at resonance by sine-wave-like signals generated in the ASIC PLL; the amplitude of oscillation is maintained by an AGC. Rotation of the sensor is sensed through measurement of the capacitors C_ms1 and C_ms2 (Figure 23) using the HF sensing voltages U_m1 and U_m2. The sensing voltages U_pn and U_pp are used to sense the plate capacitances. One plate of each capacitor of the sensor is connected to a common node, so all charges from the driving, motor-sensing and plate capacitors are added together; to distinguish the individual contributions, different sensing frequencies can be used.

Fig. 22: Characteristics of the gyro sensor (resonance characteristics of the drive and sense modes versus frequency)

Figure 22 shows the resonance characteristics of the sensor. The sensing dynamics for a constant rate can be described by (6.5). For matched modes the highest sensitivity can be achieved, but with a very small bandwidth and very poor robustness against vibrations, while for mismatched modes the sensitivity is smaller but the bandwidth is bigger and the robustness is increased.

$\ddot{y} + 2\zeta\omega_y\dot{y} + \omega_y^2 y = 2\Omega\,\omega_d x_0 \sin(\omega_d t)$ (6.5)

A simplified electrical model of the sensor is presented in Figure 23.

Fig. 23: Simplified electrical model of the sensor

Approximate capacitances for the sensor presented in Figure 20 are given in (6.6):

C_par ≈ 5-10 pF, C_dx = 0.1 pF, C_mx = 0.2 pF, C_ms ≈ 0.05 pF,
C_p ≈ 2 pF, ΔC_p = 0.05 aF (at Ω = 0.1 °/s) (6.6)
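A quick check of what these numbers imply for the electronics (a sketch; simply the ratio of the full plate capacitance to the smallest change that must be resolved):

```python
import math

C_p = 2e-12       # plate capacitance [F], from eq. (6.6)
dC = 0.05e-18     # capacitance change at 0.1 °/s [F], from eq. (6.6)

dynamic_range_db = 20 * math.log10(C_p / dC)
print(f"required dynamic range = {dynamic_range_db:.0f} dB")   # ~152 dB
```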
We can see that the biggest capacitor is the parasitic capacitance C_par. The sensor capacitances are much smaller, while the changes owing to the Coriolis acceleration are extremely small: the electronics must resolve capacitance changes smaller than 0.05 aF at an angular rate of Ω = 0.1 °/s in as large a bandwidth as possible, which requires a measurement-channel dynamic range of more than 150 dB.

9.1 Sensing electronics

Fig. 24: Analogue and digital signal-processing block diagram

Figure 24 shows a block diagram of the analogue and digital signal processing needed to drive the sensor and sense all the charges. The critical element regarding noise is the charge amplifier G_c(s), which transforms the spectrum of the added HF and LF charges from the sensor into appropriate voltages. Since CMOS transistors are contaminated with 1/f noise, this is one of the reasons for using HF sensing signals, as suggested in Figure 25: the sensing components must appear above the 1/f noise for very small capacitance changes to be detectable.

Fig. 25: HF sensing spectrum (sensing components at ω_s - ω_d, ω_s and ω_s + ω_d)

A second reason for using a high sensing frequency is the reduction of the feedback "resistor" in the charge amplifier, which is needed to keep the voltages around the reference level and to provide an appropriate low-impedance virtual ground for the moving structure and the common plate of the sensor capacitors. Other noise sources of the ASIC may corrupt the measurement channel, so every possible noise source in the system must be carefully analyzed and optimized. The following noise sources were considered: the thermal and 1/f noise of the charge amplifier, the input-referred noise of the programmable gain stages and mixers, the quantization and circuit noise of the A/D converters, the 1/f and thermal noise of the sensing and driving signals, the quantization noise caused by the various digital signal-processing algorithms, etc. In addition, cross-talk from the on-chip digital signal-processing blocks to the sensitive analogue channels was among the hardest problems to solve during the design of the MEMS inertial system.

10 Packaging

MEMS inertial sensors are mechanical elements, so it is very important to minimize stress during production and during use, because mechanical stress can eventually change the behavior of the sensitive mechanical structures. Wafer-level packaging provides an appropriate level of vacuum for the sensor. The whole sensor is then placed in an open-cavity package or into a plastic package together with the ASIC. Figure 26 shows the gyro sensor and the ASIC in an over-molded plastic package. In this case, the main problems are related to the mechanical stress caused by the different expansion coefficients of the different materials.

Fig. 26: Gyro sensor in a plastic package (ASIC, leadframe, damping gel, sensor)

11 Calibration and test

Fig. 27: Turntable in a temperature chamber

For efficient production of MEMS inertial systems, calibration and test are as important as all other activities. During production, every sensor-ASIC pair is carefully calibrated and tested at different temperatures using automated test equipment (Figure 27). Companies like SensorDynamics are working on new equipment for handling accelerated tests and calibration procedures.
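As an illustration of what such temperature calibration can look like downstream (a minimal sketch with made-up measurement points, not data from the paper): a low-order polynomial is fitted to the offset measured at a few chamber temperatures and later evaluated at the operating temperature.

```python
import numpy as np

# Hypothetical gyro offset measurements [°/s] at chamber temperatures [°C]
T = np.array([-40.0, -10.0, 25.0, 60.0, 90.0, 125.0])
offset = np.array([0.82, 0.41, 0.10, -0.15, -0.38, -0.77])

# Fit a 2nd-order offset-vs-temperature model; the coefficients would be
# stored in non-volatile memory during production test
model = np.poly1d(np.polyfit(T, offset, deg=2))

T_op = 73.0                            # example operating temperature [°C]
print(f"predicted offset at {T_op:.0f} °C = {model(T_op):+.3f} °/s "
      "(subtracted from the raw rate output at run time)")
```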
12 Conclusions

An overview of MEMS-based inertial systems has been presented in this article. The principles of operation of acceleration and gyro sensors have been presented, together with some historical background, followed by a specification of the important parameters for different applications. The technology driver for inertial systems is the automotive industry, which requires reliable, precise and cheap systems. MEMS technology is the most appropriate technology at the moment and provides a batch-processing capability like ASIC technology, which is the basis for price reduction. The final goal of integration, however, which has not yet been reached, is a six-degrees-of-freedom inertial system with three accelerometers, three gyros and all the signal-processing electronics integrated in a single package or even in a single chip. The basic technology steps needed for the implementation of MEMS inertial sensors have been presented, together with the possible problems and some basic structures of acceleration and gyro sensors. The design of MEMS-based inertial systems requires careful modelling of the sensor, the electronics and the package, and optimization of the inter-related parameters. In vacuum-packed sensors the resolution is limited by the noise of the signal-processing electronics, so the HF sensing principle can be used. In addition, all possible noise sources must be carefully analyzed and optimized. MEMS inertial systems need thorough testing of all electronic blocks and sensors. To meet all specifications over the automotive temperature range, extensive calibration at different temperatures is necessary.

References

/1/ B. McCullom and O. S. Peters, "A New Electric Telemeter," Technology Papers, National Bureau of Standards No. 247, Vol. 17, January 4, 1924.
/2/ N. Yazdi, F. Ayazi and K. Najafi, "Micromachined Inertial Sensors," Proc. IEEE, vol. 86, no. 8, pp. 1640-1659, Aug. 1998.
/3/ M. W. Judy, "Evolution of integrated inertial MEMS technology," in Proc. Solid-State Sensor, Actuator and Microsystems Workshop, 2004, pp. 27-32.
/4/ T. Scheiter, H. Kapels, K.-G. Oppermann, M. Steger, C. Hierold, W. M. Werner and H.-J. Timme, "Full integration of a pressure-sensor system into a standard BiCMOS process," Sens. Actuators A, vol. 67, pp. 211-214, 1998.
/5/ A. E. Franke, J. M. Heck, T.-J. King and R. T. Howe, "Polycrystalline silicon-germanium films for integrated microstructures," J. Microelectromech. Syst., vol. 12, pp. 160-171, 2003.
/6/ O. Brand and G. K. Fedder, "CMOS-MEMS, Advanced Micro and Nano Systems," Wiley-VCH, 2005, vol. 2.
/7/ G. T. A. Kovacs, N. I. Maluf and K. E. Petersen, "Bulk micromachining of silicon," Proc. IEEE, vol. 86, no. 8, pp. 1536-1551, Aug. 1998.
/8/ J. M. Bustillo, R. T. Howe and R. S. Muller, "Surface micromachining for micro-electromechanical systems," Proc. IEEE, vol. 86, no. 8, pp. 1552-1574, Aug. 1998.
/9/ O. Brand, "Microsensor Integration Into Systems-on-Chip," Proc. IEEE, vol. 94, no. 6, June 2006.
/10/ H. Seidel, R. Reider, R. Kolbeck, G. Muck, W. Kupke and M. Koniger, "Capacitive silicon accelerometer with highly symmetrical design," Sensors Actuators, vol. A21/A23, pp. 312-315, 1990.
/11/ K. J. Ma, N. Yazdi and K. Najafi, "A bulk-silicon capacitive micro accelerometer with built-in over-range and force-feedback electrodes," in Tech. Dig. Solid-State Sensors and Actuators Workshop, Hilton Head Island, SC, June 1994, pp. 160-163.
/12/ B. Boser and R. T. Howe, "Surface micromachined accelerometers," IEEE J. Solid-State Circuits, vol. 31, pp. 366-375, Mar. 1996.
/13/ M. Weinberg, J. Bernstein, S. Cho, A. T. King, A. Kourepenis, P. Ward and J. Sohn, "A micromachined comb-drive tuning fork gyroscope for commercial applications," in Proc. Sensor Expo, Cleveland, OH, 1994, pp. 187-193.
Sohn, "A micro-machined comb-drive tuning fork gyroscope for commercial applications, " in Proc. Sensor Expo, Cleveland OH, 1994, pp. 187-193. /14/ K. Tanaka, Y. Mochida, M. Sugimoto, K. Moriya, T. Hasegawa, K. Atsuchi and K. Ohwada, "A micro-machined vibrating gyroscope," Sensors Actuators A, vol. 50, pp. 111-115, 1995. /15/ J. Geen and D. Krakauer, "New iMEMS Angular-Rate-Sensing Gyroscope," Analog Devices (http://www.analog.com/analog-Dialog/archives/37-03/gyro.html, 2003. /16/ W. Geiger, B. Folkmer, J. Merz, H. Sandmaierand W. Lang, „A new silicon rate gyroscope," in Proc. IEEE Micro Electro Mechanical Systems Workshop (MEMS'98), Heidelberg, Germany, Feb. 1998, pp.615-620. /17/ A. Sharma, M. F. Zaman and F. Ayazi, "A 0.2§/h Micro-Gyro-scope with Automatic CMOS Mode Matching," ISSCC-2007 digest of technical papers. Drago Strle University of Ljubljana, Faculty for Electrical Engineering Tržaška 25, Ljubljana e-mail: drago. strie@fe. uni-ij.si Volker Kempe SensorDynamics A.G. Austria Prispelo (Arrived): 15.07.2007 Sprejeto (Accepted): 01.09.2007 209 UDK621.3.'(53+54+621 +66), ISSN0352-9045 Informacije MIDEM 37(2007)3, Ljubljana CHALLENGING ISSUES IN ELECTRONIC TESTING Franc Novak Institut Jožef Stefan, Ljubljana, Slovenia EDITORIAL NOTE TO INVITED PAPERS MIDEM 2007 CONFERENCE - WORKSHOP ON ELECTRONIC TESTING 12.09. 2007 - 14.09. 2007, Bled, Slovenia Key words: test, diagnosis, nano-scaie systems, thermal-aware test, SiP test Abstract: This editorial note summarizes the topics of the Workshop on Electronic Testing, Bled, Slovenia, September 13, 2007. The workshop gathered participants from seven European countries presenting papers on their current research in the areas of digital, sensor and mixed-signal test. Testing, debugging and diagnosing nano-scale systems, thermal-aware SoC testing and system-in-package test were hot topics covered by the Invited papers. Sodobni izzivi na področju testiranja elektronskih vezij in sistemov Kjučne besede: testiranje, diagnostika, nano sistemi, testiranje ob upoštevanju energijske uporabe, testiranje sistemov integriranih na nivoju sestavov Izvleček: V okviru 43. mednarodne konference MIDEM je bila 13.septembra 2007 na Bledu organizirana delavnica o testiranju elektronskih vezij in sistemov. Udeleženci iz sedmih evropskih držav so predstavili svoje raziskovalne rezultate s področja testiranja digitalnih vezij, senzorjev in mešanih analogno digitalnih vezij. Vabljena predavanja so osvetlila aktualne probleme testiranja in diagnosticiranja vezij izdelanih s sodobnimi nano-tehnološkimi postopki, testiranja ob upoštevanju energijske porabe ter testiranja sistemov integriranih na nivoju sestavov (angl. System-in-Package, SiP). Nowadays, electronic testing is addressing challenging problems of providing high-quality cost-effective tests coping with ever-increasing design complexity of modern electronic devices. Increased interest in the above problems and the growing needs for practical solutions led to the decision of the programming committee of the International Conference on Microelectronics, Devices and Materials to host a workshop on this topic in the frame of the 43rd International Conference MIDEM 2007. According to the established practice, special issue of "Informacije MIDEM" is devoted to the key points of the event. 
This issue features, among others, invited papers from the resulting Workshop on Electronic Testing, Bled, Slovenia, September 13, 2007. The workshop gathered participants from seven European countries, presenting papers on their current research in the areas of digital, sensor and mixed-signal test. Testing, debugging and diagnosing nano-scale systems, thermal-aware SoC testing and system-in-package test were the hot topics covered by the invited papers.

In the nanometer domain, fabrication defects, variations in device parameters and coupling effects in the interconnect impact device behavior and produce significant variability in timing, drive, leakage, and so forth. Instead of the determinism and static analysis employed so far, the design process is increasingly becoming subject to statistical variation, which consequently calls for a new paradigm in verification and test. As robust design becomes mandatory to ensure fail-safe operation and acceptable yields, design robustness invalidates many traditional test approaches. New test solutions should also incorporate the verification of robustness properties. The RealTest Project, presented by the first invited speaker, Sybille Hellebrand, addresses the problems of robust design and the associated efficient test procedures. The need for new fault models comprising statistical profiles of circuit parameters and conditions for fault detection is demonstrated in a case study of single-event transients in random logic. In contrast to the traditional transient-current model, a refined model (also referred to as the UGC model) is proposed, focusing more closely on the impact of a single-event transient (SET) on a pn-junction. The analysis of the impact of the UGC model, based on simulations of a set of finite-state-machine benchmarks with randomly injected SETs in the combinational logic, has shown that glitches of longer duration are likely to affect the behavior of the circuit and should therefore be considered in both system design and test.

Fault diagnosis of electronic systems has been a challenge for decades. Since the early 70s, marked by the fundamental reference "Fault diagnosis of digital systems" by H. Y. Chang, E. Manning and G. Metze, technology has made tremendous progress, while the theoretical principles of fault diagnosis have remained basically unchanged. Emerging nano-scale systems on chip make debug and diagnosis a complicated and difficult task /1/. Establishing good quality control and improvement over the whole system life cycle necessitates alternative descriptions and treatment of errors in the individual steps of the life cycle, points out Hans-Joachim Wunderlich in his invited paper. New approaches are needed to describe the effects originating from defects in nano-scale technologies. While in the past fault detection and fault location were considered two distinct steps towards diagnosis, in very deep submicron technology their roles are indivisible and very demanding. Diagnostic resolution is determined by the test set; hence diagnostic ATPG focused on fault localization may lead to more precise localization of defects. The adaptive diagnosis approach, described in the paper with an illustrative example, gives encouraging results. Design for debug and diagnosis (DDD) is the next obvious step to increase diagnostic resolution in practice.
Major problems to be overcome by DDD are highlighted in the paper, with a more detailed discussion of compaction techniques and trace buffers.

Full-scan methodology supported by Automatic Test Pattern Generation (ATPG) is a widely adopted strategy for testing integrated circuits. In this approach, considerable peak power occurs periodically at the capture cycles during input application and result capturing. Excessive test power dissipation may permanently damage a circuit under test or reduce its reliability. High temperature has become a technological barrier to the testing of high-performance SoCs, especially when deep submicron technologies are employed. In his invited paper, Zebo Peng discusses several issues related to the thermal problem during SoC testing. He then presents a thermal-aware SoC test scheduling technique that generates the shortest test schedule such that the temperature constraints of the individual cores and the constraint on the test-bus bandwidth are satisfied. The idea is to partition the entire test set into a number of test sub-sequences and to introduce a cooling period between two consecutive sub-sequences in order to avoid overheating (a toy sketch of this partition-and-cool idea is given at the end of this note). Since long cooling periods may substantially increase the test time, additional test-scheduling heuristics have been developed and are presented in the paper. A comparison of the different strategies in the experimental case study demonstrates the advantages of the developed test-scheduling heuristics.

According to the definition given by the International Electronics Manufacturing Initiative (iNEMI) /2/, "System in Package (SiP) is characterized by any combination of more than one active electronic component of different functionality plus optionally passives and other devices like MEMS or optical components assembled preferred into a single standard package that provides multiple functions associated with a system or sub-system". SiP design is gaining importance because it can provide a number of advantages over SoC in specific areas such as high-performance consumer electronics or mobile phones. Its design inherently poses testing problems due to a broad mix of process technologies, different types of interconnections and 3-D stacked ICs. In presenting the design and test issues of SiP, Michel Renovell first outlines the quality concerns for achieving the required acceptable yield (e.g., the known-good-die concept). Next, problems related to bare-die testing are addressed. New probing techniques, developed because of the limitations of traditional cantilever probe-card technology, are summarized. Challenges in system test imposed by RF, analog and MEMS components are described in the last part of the paper. Since mixed-signal and RF test requires expensive test instrumentation, alternative test solutions such as built-in self-test or signal transformation from the analog to the digital domain are often preferred. An initiative for setting up a SiP test standard similar to IEEE Std 1149.1 or IEEE Std 1500 is discussed.

The Workshop on Electronic Testing brought together participants from both industry and academia, providing a forum for the exchange of ideas and the dissemination of research results. Since this was the first workshop on testing organized in Slovenia, it also had the goal of identifying local groups and individuals working on test and establishing close contacts among them.
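The partition-and-cool idea mentioned above can be illustrated with a toy sketch (this is not the paper's algorithm; the thermal model and all constants are assumed): a core heats while test patterns are applied and cools exponentially toward ambient during idle gaps, and the scheduler inserts the shortest gap that lets the next sub-sequence finish below the temperature limit.

```python
import math

T_AMB, T_MAX = 25.0, 90.0   # ambient and allowed peak temperature [°C]
HEAT = 1.5                  # assumed heating per test cycle [°C/cycle]
COOL = 0.2                  # assumed cooling time constant [1/cycle]

def schedule(total_cycles: int, chunk: int):
    """Split a test into 'chunk'-sized sub-sequences, inserting the shortest
    cooling gap so that no sub-sequence drives the core above T_MAX."""
    plan, temp, remaining = [], T_AMB, total_cycles
    while remaining > 0:
        run = min(chunk, remaining)
        if temp + HEAT * run > T_MAX:
            # Cool until the next chunk fits under the limit, assuming
            # exponential decay: temp(t) = T_AMB + (temp - T_AMB)*exp(-COOL*t).
            target = T_MAX - HEAT * run           # must stay above T_AMB
            gap = math.ceil(math.log((temp - T_AMB) / (target - T_AMB)) / COOL)
            plan.append(("cool", gap))
            temp = target
        plan.append(("test", run))
        temp += HEAT * run                        # simple linear heating model
        remaining -= run
    return plan

for action, n in schedule(total_cycles=120, chunk=30):
    print(f"{action:>4}: {n} cycles")
```

Shorter chunks need more, but shorter, cooling gaps; trading chunk size against total gap time is exactly the kind of heuristic decision the paper's scheduling technique addresses.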
ETTTC (the IEEE European Test Technology Technical Council) /3/ fosters national activities and encourages the test community to collaborate. An example of successful joint collaboration has been the European IST project EuNICE-Test (European Network for Initial and Continuing Education in VLSI/SOC Testing using remote automatic test equipment (ATE)) /4/, addressing the shortage of skills in the field of electronic test in the microelectronics industry. In this project, four academic centres - Universitat Politècnica de Catalunya, Spain; Politecnico di Torino, Italy; University of Stuttgart, Germany; and the Jožef Stefan Institute, Ljubljana, Slovenia - joined the existing network accessing remote test facilities (an Agilent 83000 F330t ATE for testing digital and mixed-signal integrated circuits) located at CRTC in Montpellier. The established links offer opportunities for future collaboration, including the exchange of information, consulting, and joint experimental work, from Ph.D. theses up to bilateral research projects and EC projects.

References

/1/ IEEE Design and Test of Computers, special issue on Defect-oriented diagnosis for very deep submicron systems, Vol. 18, No. 1, 2001.
/2/ The International Electronics Manufacturing Initiative (iNEMI), http://www.inemi.org/cms/
/3/ European Test Technology Technical Council, http://www.etttc.org/
/4/ F. Novak, A. Biasizzo, Y. Bertrand, M. Flottes, L. Balado, J. Figueras, S. Di Carlo, P. Prinetto, N. Pricopi, H.-J. Wunderlich, J.-P. Van der Heyden, "Academic network for test education," International Journal of Engineering Education, Vol. 23, No. 6, 2007, pp. 1245-1253.

Dr. Franc Novak
Institut Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia
franc.novak@ijs.si

TESTING AND MONITORING NANOSCALE SYSTEMS - CHALLENGES AND STRATEGIES FOR ADVANCED QUALITY ASSURANCE

Sybille Hellebrand¹, Christian G. Zoellin², Hans-Joachim Wunderlich², Stefan Ludwig³, Torsten Coym³, Bernd Straube³
¹University of Paderborn, Germany
²University of Stuttgart, Germany
³Fraunhofer IIS-EAS Dresden, Germany

INVITED PAPER
MIDEM 2007 CONFERENCE - WORKSHOP ON ELECTRONIC TESTING, 12.09.2007 - 14.09.2007, Bled, Slovenia

Key words: nanoelectronic systems, soft errors, robust design, testing for quality assurance, single event upset

Abstract: The increased number of fabrication defects, the spatial and temporal variability of parameters, as well as the growing impact of soft errors in nanoelectronic systems require a paradigm shift in design, verification and test. A robust design becomes mandatory to ensure dependable systems and acceptable yields. Design robustness, however, invalidates many traditional approaches to testing and implies enormous challenges. The RealTest Project addresses these problems for nanoscale CMOS and targets unified design and test strategies to support both a robust design and a coordinated quality assurance after manufacturing and during the lifetime of a system. The paper first gives a short overview of the research activities within the project and then focuses on a first result concerning soft errors in combinational logic. It will be shown that common electrical models for particle strikes in random logic have underestimated the effects on the system behavior. The refined model developed within the RealTest Project predicts about twice as many single event upsets (SEUs) caused by particle strikes as traditional models.
Testiranje nanosistemov - izzivi in strategije zagotavljanja kakovosti

Ključne besede: nanoelektronski sistemi, mehke napake, robustno načrtovanje, testiranje za zagotavljanje kakovosti, nenadni osamljeni dogodki

Izvleček: Povečano število defektov pri izdelavi, prostorska in časovna spremenljivost parametrov, kakor tudi rastoč vpliv mehkih napak v nanoelektronskih sistemih zahteva spremembe v njihovem načrtovanju in testiranju. Robustno načrtovanje postaja nujno za zagotavljanje delovanja in sprejemljivega izkoristka. Tako načrtovanje pa zavrača do sedaj mnoge tradicionalne pristope k testiranju in tako postavlja nove izzive. Projekt RealTest naslavlja opisane probleme pri CMOS nanosistemih in si za cilj zastavlja združeno načrtovanje in testno strategijo z namenom doseči robustno načrtan sistem, ki bo proizvodljiv z zagotovljeno kvaliteto. V prispevku opišemo raziskovalne aktivnosti v okviru tega projekta in se osredotočimo na prve rezultate glede mehkih napak pri kombinacijski logiki. Pokažemo, da z novimi modeli, ki simulirajo nenadne osamljene dogodke, lahko bolje napovemo in simuliramo napake na nivoju celega sistema.

1 Introduction*

* This work has been supported by the DFG grant "RealTest".

Continuously shrinking feature sizes offer a high potential for integrating more and more functionality into a single chip. However, technology scaling also comes along with completely new challenges for design and test. As in the past, manufacturing defects are still a major problem, and efficient test and diagnosis procedures are needed to detect and sort out failing devices. While "random" or "spot" defects, such as shorts or opens, have been the major concern so far, the scenario has changed in the nanoscale era. The increasing variability of transistors, the degradation of devices, as well as the increasing susceptibility to transient faults during system operation lead to massive reliability problems /5, 39/.

One major reason for static parameter variations is sub-wavelength lithography. For nanoscale fabrication processes the wavelength used for lithography is greater than the size of the structures to be patterned. As in pictures with a low resolution, the resulting structures do not have exactly the intended contours. Even if resolution enhancement techniques (RET) such as optical proximity correction (OPC) are applied, these effects cannot be fully compensated /24/.

A second source of static variability is the extremely small number of dopant atoms in the channel of a transistor. Although the concentration of dopant atoms in the channel remains more or less constant, the decreasing channel lengths lead to an exponential decrease of the number of dopant atoms with successive technology generations, and below 50 nm only tens of atoms are left. This implies that the "Law of Large Numbers" is no longer valid and disturbances in a few atoms already result in different electric characteristics of the transistors, as for example different threshold voltages. This phenomenon is also referred to as "random dopant fluctuations".
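To get a feeling for the magnitudes involved, the following back-of-the-envelope sketch estimates the dopant count from the channel volume and the doping density; the numeric values (channel geometry, doping of 10^18 cm^-3) are illustrative assumptions, not data from the paper:

# Rough estimate of the number of dopant atoms in a MOSFET channel
# (illustrative values, not from the paper): atoms = density * volume.

def dopant_atoms(length_nm, width_nm, depth_nm, doping_per_cm3=1e18):
    nm3_to_cm3 = 1e-21                      # 1 nm^3 = 1e-21 cm^3
    volume_cm3 = length_nm * width_nm * depth_nm * nm3_to_cm3
    return doping_per_cm3 * volume_cm3

# A 50 nm x 50 nm channel with an assumed 10 nm effective depth holds only
# a few tens of dopant atoms, so single-atom fluctuations already matter.
print(dopant_atoms(50, 50, 10))             # -> 25.0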
Finally, the varying power density in different components of a system is a reason for dynamic parameter variations. Extremely high switching activity in certain areas, e.g. the ALU in a microprocessor, may for example cause "hot spots", which in turn may result in voltage droops and supply voltage variations.

During the lifetime of a chip, aging and degradation of devices can produce new permanent faults, which stay in the system. Transient faults or "soft errors", which affect the system operation for a short time and then disappear again, can be caused by α-particles emitted from the packaging material or by cosmic radiation. Traditionally, soft errors have only been considered for memories, because the more aggressive design rules for SRAM and DRAM arrays made them more susceptible to particle strikes. Meanwhile, a saturation of the soft error rate (SER) in memories can be observed, while the vulnerability of combinational logic and latches is increasing /2, 13/.

To cope with these inevitable problems, a "robust" design will become mandatory not only for safety critical applications but also for standard products. On the one hand, a shift from deterministic to statistical design is necessary to deal with parameter variations /5, 39/. On the other hand, fault tolerance and soft error mitigation techniques are necessary to compensate a certain amount of errors /2, 13, 29, 34/.

However, the changing design paradigms also require a paradigm shift in test. As "robust" systems are designed to compensate faults to a certain extent, it is no longer sufficient to classify chips into passing and failing ones. Instead, additional information about the remaining robustness of passing chips is required ("quality binning"). Furthermore, the "acceptable" behavior of a system may vary within a certain range, which is possibly application specific (e.g. accuracy or speed). Consequently, test development cannot be based only on classical measures such as fault coverage; tests have to verify that modules fulfill their specifications including robustness properties. Additional problems arise because traditional observables such as Iddq are no longer reliable failure indicators.

2 The RealTest Project

The problems explained above are addressed by the RealTest Project, which targets unified design and test strategies supporting both a robust design and efficient test procedures for manufacturing test as well as online test and fault tolerance. The project is a joint initiative of the Universities of Freiburg (Bernd Becker, Ilia Polian), Stuttgart (Hans-Joachim Wunderlich), and Paderborn (Sybille Hellebrand), and the Fraunhofer Institute of Integrated System Design and Design Automation Dresden (Bernd Straube) /4/. It is funded by the German Research Foundation (DFG) and gets industrial support from Infineon Technologies, Neubiberg, and NXP Semiconductors, Hamburg. In detail, the research focus is on the following topics:

- fault modeling,
- state monitoring in complex systems,
- testing fault tolerant nanoscale systems,
- modeling, verification and test of acceptable behavior.

The research activities are strongly dependent on each other. To design, for example, a robust system which can compensate disturbances during system operation, a detailed analysis of possible defect and error mechanisms is indispensable. This analysis must take into account statistical variations of the circuit parameters and provide a statistical characterization of the resulting behavior. Depending on the results, the appropriate design and fault tolerance strategies can be selected.
Particular attention must be paid to flip-flops and latches, as they become the dominating components in random logic and are extremely vulnerable. As the known techniques for hardening flip-flops and latches are very costly, new efficient techniques for state monitoring are needed.

The design strategy and the data obtained by the initial defect and error analysis determine the constraints for the test of the system. The cost for test and design can be reduced if it is possible to identify critical and non-critical faults depending on the application. For example, a fault in a DVD player resulting in only a few faulty pixels at certain times is tolerable for the user and need not be considered. A precise and application specific model of the acceptable behavior of the system is the basis for this step.

A short outline of the specific problems dealt with in each topic is given in the following subsections.

2.1 Fault modeling

Defects, soft errors and parameter variations in future technologies cannot be accurately characterized by existing fault models. To be able to deal with the complex physical phenomena responsible for the circuit behavior, new fault models must be developed comprising, in particular, statistical profiles of circuit parameters and conditions for fault detection. This work is based on techniques for inductive fault analysis, which extract the behavior of defective layouts via the electrical level to higher levels of abstraction /18/. As classical approaches for inductive fault analysis do not take into account spatial and temporal variabilities, they must be extended accordingly. A first result concerning soft errors in combinational logic will be described in Section 3.

2.2 State monitoring in complex systems

The percentage of flip-flops in logic components is rapidly growing, due, for example, to massive pipelining or speculative computing based on large register files. In particular, fault tolerant architectures rely on redundant structures and also work with an increased number of memory elements. Already today, circuits with more than a million flip-flops can be found both in data dominated and in control dominated designs /21/. Flip-flops are particularly susceptible to hard and soft errors, and, as will be analyzed in more detail in Section 3, soft errors in the combinational logic also propagate to the system flip-flops with a higher probability than assumed so far. An additional problem appears in power aware designs, where clock gating is used to keep the system state for longer periods of time. Like the contents of memory arrays, the system state is then exposed to disturbances over longer time spans.

Ensuring the correct system state is thus a problem of major importance. However, while online testing and monitoring of memory arrays is already state of the art, respective techniques for logic circuitry are still in their infancy. Here the goal is to investigate monitoring techniques and reconfiguration strategies which are suitable for both manufacturing and online test. In particular, new and robust hardware structures for scan chains are under development. As in memory arrays, the key issue is not to harden each single memory element but to partition the flip-flops into appropriate subsets, which can be monitored with the help of failure characteristics /14/.
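To illustrate the subset-based monitoring idea in its simplest form, the sketch below partitions flip-flops into groups and guards each group with one parity bit, so a single bit flip in a group is detected without hardening every cell. The fixed-size grouping and the parity check are assumptions made here for illustration only; they are not the concrete scheme of /14/:

# Minimal sketch of subset-based state monitoring (illustrative, not the
# concrete scheme of /14/): flip-flops are partitioned into groups and each
# group is protected by one parity bit computed when the state is written.

from functools import reduce

def make_groups(num_ffs, group_size):
    """Partition flip-flop indices 0..num_ffs-1 into fixed-size groups."""
    return [list(range(i, min(i + group_size, num_ffs)))
            for i in range(0, num_ffs, group_size)]

def parity(state, group):
    return reduce(lambda p, i: p ^ state[i], group, 0)

def check_state(state, groups, stored_parities):
    """Return the indices of groups whose parity no longer matches."""
    return [g for g, grp in enumerate(groups)
            if parity(state, grp) != stored_parities[g]]

state = [0, 1, 1, 0, 1, 0, 0, 1]           # system state (8 flip-flops)
groups = make_groups(len(state), 4)
stored = [parity(state, g) for g in groups]

state[5] ^= 1                              # a single event upset in FF 5
print(check_state(state, groups, stored))  # -> [1]: second group flagged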
2.3 Testing fault tolerant nanoscale systems

On the one hand, robust design styles are contradictory to traditional design for testability rules, as they decrease the observability of faults. On the other hand, fault masking helps to increase yield. Consequently, a "go/no-go" test result is no longer satisfactory; instead, information about the remaining robustness in the presence of faults is needed for quality binning.

As classical fault tolerant architectures such as triple modular redundancy (TMR) are very costly to implement, they are still restricted to safety critical applications /36/. For other systems, less hardware intensive solutions are of particular interest. The research activities within the project therefore focus on self-checking designs, which are able to detect errors and initiate a recovery phase once an error has happened /33/. Typically, self-checking systems aim to achieve the totally self-checking (TSC) goal, i.e. to detect an error when it results in a wrong output for the first time. Strongly fault secure circuits, for example, achieve the TSC goal by guaranteeing for each fault either a test pattern or fault free operation, even in the case of fault accumulation /40/. Design guidelines for strongly fault secure circuits are already given in /40/; more advanced techniques are described in /22/.

In principle, tools for automatic test pattern generation (ATPG) can be used both to verify the self-checking properties of the design and to generate test patterns for manufacturing test. Clearly, an ATPG tool can verify the existence of test patterns, and checking fault free operation in the presence of faults corresponds to the known problem of redundancy identification. However, there are several challenges which are not yet addressed in state of the art tools. To deal with fault accumulation, the tools must be able to handle multiple faults efficiently. Furthermore, self-checking designs usually work with input and output encoding, and test patterns for online checking must be in the input code and result in a circuit response outside the output code. This requires ATPG with respective constraints. For manufacturing test, the fault model may be different from that for online checking. The interaction between both fault models must be analyzed, and a test set must be determined which can detect not only manufacturing defects but also reduced self-checking properties.

2.4 Modeling, verification and test of acceptable behavior

As mentioned above, the behavior of nanoscale systems may be "acceptable" within a certain range, which is possibly application specific (e.g. accuracy or speed). This observation has been exploited in /6/ to introduce the concept of error tolerant design. Within the framework of the RealTest Project a more general approach is followed to develop metrics for "acceptable behavior" taking into account aspects of both offline and online testing. Along with the development of respective metrics and their integration into ATPG tools, an important issue is to provide means for estimating the impact of hard or soft errors. The "severity" of a soft error in a sequential circuit can for example be measured by the number of clock cycles the system needs to return to a fault free state /12/. The respective classification of soft errors in /12/ is based on a temporary stuck-at fault model for soft errors and an efficient estimation of the error probability Perr associated with each fault.
Perr reflects the probability that a soft error causes an erroneous output or system state. It can also be used as a guideline for selective hardening of circuit nodes /30/.

3 Single event transients - an underestimated problem

As soft errors in random logic are a key challenge in nanoscale systems, within the framework of the RealTest Project special emphasis has been placed on modeling the effects of particle strikes in combinational logic /15/. The results of this work have shown that soft errors in random logic are still an underestimated problem. In particular, it has been shown that in the majority of investigated cases soft errors remain in the system about twice as long as predicted by traditional approaches. For a better understanding of these results, the differences between traditional modeling and the refined approach from /15/ are pointed out in more detail in the sequel.

A particle strike in combinational logic can cause a glitch in the output voltage of a logic gate /8/. Usually such a "single event transient" (SET) only leads to a system failure if it can propagate to a register and turn into a single event upset (SEU) there. As a precondition, propagation paths must be sensitized in the logic, and the glitch must arrive at the register during a latch window /23, 31/. In Figure 1 this is illustrated for a small example.

Fig. 1: Logical and latch window masking.

If the particle strike at the AND gate produces a glitch at the output, this can only be propagated through the OR gate for w = 0. The glitch at the output of the OR gate is not latched in the FF, because it has disappeared before the next rising edge of the clock. In addition, depending on the amplitude of a glitch, its propagation can also be prevented by electrical masking /9/. Overall, it is particularly important not only to predict the occurrence of an SET but also to accurately characterize its expected shape. State of the art device simulators allow a precise characterization of SETs, but they are also highly computationally intensive /10/. In many cases circuit level techniques offer a good compromise between accuracy and computational cost /3, 20, 25, 32, 35/. They can also be combined with device level analysis to mixed level approaches /9, 10/.

3.1 Refined electrical modeling for particle strikes

Most circuit level approaches model the effect of a particle strike with the help of a transient current source as shown in Figure 2.

Fig. 2: Transient current model.

A common approximation to determine the current slope \(I(t)\) is the double exponential function in equation (1) /28/. Here \(\tau_a\) is the collection time constant of the pn-junction, and \(\tau_b\) denotes the time constant for establishing the electron-hole track:

\[ I(t) = I_0 \left( e^{-t/\tau_a} - e^{-t/\tau_b} \right) \qquad (1) \]

An alternative model is given by formula (2) with parameters \(Q\), \(\tau\) and \(K\), where \(Q\) is the collected charge, \(\tau\) is a pulse-shaping parameter and \(K\) is a constant /11/:

\[ I(t) = \frac{K \cdot Q}{\tau} \sqrt{t/\tau}\; e^{-t/\tau} \qquad (2) \]

Both models assume a constant voltage \(V\) across the pn-junction and do not consider the interdependence between charge collection and the change in voltage over time. This simplification is appropriate for modeling strikes at a significant distance from a pn-junction, where charge is collected by diffusion.
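For illustration, both classical pulse shapes can be evaluated directly. The sketch below assumes the reconstructions of equations (1) and (2) given above; all parameter values are placeholders chosen only to make the example runnable, not values from the paper:

import math

def double_exponential(t, i0, tau_a, tau_b):
    """Double-exponential current pulse of equation (1)."""
    return i0 * (math.exp(-t / tau_a) - math.exp(-t / tau_b))

def charge_based_pulse(t, q, tau, k=2.0 / math.sqrt(math.pi)):
    """Pulse of equation (2); with K = 2/sqrt(pi) the pulse integrates
    to the collected charge Q."""
    return (k * q / tau) * math.sqrt(t / tau) * math.exp(-t / tau)

# Placeholder parameters (times in ps, current and charge in arbitrary units).
for t in (0.0, 1.0, 5.0, 20.0):
    print(t, double_exponential(t, i0=1.0, tau_a=10.0, tau_b=1.0),
          charge_based_pulse(t, q=10.0, tau=5.0))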
However, if an α-particle or a heavy ion generated by a neutron strike crosses a pn-junction, this leads to a "funneling" process, which has first been described by Hsieh for α-particle strikes /16/. Here, charge collection by drift is the dominating phenomenon, and this process depends on the electric field strength, and thus on the voltage. Among several models for the charge collection by drift, Hu's model has been selected as the basis for the work in /15/, because it is also valid for variable field strength /17, 27, 28/. Hu only considers α-particle strikes, but it has been shown by device simulations that ions crossing a pn-junction lead to similar effects /37/. For the sake of simplicity, in the following explanations it is assumed that the particle strikes the pn-junction at an angle of 90°, and the discussion is restricted to NMOS without loss of generality.

The particle strike in Figure 3 generates a track of free electron-hole pairs, which disturbs the depletion zone. The electrons from the track drift to the drain/source region, while the holes drift into the substrate, generating an electric field. The depletion zone is gradually regenerated in the regions where no holes are left over. This funneling process is finished when all the holes have drifted out of the original depletion zone. To model the current flow, Hu assumes an ideal voltage source \(V\) as depicted in Figure 3.

Fig. 3: Funneling process.

In addition to \(V\), the drift current \(I_{drift}(t)\) is determined by the diode potential \(U_D\) of the pn-junction, the voltage \(U_{DPL}(t)\) across the depletion zone, the resistance \(R_t\) of the electron-hole track, and the resistance \(R_s\) of the substrate. With \(G = (R_t + R_s)^{-1}\) the curve \(I_{drift}(t)\) is given by equation (3):

\[ I_{drift}(t) = G \cdot \left( V + U_D - U_{DPL}(t) \right) \qquad (3) \]

To determine the voltage \(U_{DPL}(t)\), Hu assumes that the charge carrier density is equal to the density \(N_{sub}\) of acceptors in the substrate. However, Juhnke has shown by device simulation that this approximation may not be precise enough /19/. Exploiting the condition of quasi-neutrality in semiconductors, Juhnke derives an improved model with equation (4) for \(U_{DPL}(t)\):

\[ U_{DPL}(t) = \frac{K}{N_{eh,l}} \sqrt{V + U_D} \int_0^t I_{drift}(t')\,dt' \qquad (4) \]

The parameter \(N_{eh,l}\) is the line density of the electron-hole pairs along the track, which depends on the energy of the particle strike. \(K\) is a technology dependent parameter mainly determined by the mobilities of the electrons and holes and by the density of acceptors in the substrate. Inserting (4) into (3) provides the differential equation (5) for \(I_{drift}(t)\):

\[ I_{drift}(t) = G \cdot \left( V + U_D - \frac{K}{N_{eh,l}} \sqrt{V + U_D} \int_0^t I_{drift}(t')\,dt' \right) \qquad (5) \]

For constant voltage \(V\) this equation has a closed form solution, and Juhnke's model can be summarized by formula (6):

\[ I_{drift}(t) = G \cdot (V + U_D)\, e^{-t/\tau}, \qquad \tau = \frac{N_{eh,l}}{G \cdot K \cdot \sqrt{V + U_D}} \qquad (6) \]

As observed in /15/, the assumption of constant voltage is only necessary to derive a closed form solution for \(I_{drift}(t)\). The term \(V + U_D\) in equation (5) can therefore be replaced by a variable voltage \(U(t)\), which provides equation (7):

\[ I_{drift}(t) = G \cdot \left( U(t) - \frac{K}{N_{eh,l}} \sqrt{U(t)} \int_0^t I_{drift}(t')\,dt' \right) \qquad (7) \]

With \(C(t) = N_{eh,l} / (K \sqrt{U(t)})\), equation (7) can be rewritten as formula (8), which suggests the interpretation as a serial connection of a capacitance and a conductance:

\[ I_{drift}(t) = G \cdot \left( U(t) - \frac{1}{C(t)} \int_0^t I_{drift}(t')\,dt' \right) \qquad (8) \]

Since the capacitance \(C(t)\) depends on \(U(t)\), the model is also referred to as the UGC model.
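Equation (7) can also be integrated numerically by simple explicit time stepping. The sketch below is a simplified stand-in for the VHDL-AMS implementation mentioned next: the node voltage is kept constant (so the result should follow the closed-form decay of equation (6)), the diode potential is absorbed into that voltage, and all parameter values are illustrative assumptions:

import math

def simulate_ugc(u_of_t, g, k, n_ehl, t_end, dt):
    """Integrate I_drift(t) of equation (7) by explicit time stepping."""
    times, currents, charge = [], [], 0.0    # charge = integral of I_drift
    t = 0.0
    while t <= t_end:
        u = u_of_t(t)
        i = g * (u - (k / n_ehl) * math.sqrt(max(u, 0.0)) * charge)
        i = max(i, 0.0)                      # guard against numerical overshoot
        times.append(t)
        currents.append(i)
        charge += i * dt                     # accumulate collected charge
        t += dt
    return times, currents

# With a constant node voltage the current decays exponentially, as in
# equation (6); all parameter values below are illustrative placeholders.
ts, cs = simulate_ugc(lambda t: 1.2, g=1.0, k=0.5, n_ehl=10.0,
                      t_end=50.0, dt=0.01)
print(cs[0], cs[-1])                         # initial current, decayed current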
State-of-the-art circuit simulators based on advanced description languages such as VHDL-AMS allow the implementation of arbitrary two-terminal networks. Thus, it is not necessary to solve equation (7) analytically; it can be passed directly to the simulator for numerical analysis. A symmetric analysis can be carried out for PMOS devices, but then the network must be connected with opposite polarity and the technology parameter \(K\) must be adapted.

First experiments reported in /15/ have shown that the traditional transient current model (based on equation (6)) and the UGC model provide significantly different results. Analyzing, for example, the behavior of a transistor after an α-particle strike of 1 MeV, the glitches in the drain voltage predicted by the UGC model have a smaller amplitude but a longer duration. To justify this different view on single event transients, the UGC model has been validated by comparing it to the device level analysis of an NMOS transistor reported in /9/. As shown in /15/, both the device level simulations and the circuit level simulations using the UGC model yield smaller amplitudes and longer durations than traditional circuit level simulations based on a transient current source.

3.2 Gate level modeling and simulation results

The impact of the UGC model on SEU prediction can be two-fold. On the one hand, smaller amplitudes may increase electrical masking, but on the other hand a longer duration of glitches is likely to increase the probability of propagation through the circuit. In order to analyze the impact of the UGC model in more detail, in /15/ the gate level behavior in the presence of SETs has been extracted using standard techniques as described in /1/. The circuit level parameters were based on a 130 nm process, and for each gate full parasitic information was taken into account during extraction. This way a gate library was created and used to synthesize a set of finite state machine benchmarks with the SIS synthesis tool, the characteristics of which are summarized in Table 1 /26, 38/. The columns show the names of the finite state machines, the number of states, the number of primary inputs and outputs, the number of flip-flops and the number of gates after state minimization, state coding and logic minimization, as well as the minimum cycle times in picoseconds.

Table 1. Characteristics of FSM examples

FSM       States   PI   PO   FF   Gates   tc [ps]
bbara       10      4    2    8     90      670
dk14         7      3    5    3    145      993
dk16        27      2    3    5    409     2068
ex5          9      2    2    2     18      348
ex6          8      5    8    3    123      928
fetch       26      9   15    9    210      697
keyb        19      7    2    8    333      905
lion         4      2    1    2     20      308
mc           4      3    5    9     50      381
nucpwr      29     13   27    5    271      568
s1          20      8    6    8    199     1159
sand        32     11    9   21    928     1186
scf        122     27   56   24   1280     1668
shiftreg     8      1    1    4     16      209
styr        30      9   10    5    767     2677
sync        52     19    7   33    529     1403
train11     11      2    1    2     15      211

For the simulation at the gate level with a state of the art event driven simulator, the properties of the library cells were mapped to VHDL behavioral descriptions. To model electrical masking at the gate level, the observations reported in /7/ were exploited. Electrical masking is most pronounced in the first two logic levels after the struck node; after this, electrical masking effects can be neglected and strictly Boolean behavior can be assumed.
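The latch-window masking condition illustrated in Figure 1 is easy to state operationally: a glitch that reaches a flip-flop input causes an SEU only if it overlaps the latching window around a capturing clock edge. The following sketch checks this condition; the window model and all timing values are illustrative simplifications, not the timing data of the extracted library:

# Sketch of latch-window masking (cf. Figure 1): a glitch at a flip-flop
# input is latched only if it overlaps the window [edge - setup, edge + hold]
# of some rising clock edge at t = n * clock_period. Times are illustrative.

def is_latched(glitch_start, glitch_end, clock_period, setup, hold):
    """Return True if the glitch overlaps any latch window."""
    first_edge = int(glitch_start // clock_period)
    for n in (first_edge, first_edge + 1, first_edge + 2):
        window_start = n * clock_period - setup
        window_end = n * clock_period + hold
        if glitch_start < window_end and glitch_end > window_start:
            return True
    return False

# A 120 ps glitch that dies out 80 ps before the next edge is masked; the
# same glitch arriving later overlaps the window and is latched as an SEU.
print(is_latched(300, 420, clock_period=500, setup=30, hold=20))  # False
print(is_latched(450, 570, clock_period=500, setup=30, hold=20))  # True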
To quantify the impact of the UGC model, the following simulation flow is reported in /15/. The behavior of a finite state machine is monitored during a given number of cycles with a random input sequence. To compare the UGC model to the common model based on a transient current source, in fact three copies of the finite state machine are simulated under exactly the same conditions. In each clock cycle a random SET is injected into the combinational logic of the finite state machine: an SET characterized by the UGC model in one copy and an SET characterized by a transient current source in the other copy. For comparison, the third copy simulates the fault free case. If the SET cannot propagate to a flip-flop in either copy, then the next SET is injected in the next cycle. Otherwise, a checkpoint for the simulation of the good machine is generated, and the simulation is continued until a fault free state is reached again. This way it can be determined how long the fault effects remain in the system, which can be used as a measure of the "severity" of the faults /12/. If the fault effects remain in the system for more than a given limit, then the analysis is stopped to save simulation time. After the states of both copies agree with the good machine or the analysis of fault effects has been stopped, the checkpoint for the simulation of the good machine is restored, and simulation continues with the injection of the next SET.

For the first series of experiments in /15/, a clock of maximum frequency was assumed while monitoring the finite state machine for 10 million SET injections. The results showed that once an SET manifested itself as an SEU in the system, the average time for the SEU to stay in the system was similar for both the UGC and the traditional transient current model. However, comparing the number of occurrences of SEUs showed significantly different results for both models. To simplify the discussion of the results, in the following let tUGC denote the number of cycles an SET remains in the system when the simulation is based on the UGC model, and let ttrans represent the same number for the transient current model. Furthermore, the number of SETs with tUGC > k is denoted by n(tUGC > k), and the number of SETs with ttrans > k is denoted by n(ttrans > k). In particular, a value of tUGC or ttrans larger than zero means that the SET has been propagated to one or more registers, consequently causing an SEU.

In sequential circuits an SEU can sometimes be tolerated if it remains in the system only for a few clock cycles and the system recovers quickly to fault free operation /12/. But if it repeatedly propagates through the next state logic and stays in the system for many cycles, then the risk of a severe system failure increases considerably. Thus, it is also particularly important to compare the results for the number of SEUs staying in the system for more than a tolerable number of cycles. Figure 4 compares n(tUGC > 0) and n(ttrans > 0) as well as n(tUGC > 20) and n(ttrans > 20). For each circuit, the left bar shows the ratio n(tUGC > 0)/n(ttrans > 0), and the right bar represents the ratio n(tUGC > 20)/n(ttrans > 20). For some circuits no SEUs stayed in the system for more than 20 cycles under either model; here the respective bars are omitted.

Fig. 4: Comparing the ratios n(tUGC > 0)/n(ttrans > 0) and n(tUGC > 20)/n(ttrans > 20) for maximum frequency.
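The essence of the flow described above (inject, run the copies in lockstep, count the cycles until reconvergence) can be sketched in a self-contained way. The toy below deviates from /15/ in several respects: it uses a random next-state table instead of the synthesized benchmarks, injects upsets directly into the state instead of electrically modeled SETs (so no SET is ever masked before reaching a flip-flop), and compares a single faulty copy against the good machine rather than two:

import random

random.seed(1)
N_STATES, N_INPUTS = 64, 4                   # 64 states = 6 state bits
NEXT = [[random.randrange(N_STATES) for _ in range(N_INPUTS)]
        for _ in range(N_STATES)]            # toy next-state table

def seu_lifetime(limit=200):
    """Inject one SEU (a state bit flip) and count the cycles until the
    faulty copy reconverges with the good copy, up to a given limit."""
    good = random.randrange(N_STATES)
    faulty = good ^ (1 << random.randrange(6))   # the injected upset
    cycles = 0
    while faulty != good and cycles < limit:
        inp = random.randrange(N_INPUTS)     # same inputs for both copies
        good, faulty = NEXT[good][inp], NEXT[faulty][inp]
        cycles += 1
    return cycles

lifetimes = [seu_lifetime() for _ in range(10000)]
print(sum(t > 0 for t in lifetimes),         # SEUs at all (here: all of them)
      sum(t > 20 for t in lifetimes))        # "severe" SEUs, cf. n(t > 20)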
It can be observed that the major trend is a factor of two between the UGC model and the transient current source model. This implies that the more realistic prediction by the UGC model results in twice as many (severe) SEUs as a prediction by the traditional transient current model. The detailed results in /15/ show that there are also some cases where the transient current source model predicts longer times for the SEUs to stay in the system. In these cases the smaller amplitudes predicted by the UGC model result in electrical masking. But five to ten times more often the longer duration of glitches is the dominating effect. Although the probability for an SET to be latched in a flip-flop increases with the operating frequency, these trends have been confirmed also for simulations based on different clock frequencies /15/.

4 Conclusions

The increasing variability of parameters and the increasing vulnerability to defects, degradation, and transient faults require a paradigm shift in design and test of nanoscale systems. A robust and fault tolerant system design becomes mandatory also for non-critical applications, and testing has to characterize not only the functionality but also the robustness of a system. The RealTest Project addresses these problems by developing unified design and test strategies supporting both a robust design and efficient test procedures for manufacturing test as well as online test and fault tolerance. First results concerning the susceptibility of random logic to soft errors have shown that the effects of SETs have been underestimated so far. Simulations at gate level based on a refined electrical model for SETs have revealed about twice as many critical effects as simulations based on a traditional model.

References
/1/ L. Anghel, R. Leveugle, P. Vanhauwaert, "Evaluation of SET and SEU at Multiple Abstraction Levels," Proc. 11th IEEE Int. On-Line Testing Symposium 2005 (IOLTS'05), San Raphael, France, pp. 309-312, 2005.
/2/ R. Baumann, "Soft Errors in Advanced Computer Systems," IEEE Design & Test of Computers, Vol. 22, No. 3, pp. 258-266, 2005.
/3/ M. Baze, et al., "An SEU analysis approach for error propagation in digital VLSI CMOS ASICs," IEEE Trans. on Nuclear Science, Vol. 42, No. 6, Part 1, pp. 1863-1869, 1995.
/4/ B. Becker, et al., "DFG Projekt RealTest - Test und Zuverlässigkeit nanoelektronischer Systeme (DFG Project RealTest - Test and Reliability of Nano-Electronic Systems)," it - Information Technology, Vol. 48, No. 5, 2006, pp. 304-311.
/5/ S. Borkar, "Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation," IEEE Micro, Vol. 25, No. 6, pp. 10-16, Nov./Dec. 2005.
/6/ M. A. Breuer, S. K. Gupta, and T. M. Mak, "Defect and Error Tolerance in the Presence of Massive Numbers of Defects," IEEE Design & Test, Vol. 21, No. 3, pp. 216-227, May-June 2004.
/7/ H. Cha, et al., "A gate-level simulation environment for alpha-particle-induced transient faults," IEEE Trans. on Computers, Vol. 45, No. 11, pp. 1248-1256, 1996.
/8/ P. Dodd and L. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics," IEEE Trans. on Nuclear Science, Vol. 50, No. 3, pp. 583-602, 2003.
/9/ P.
Dodd, et al., "Production and propagation of single-event transients in high-speed digital logic ICs," IEEE Trans. on Nuclear Science, Vol. 51, No. 6, Part 2, pp. 3278-3284, 2004.
/10/ P. Dodd, "Physics-based simulation of single-event effects," IEEE Trans. on Device and Materials Reliability, Vol. 5, No. 3, pp. 343-357, 2005.
/11/ L. Freeman, "Critical charge calculations for a bipolar SRAM array," IBM Journal of Research and Development, Vol. 40, No. 1, pp. 119-129, 1996.
/12/ J. Hayes, I. Polian, and B. Becker, "An Analysis Framework for Transient Error Tolerance," Proc. 25th IEEE VLSI Test Symp. (VTS'07), Berkeley, CA, USA, pp. 249-255, 2007.
/13/ P. Hazucha, et al., "Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-μm to 90-nm generation," Technical Digest IEEE Int. Electron Devices Meeting 2003 (IEDM'03), pp. 21-526, 2003.
/14/ S. Hellebrand, et al., "Efficient online and offline testing of embedded DRAMs," IEEE Trans. on Computers, Vol. 51, No. 7, pp. 801-809, July 2002.
/15/ S. Hellebrand, et al., "A Refined Electrical Model for Particle Strikes and its Impact on SEU Prediction," Proc. 22nd IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'07), Rome, Italy, September 26-28, 2007.
/16/ C. Hsieh and P. Murley, "A field-funneling effect on the collection of alpha-particle-generated carriers in silicon devices," IEEE Electron Device Letters, Vol. 2, No. 4, pp. 103-105, 1981.
/17/ C. Hu, "Alpha-particle-induced field and enhanced collection of carriers," IEEE Electron Device Letters, Vol. 3, No. 2, pp. 31-34, 1982.
/18/ A. Jee and F. J. Ferguson, "Carafe: An Inductive Fault Analysis Tool for CMOS VLSI Circuits," Proc. 11th IEEE VLSI Test Symp., pp. 92-98, 1993.
/19/ T. Juhnke, "Die Soft-Error-Rate von Submikrometer-CMOS-Logikschaltungen," PhD Thesis, Technical University of Berlin, 2003.
/20/ N. Kaul, B. Bhuva, and S. Kerns, "Simulation of SEU transients in CMOS ICs," IEEE Trans. on Nuclear Science, Vol. 38, No. 6, Part 1, pp. 1514-1520, 1991.
/21/ R. Kuppuswamy et al., "Full hold-scan systems in microprocessors: Cost/benefit analysis," Intel Technology Journal, 8(1), pp. 63-72, Feb. 2004.
/22/ P. K. Lala, "Self-Checking and Fault-Tolerant Digital Design," Morgan Kaufmann Publishers, San Francisco, 2001.
/23/ P. Liden, et al., "On latching probability of particle induced transients in combinational networks," Digest of Papers 24th Int. Symp. on Fault-Tolerant Computing 1994 (FTCS-24), pp. 340-349, 1994.
/24/ L. W. Liebmann, "Layout impact of resolution enhancement techniques: impediment or opportunity?," Proc. Int. Symposium on Physical Design 2003 (ISPD'03), Monterey, CA, USA, pp. 110-117, 2003.
/25/ A. Maheshwari, I. Koren, and W. Burleson, "Accurate estimation of soft error rate (SER) in VLSI circuits," Proc. 19th IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems 2004 (DFT 2004), pp. 377-385, 2004.
/26/ K. McElvain, IWLS'93 Benchmark Set: Version 4.0, distributed as part of the IWLS'93 benchmark distribution, available at http://www.cbl.ncsu.edu:16080/benchmarks/LGSynth93/
/27/ F. McLean and T. Oldham, "Charge funneling in N- and P-type Si substrates," IEEE Trans. on Nuclear Science, Vol. 29, No. 6, pp. 2018-2023, 1982.
/28/ G. Messenger, "Collection of charge on junction nodes from ion tracks," IEEE Trans. on Nuclear Science, Vol. 29, No. 6, pp. 2024-2031, 1982.
/29/ S. Mitra, et al., "Logic soft errors in sub-65nm technologies design and CAD challenges," Proc. 42nd Conf. on Design Automation, pp.
2-4, 2005.
/30/ K. Mohanram and N. A. Touba, "Cost-effective approach for reducing soft error failure rate in logic circuits," Proc. IEEE Int. Test Conf. (ITC'03), Charlotte, NC, USA, pp. 893-901, Sept./Oct. 2003.
/31/ S. Mukherjee, et al., "A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor," Proc. 36th Annual IEEE/ACM Int. Symp. on Microarchitecture 2003 (MICRO-36), pp. 29-40, 2003.
/32/ H. Nguyen and Y. Yagil, "A systematic approach to SER estimation and solutions," Proc. 41st Annual IEEE Int. Reliability Physics Symp., pp. 60-70, 2003.
/33/ M. Nicolaidis and Y. Zorian, "On-Line Testing for VLSI - A Compendium of Approaches," Journal of Electronic Testing: Theory and Applications (JETTA), Vol. 12, No. 1-2, pp. 7-20, February/April 1998.
/34/ M. Nicolaidis, "Design for Soft Error Mitigation," IEEE Trans. on Device and Materials Reliability, Vol. 5, No. 3, pp. 405-418, 2005.
/35/ M. Omana, et al., "A model for transient fault propagation in combinatorial logic," Proc. 9th IEEE On-Line Testing Symp. 2003 (IOLTS'03), pp. 111-115, 2003.
/36/ D. K. Pradhan, "Fault Tolerant Computer System Design," Prentice Hall, Upper Saddle River, NJ, USA, 1996.
/37/ P. Roche, et al., "Determination of key parameters for SEU occurrence using 3-D full cell SRAM simulations," IEEE Trans. on Nuclear Science, Vol. 46, pp. 1354-1362, 1999.
/38/ E. M. Sentovich, et al., SIS: A System for Sequential Circuit Synthesis, Electronics Research Laboratory, Mem. No. UCB/ERL/M92/41, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720.
/39/ S. K. Shukla, R. I. Bahar (Eds.), "Nano, Quantum and Molecular Computing - Implications to High Level Design and Validation," Boston, Dordrecht, London, Kluwer Academic Publishers, 2004.
/40/ J. E. Smith and G. Metze, "Strongly Fault Secure Logic Networks," IEEE Transactions on Computers, Vol. C-27, No. 6, pp. 491-499, June 1978.

Sybille Hellebrand, University of Paderborn, Germany
Christian G. Zoellin, Hans-Joachim Wunderlich, University of Stuttgart, Germany
Stefan Ludwig, Torsten Coym, Bernd Straube, Fraunhofer IIS-EAS Dresden, Germany

Prispelo (Arrived): 15.07.2007    Sprejeto (Accepted): 01.09.2007

UDK621.3:(53+54+621+66), ISSN0352-9045  Informacije MIDEM 37(2007)4, Ljubljana

CHALLENGES AND SOLUTIONS FOR THERMAL-AWARE SOC TESTING

Zebo Peng, Zhiyuan He and Petru Eles
Embedded Systems Laboratory, Linköping University, Sweden

INVITED PAPER
MIDEM 2007 CONFERENCE - WORKSHOP ON ELECTRONIC TESTING, 12.09.2007 - 14.09.2007, Bled, Slovenia

Key words: electronic testing, SoC devices, thermal-aware SoC testing techniques, test efficiency

Abstract: High temperature has a negative impact on the performance, reliability and lifespan of a system on chip. During testing, the chip can be overheated due to a substantial increase of switching activities and due to concurrent tests applied in order to reduce test application time. This paper discusses several issues related to the thermal problem during SoC testing. It then presents a thermal-aware SoC test scheduling technique to generate the shortest test schedule such that the temperature constraints of individual cores and the constraint on the test-bus bandwidth are satisfied.
In order to avoid overheating during the test, we partition test sets into shorter test sub-sequences and add cooling periods in between. Furthermore, we interleave the test sub-sequences from different test sets in such a manner that the test-bus bandwidth reserved for one core is utilized during its cooling period for the test transportation and application of the other cores. We have developed a heuristic to minimize the test application time by exploring alternative test partitioning and interleaving schemes with variable lengths of test sub-sequences and cooling periods. Experimental results have shown the efficiency of the proposed heuristic.

Izzivi in rešitve pri testiranju sistemov na čipu

Ključne besede: testiranje elektronike, SoC sistemi na čipu, tehnike testiranja SoC z obvladovanjem pregrevanja, učinkovitost testiranja

Izvleček: Visoke temperature negativno vplivajo na lastnosti, zanesljivost in življensko dobo sistemov na čipu. Med testiranjem lahko pride do pregrevanja čipa zaradi povečanega števila vklopov z namenom skrajšati čas testiranja. V prispevku opišemo tovrstne probleme in predstavimo tehnike, s katerimi načrtujemo take testne procedure, ki vodijo računa o tem, da ne prihaja do pregrevanja posameznih delov čipa. Z namenom preprečiti pregrevanje smo teste razdelili na krajše testne periode z vmesnimi pavzami za hlajenje elektronike. Dodatno smo programirali testna zaporedja tako, da so se signali širili preko testnih poti selektivno k posameznim delom elektronike med tem, ko so se drugi deli hladili. Izdelali smo ustrezno metodologijo, ki nam je omogočila skrajšati testne čase. Eksperimentalni rezultati so pokazali uspešnost predlagane metode.

1. Introduction

The rapid development of System-on-Chip (SoC) techniques has led to many challenges for the design and test community. The challenges for the designers have been addressed by the development of the core-based design method, where pre-designed and pre-verified building blocks, called embedded cores, are integrated together to form a SoC. While the core-based design method has led to the reduction of design time, it entails several test-related problems. How to address these test problems in order to provide an optimal test solution is a great challenge to the SoC test community /1/.

A key issue for SoC testing is the selection of an appropriate test strategy and the design of a test infrastructure on chip to implement the selected test strategy. For a core embedded in a SoC, direct test access to its peripheries is impossible. Therefore, a special test access mechanism must be included in a SoC to connect the core peripheries to the test sources and sinks. The design of the test infrastructure, including the test access mechanism, must be considered together with the test scheduling problem, in order to reduce the silicon area used for test access and to minimize the total test application time /2/, /3/, /4/, /5/, /6/, /7/, /8/.

The rapidly increasing test data volume needed for SoC testing is another issue to be addressed, since it contributes significantly to long test application times and huge ATE memory requirements /7/. This issue can be addressed by sharing the same test set among several cores as well as by test data compression. Both test set sharing and test compression can exploit the large percentage of don't care bits in typical test patterns generated for complex SoC designs in order to reduce the amount of test data needed /9/.
The issue of power dissipation should also be considered in order to prevent a SoC chip from being damaged by overheating during test /2/, /9/, /10/, /11/. High temperature has become a technological barrier to the testing of high performance SoCs, especially when deep submicron technologies are employed. In order to reduce test time while keeping the temperature of the cores under test within a safe range, thermal-aware test scheduling techniques are required, and this paper discusses several issues related to thermal-aware SoC testing.

Thermal-aware testing has recently attracted much research interest. Liu et al. proposed a technique to evenly distribute the generated heat across the chip during tests, and therefore avoid high temperatures /12/. Rosinger et al. proposed an approach to generate thermal-safe test schedules with minimized test time by utilizing the core adjacency information to drive the test scheduling and reduce the temperature stress between cores /13/. In our previous work /14/, we proposed a test set partitioning and interleaving technique, and employed constraint logic programming (CLP) to generate thermal-aware test schedules with the minimum test application time (TAT).

In our work, we assume that a continuous test will increase the temperature of a core beyond a limit at which the core may be damaged. In order to avoid overheating during tests, we partition the entire test set into a number of test sub-sequences and introduce a cooling period between two consecutive test sub-sequences. As the test application time substantially increases when long cooling periods are introduced, we interleave different partitioned test sets in order to generate a shorter test schedule.

In /14/, we restricted the lengths of test sub-sequences that belong to the same test set to be identical. Moreover, we also restricted the cooling periods between test sub-sequences from the same test set to have equal length. The main purpose of these restrictions was to keep the size of the design space small and, by this, to reduce the optimization time, so that the CLP-based algorithm would be able to generate the optimal solutions in a reasonable time. However, these restrictions have resulted in less efficient test schedules, and longer test application times. In our recent work, we have eliminated these restrictions so that both test sub-sequences and cooling periods can have arbitrary lengths. Since breaking the regularity of test sub-sequences and cooling periods dramatically increases the size of the exploration space, the CLP-based test scheduling approach proposed in /14/ is not feasible any more, especially for practical industrial designs. Therefore, new, low-complexity heuristics are needed which are able to produce efficient test schedules under the less restricted and more realistic assumptions.

The rest of this paper is organized as follows. The next section discusses the thermal issue related to SoC testing and some solutions. It also motivates the importance of test partitioning and interleaving with arbitrary partition/cooling lengths. Section 3 formally defines the thermal-aware test scheduling problem addressed in this paper. Section 4 presents the overall strategy of our thermal-aware scheduling approach, and Section 5 the proposed test scheduling heuristic. The experimental results are described in Section 6, and conclusions in Section 7.
2. The thermal issue

High temperature can be observed in most high-performance SoCs due to high power consumption. High power consumption results in excessive heat dissipation and elevates the junction temperature, which has a large impact on the operation of integrated circuits /15/, /16/, /17/, /18/. The performance of integrated circuits is proportional to the driving current of the CMOS transistors, which is a function of the carrier mobility. Increasing junction temperature decreases the carrier mobility and the driving current of the CMOS transistors, which consequently degrades the performance of circuits /19/. At higher junction temperatures, the leakage power increases. The increased leakage power in turn contributes to a further increase of the junction temperature. This positive feedback between leakage power and junction temperature may result in thermal runaway and destroy the chip /19/. The long term reliability and lifespan of integrated circuits also strongly depend on the junction temperature. Failure mechanisms in CMOS integrated circuits, such as gate oxide breakdown and electro-migration, are accelerated at high junction temperatures. This may result in a drop of the long term reliability and lifespan of circuits /18/. An advanced cooling system can be one solution to the high temperature problem. However, the cost of the entire system will then substantially increase, and the size of the system is inevitably large.

The thermal issue becomes even more severe in the case of testing than in normal functional mode, since testing dissipates more power and heat due to a substantial increase of switching activities /18/. In order to prevent excessive power during test, several techniques have been developed. Low power DFT and test synthesis techniques can be utilized, including low-power scan chain design /20/, /21/, as well as scan cell and test pattern reordering /21/, /23/, /24/. Although low power DFT can reduce the power consumption, such techniques usually add extra hardware into the design and can therefore increase the delay and the cost of the produced chips.

For many modern SoC designs, when a long sequence of test patterns is continuously applied to a core, the temperature of this core may increase and pass a certain limit beyond which the core will be damaged. In such scenarios, the test has to be stopped when the core temperature reaches the limit, and can be restarted later when the core has cooled down. Thus, by partitioning a test set into shorter test sub-sequences and introducing cooling periods between them, we can avoid overheating during test. Figure 1 illustrates the temperature profile of a core under test when the entire test set for the core is partitioned into four test sub-sequences, TS1, TS2, TS3, and TS4, and cooling periods are introduced between the sub-sequences. In this way, the temperature of the core under test remains within the imposed temperature limit.

Fig. 1. Illustration of test set partitioning

It is obvious that introducing long cooling periods between test sub-sequences will substantially increase the test application time (TAT).
To address this problem, we can reduce the TAT by interleaving the partitioned test sets such that the test-bus bandwidth reserved for a core Ci, during its cooling periods, is utilized to transport test data for another core Cj (j ≠ i), and thereafter to test the core Cj. By interleaving the partitioned test sets belonging to different cores, the test-bus bandwidth is more efficiently utilized. Figure 2 gives an example where two partitioned test sets are interleaved so that the test time is reduced with no need for extra bus bandwidth.

Fig. 2. Illustration of test set interleaving

There are many design alternatives which can be used to implement the above basic ideas of test partitioning and interleaving for thermal-aware test scheduling. In general, the objective is to minimize the test application time by generating an efficient test partitioning/interleaving scheme and by scheduling the individual test sub-sequences such that the temperature limits of the individual cores are not violated and, at the same time, the test-bus bandwidth constraint is satisfied. This is a complex optimization problem, and we have developed, as mentioned in the previous section, a solution for the problem with the restriction that the lengths of the test sub-sequences from the same test set should be identical and that the cooling periods between test sub-sequences from the same test set should have equal length /14/. However, this restriction has resulted in less efficient test schedules, and thus longer test application times.

To illustrate the usefulness of eliminating this restriction so that both test sub-sequences and cooling periods can have arbitrary lengths, each test sub-sequence can be considered as a rectangle, with its height representing the required test-bus bandwidth and its width representing the test time. Figure 3 gives an example where three test sets, TS1, TS2, and TS3, are partitioned into 5, 3, and 2 test sub-sequences, respectively. Note that the partitioning scheme, which determines the lengths of the test sub-sequences and cooling periods, has ensured that the temperature of each core will not violate the temperature limit, by using a temperature simulation /19/. Figure 3(a) shows a feasible test schedule under the regularity assumption (identical test sub-sequence lengths and identical cooling periods for each core). In Figure 3(b), an alternative test schedule is depicted, where the test sub-sequences and the cooling periods can have arbitrary lengths. This example shows the possibility to find a shorter test schedule by exploring alternative solutions, where the number and lengths of test sub-sequences, the lengths of cooling periods, and the way the test sub-sequences are interleaved are different from those in Figure 3(a).

Fig. 3. Comparison of two different test schedules: (a) a test schedule with the regular partitioning scheme; (b) an alternative test schedule with an irregular partitioning scheme.

3. Problem formulation

We have assumed a test architecture using a single test bus to transport test data between the tester and the cores under test.
A tester can be either an external automated test equipment (ATE) or an embedded tester integrated on the chip. Each core under test is connected to the test bus with a number of dedicated TAM wires. The test patterns, together with a generated test schedule, are stored in the tester memory. A test controller controls the entire test process according to the test schedule, sending test patterns to and receiving test responses from the corresponding cores through the test bus and the TAM wires.

Suppose that a system S, consisting of n cores C1, C2, ..., Cn, employs the test architecture defined above. In order to test core Ci, a test set TSi consisting of li generated test patterns is transported through the test bus and the dedicated TAM wires to/from core Ci, utilizing a bus bandwidth Wi. The test bus is designed to allow transporting several test sets in parallel but has a bandwidth limit BL (BL > Wi, i = 1, 2, ..., n). We assume that continuously applying test patterns belonging to TSi may cause the temperature of core Ci to go beyond a certain limit TLi so that the core can be damaged. In order to prevent overheating during tests, as discussed before, we partition a test set into a number of test sub-sequences and introduce a cooling period between two partitioned test sub-sequences, such that no test sub-sequence drives the core temperature higher than the limit and the core temperature is kept within a safe range. The problem that we address in this paper is to generate a partitioning scheme and a test schedule for system S such that the test application time is minimized while the bus bandwidth constraint is satisfied and the temperatures of all cores during tests remain below the corresponding temperature limits.

4. Overall strategy

We have proposed an approach that solves the formulated problem in two major steps. First, we generate an initial partitioning scheme for every test set by using temperature simulation and the given temperature limits. Second, the test scheduling algorithm explores different test schedules by selecting alternative partitioning schemes, interleaving test sub-sequences, and squeezing them into a two-dimensional space constrained by the test-bus bandwidth.

In order to generate thermal-safe partitioning schemes, we have used a temperature simulator, HotSpot /17/, /25/, /26/, /27/, to simulate the instantaneous temperatures of individual cores during tests. HotSpot assumes a circuit packaging configuration widely used in modern IC designs, and it computes a compact thermal model /27/ based on the analysis of the three major heat flow paths existing in the assumed packaging configuration /26/, /27/. Given the floorplan of the chip and the power consumption profiles of the cores, HotSpot calculates the instantaneous temperatures and estimates the steady-state temperatures for each unit. In this paper, we assume that the temperature influences between cores are negligible, since the heat transfer in the vertical direction dominates the transfer of dissipated heat; this has been validated by simulation results with HotSpot /14/, /19/.

When generating the initial thermal-safe partitioning scheme, we have assumed that a test set TSi is started when the core is at the ambient temperature Tamb. Then we start the temperature simulation and record the time moment th1 when the temperature of core Ci reaches the given temperature limit TLi. Knowing the latest test pattern that has been applied by the time moment th1, we can easily obtain the length of the first thermal-safe test sub-sequence TSi1 that should be partitioned and separated from the test set TSi. Then the temperature simulation continues, while the test process on core Ci has to be stopped until the temperature goes down to a certain degree. Usually a relatively long time is needed in order to cool down a core to the ambient temperature, as the temperature decreases slowly at a lower temperature level (see the dashed curve in Figure 4). Thus, we let the temperature of core Ci go down only until the slope of the temperature curve reaches a given value k (the value of k can be set experimentally by the designers), at time moment tc1. At this moment, we have obtained the duration of the first cooling period di1 = tc1 - th1. Restarting the test process from time moment tc1, we repeat this heating-and-cooling procedure throughout the temperature simulation until all test patterns belonging to TSi are applied. Thus we have generated the initial thermal-safe partitioning scheme, where test set TSi is partitioned into m test sub-sequences {TSij | j = 1, 2, ..., m} and the durations of the cooling periods between every two consecutive test sub-sequences are {dij | j = 1, 2, ..., m-1}, respectively. Figure 4 depicts an example of partitioning a test set into four thermal-safe test sub-sequences with three cooling periods added in between.

Fig. 4. An example of generating the initial partitioning scheme
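The heating-and-cooling procedure can be sketched with a simple first-order thermal model standing in for HotSpot. Everything below is an illustrative assumption (the RC-style heating/cooling laws, their coefficients, and the stopping slope k); only the control structure mirrors the procedure described above:

# Sketch of the initial thermal-safe partitioning: heat the core while
# applying patterns until the temperature limit TL is reached, then cool
# until the cooling slope drops below k. A first-order thermal model stands
# in for HotSpot; all parameter values are illustrative.

T_AMB, TL, K_SLOPE = 25.0, 90.0, 1.0         # degC, degC, degC per step

def heat(temp):                  # one test-pattern step heats the core
    return temp + 0.08 * (130.0 - temp)

def cool(temp):                  # one idle step cools the core toward ambient
    return temp + 0.04 * (T_AMB - temp)

def initial_partitioning(num_patterns):
    """Return (sub-sequence lengths in patterns, cooling durations in steps)."""
    lengths, coolings, temp = [], [], T_AMB
    applied = run = 0
    while applied < num_patterns:
        temp = heat(temp); applied += 1; run += 1
        if temp >= TL and applied < num_patterns:
            lengths.append(run); run = 0
            d = 0                # cool only until the temperature curve flattens
            while (temp - cool(temp)) > K_SLOPE:
                temp = cool(temp); d += 1
            coolings.append(d)
    lengths.append(run)
    return lengths, coolings

print(initial_partitioning(200))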
Once the initial thermal-safe partitioning scheme is obtained, our focus is on how to schedule all the test sub-sequences such that the test application time is minimized under the constraint on the test-bus bandwidth. In this paper, since we consider each test sub-sequence as a rectangle, the problem of generating a test schedule with minimized TAT while satisfying the constraint on the test-bus bandwidth can be formulated as a rectangular packing (RP) problem /28/, /29/, /30/. However, our test scheduling problem is not a classical RP problem, since the lengths of the sub-sequences and the cooling periods are not fixed /31/. This makes our problem even more difficult to solve.

Interleaving test sub-sequences belonging to different test sets can also introduce time overheads /32/, /11/, when the test controller stops one test and switches to another. Therefore, partitioning a test set into more test sub-sequences may lead to a longer test application time, since more time overheads and more cooling periods are introduced into the test schedule. On the other hand, partitioning a test set into more test sub-sequences results in a shorter average length of the individual test sub-sequences, which in principle can be packed in a more compact way and thus lead to shorter test application times. Thus, we need a global optimization algorithm in which different numbers and lengths of test sub-sequences as well as varying cooling periods are explored, while taking into account the time overheads introduced by test sub-sequence interleaving.

5. Test scheduling heuristic

We have proposed a heuristic to do the test scheduling with test set repartitioning and interleaving.
5. Test scheduling heuristic

We have proposed a heuristic to perform the test scheduling with test set repartitioning and interleaving. Since the order in which the test sets are considered for test scheduling has a large impact on the final test schedule, we construct an iterative algorithm to obtain a good scheduling consideration order (SCO) for all partitioned test sets, and thereafter schedule the test sub-sequences according to the obtained SCO /31/.

Figure 5 shows a simple example illustrating the impact of different scheduling consideration orders on the test schedule of three test sets, TS1, TS2 and TS3, each of which is partitioned into two test sub-sequences. Figures 5(a) and 5(b) depict the test schedules when the test sets are considered for scheduling in the orders {TS1, TS2, TS3} and {TS3, TS2, TS1}, respectively. It is obvious that using the second SCO results in a shorter test schedule. Note that in this example the test sets are scheduled to the earliest available time moments.

Fig. 5. Illustration of how the SCO affects the test length: (a) test schedule with the SCO = {TS1, TS2, TS3}; (b) test schedule with the SCO = {TS3, TS2, TS1}

It should also be noted that the scheduling consideration order refers to the precedence of the partitioned test sets to be considered for scheduling. However, when a test set is taken into account for scheduling, we do not schedule all the test sub-sequences of this test set at one time. Instead, we always take the first unscheduled test sub-sequence of the currently considered test set for scheduling, and thereafter take the first unscheduled test sub-sequence of the next test set into account. Thus, in this example, the overall scheduling consideration order (OSCO) for all test sub-sequences of all test sets is {TS11, TS21, TS31, TS12, TS22, TS32} and {TS31, TS21, TS11, TS32, TS22, TS12} for the cases of Figure 5(a) and Figure 5(b), respectively. The main reason for not scheduling all test sub-sequences of one test set at one time is to avoid generating inefficient test schedules due to unnecessarily long cooling periods, inappropriate partition lengths, and inefficient test-set interleaving.

The basic idea of the proposed heuristic is to iteratively construct a queue that finally consists of all partitioned test sets in a particular order. Given a set of test sets U = {TSi | i = 1, 2, ..., n}, the heuristic iteratively selects test sets and inserts them into a queue Q. The positions of the test sets in Q represent the order in which the test sets are considered for test scheduling (SCO); the closer to the queue head, the earlier to be considered. The heuristic starts with an empty queue Q = ∅. At each iteration step, the objective is to select one test set TSk from U and to insert it into Q at a certain position POS, such that the |Q| + 1 test sets are put in a good order while the precedence between the test sets, excluding the newly inserted one, remains unchanged. The algorithm terminates when all test sets in U have been moved into Q, and thereafter it schedules the partitioned test sets according to the SCO obtained in the best queue found, Qbest.

For each iteration step, there are |U| alternative test sets for selection, where |U| is the current number of test sets remaining in U. For each selected test set, there are |Q| + 1 alternative positions at which the selected test set can be inserted, where |Q| is the current number of test sets that have already been inserted into Q in the previous iteration steps.
Thus, at one iteration step, there are |U| × (|Q| + 1) alternative solutions, in each of which a selected test set is associated with an insertion position in Q. The proposed algorithm evaluates the obtained scheduling consideration order by the efficiency of the generated partial test schedule; the higher the efficiency, the better the SCO. The partial test schedule is generated by a basic scheduling algorithm. Based on the test-schedule efficiency, we explore different solutions and make decisions according to the efficiency of the generated partial test schedules.

We define the efficiency of a test schedule, denoted by η, as follows. Suppose x is the size of the area covered by all scheduled test sub-sequences, and y is the size of the total area constrained by the bus bandwidth limit and the completion time moment of the test schedule. The efficiency of the test schedule is then η = x / y; the larger the value of η, the better the test schedule.

Given a queue Q of test sets, the basic scheduling algorithm takes the first unscheduled test sub-sequence from every test set for scheduling, in a round-robin fashion. More concretely, the strategy of the basic scheduling algorithm is as follows. According to the SCO given in Q, the scheduler considers one test set at a time for scheduling. When considering a test set, the scheduler schedules only its first unscheduled test sub-sequence, to the earliest available time moment, and thereafter turns to the next test set. When one round is finished for all the test sets in Q, the scheduler takes the next round for scheduling the test sub-sequences of all the test sets, in the same SCO. This procedure repeats until all test sub-sequences are scheduled. The detailed description of the heuristic and the basic scheduling algorithm, together with their pseudo-code, can be found in /31/.
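A minimal sketch of the basic scheduling algorithm and the efficiency measure η = x / y may help to fix the ideas. Sub-sequences are rectangles whose width is the core's TAM bandwidth Wi; the discrete-time earliest-fit placement below, the data structure and the toy data are illustrative assumptions standing in for the full algorithm in /31/, and the interleaving time overheads discussed above are omitted for brevity.

```python
from collections import namedtuple

# TestSet: TAM width W_i, lengths of the partitioned sub-sequences, and the
# cooling periods between them (from the initial partitioning scheme).
TestSet = namedtuple("TestSet", "width sub_lengths cooling")

def schedule_round_robin(test_sets, BL, horizon=100_000):
    """Round-robin basic scheduler; test_sets must be ordered by the SCO."""
    usage = [0] * horizon                  # bus bandwidth in use per time unit
    nxt = [0] * len(test_sets)             # next unscheduled sub-sequence
    ready = [0] * len(test_sets)           # earliest start honoring cooling
    remaining = sum(len(ts.sub_lengths) for ts in test_sets)
    while remaining:
        for i, ts in enumerate(test_sets):            # one round over the SCO
            j = nxt[i]
            if j == len(ts.sub_lengths):
                continue                              # test set finished
            L, t = ts.sub_lengths[j], ready[i]
            # Earliest fit: slide right until the rectangle fits under BL.
            while any(usage[t + d] + ts.width > BL for d in range(L)):
                t += 1
            for d in range(L):                        # commit the rectangle
                usage[t + d] += ts.width
            cool = ts.cooling[j] if j < len(ts.cooling) else 0
            ready[i] = t + L + cool                   # respect cooling period
            nxt[i], remaining = nxt[i] + 1, remaining - 1
    completion = max(t for t, u in enumerate(usage) if u) + 1
    x = sum(ts.width * l for ts in test_sets for l in ts.sub_lengths)
    return completion, x / (BL * completion)          # TAT and efficiency eta

sets = [TestSet(2, [300, 250], [150]), TestSet(1, [400, 400], [200])]
print(schedule_round_robin(sets, BL=3))               # -> (1000, ~0.63)
```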
6. Experimental results

We have performed experiments using SoC designs with randomly selected cores from the ISCAS'89 benchmarks. The designs used in our experiments have 12 to 78 cores. The main objective of the experiments is to check how efficient the test schedules generated by our heuristic are. We compare the results of our heuristic with those of two other algorithms, a straightforward algorithm (SF) and a simulated annealing algorithm (SA). All three algorithms employ flexible partitioning of the test sets and arbitrary lengths of the cooling periods. Furthermore, all three algorithms employ the same basic scheduling algorithm in their inner iterations. The difference between them is therefore how they generate the SCO for all test sets. The straightforward algorithm sorts all test sets decreasingly by the lengths of the entire test sets with the initial partitioning schemes. According to the obtained SCO, the scheduler chooses each test set and schedules its first unscheduled test sub-sequence to the earliest available time moment, until all test sub-sequences of every test set are scheduled. In the SA algorithm, the SCO of the test sets is generated based on a simulated annealing strategy. When a randomly generated SCO is obtained, the scheduler is invoked to schedule the test sub-sequences according to the current SCO. During the iterations, the best SCO leading to the shortest test schedule is recorded, and the algorithm returns this recorded solution when the stopping criterion is met.

The experimental results are listed in Table 1, where column 1 lists the number of cores of the SoC. Column 2 shows the test application time of the test schedule generated by the straightforward algorithm, and column 3 lists the corresponding CPU time needed to obtain the test schedule. Similarly, columns 4 and 5 give the TAT and CPU times for our heuristic, and columns 6 and 7 list the TAT and execution times for the simulated annealing algorithm. The last two columns list the percentage of TAT reduction achieved by our heuristic, compared to the straightforward algorithm and to the simulated annealing algorithm, respectively.

Table 1. Comparison of the different approaches

# cores | SF: TAT | SF: CPU (s) | Heuristic: TAT | Heuristic: CPU (s) | SA: TAT | SA: CPU (s) | Gain vs. SF | Gain vs. SA
12 | 1213 | 0.01 | 1048 | 2.74 | 992 | 148.31 | 13.6% | -5.6%
18 | 1716 | 0.01 | 1535 | 5.41 | 1513 | 208.06 | 10.5% | -1.5%
24 | 2632 | 0.01 | 2318 | 21.88 | 2234 | 229.94 | 11.9% | -3.8%
30 | 2274 | 0.01 | 1915 | 32.41 | 1869 | 417.08 | 15.8% | -2.5%
36 | 3161 | 0.01 | 2539 | 67.52 | 2494 | 540.48 | 19.7% | -1.8%
42 | 3846 | 0.01 | 3334 | 101.39 | 3292 | 631.00 | 13.3% | -1.3%
48 | 4328 | 0.01 | 3509 | 151.33 | 3485 | 898.77 | 18.9% | -0.7%
54 | 4877 | 0.01 | 4290 | 244.36 | 4051 | 675.44 | 12.0% | -5.9%
60 | 5274 | 0.01 | 4692 | 371.73 | 4457 | 2171.73 | 11.0% | -5.3%
66 | 5725 | 0.01 | 5069 | 511.88 | 4917 | 2321.39 | 11.5% | -3.1%
72 | 6538 | 0.01 | 5822 | 720.53 | 5689 | 1994.56 | 11.0% | -2.3%
78 | 6492 | 0.01 | 5769 | 987.75 | 5702 | 3301.45 | 11.1% | -1.2%
AVG | - | - | - | - | - | - | 13.4% | -2.9%

The comparison between our heuristic and the straightforward algorithm aims to show how much TAT can be reduced by the advanced heuristic we have developed. The comparison between our heuristic and the simulated annealing algorithm, on the other hand, is meant to find out how close our generated test schedules are to solutions which are very close to the optimal ones. In order to generate such close-to-optimal solutions, the SA algorithm has been run for long optimization times. It can be seen that, when using our heuristic, the TAT is on average 13.4% shorter than with the straightforward algorithm, and on average 2.9% longer than with the simulated annealing algorithm, which, however, needs much longer execution times.

7. Conclusions

In this paper, we have discussed several issues related to the thermal problem when a SoC is being tested. We have also proposed a heuristic to generate thermal-safe test schedules for SoCs in order to minimize the test application time. Based on the initial partitioning scheme generated by a temperature-simulation-guided procedure, the heuristic utilizes the flexibility of changing the lengths of the test sub-sequences and of the cooling periods between test sub-sequences, and interleaves them to generate efficient test schedules. Experimental results have shown the efficiency of our heuristic.

8. References

/1/ Y. Zorian, S. Dey and M. Rodgers, "Test of future system-on-chips," Proc. ICCAD, 2000.
/2/ E. Larsson and Z. Peng, "An integrated framework for the design and optimization of SoC test solutions," Journal of Electronic Testing: Theory and Applications (JETTA), Vol. 18, No. 4/5, pp. 385-400, 2002.
/3/ B. T. Murray and J. P. Hayes, "Testing ICs: Getting to the core of the problem," IEEE Computer, Vol. 29, pp. 32-39, Nov. 1996.
/4/ Y. Zorian, E. J. Marinissen and S. Dey, "Testing embedded core-based system chips," Proc. International Test Conference (ITC), 1998, pp. 130-143.
/5/ Y. Huang, W.-T. Cheng, C.-C. Tsai, N. Mukherjee, O. Samman, Y. Zaidan and S. M. Reddy,
"Resource allocation and test scheduling for concurrent test of core-based SoC design". Asian Test Symposium (ATS), 2001, pp. 265-270. /6/ J. Aerts, and E. J. Marinissen, "Scan chain design for test time reduction in core-based ICs", International Test Conference (ITC), 1998, pp. 448-457. /7/ V. Iyengar, K. Chakrabarty, and E. J. Marinissen, "Test access mechanism optimization, test scheduling, and test data volume reduction for System-on-Chip", IEEE Transactions on Computer, Vol. 52, No. 12, Dec. 2003. /8/ P. Varma, and B. Bhatia, "A structured test re-use methodology for core-based system chips", International Test Conference (ITC), 1998, pp. 294-302. /9/ B. Pouya and A. Crouch. "Optimization trade-offs for vector volume and test power". International Test Conference (ITC), 2000, pp. 873-881. /10/ R. Chou, K. Saluja, and V. Agrawal. "Scheduling tests for VLSI systems under power constraints". IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 5(2):175-184, June 1997. Z. Peng, Z. He, P. Eles: Challenges and Solutions for Thermal-Aware SoC Testing /11/ Z. He, Z. Peng, and P. Eles. "Power constrained and defect-probability driven SoC test scheduling with test set partitioning". Design Automation and Test in Europe Conference (DATE), 2006, pp. 291-296. /12/ C. Liu, K. Veeraraghavant, and V. Iyengar. "Thermal-aware test scheduling and hot spot temperature minimization for core-based systems". IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), 2005, pp. 552-560. /13/ P. Rosinger, B. M. Al-Hashimi, and K. Chakrabarty. "Thermal-safe test scheduling for core-based System-on-Chip integrated circuits". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Vol. 25, No. 11, pp. 2502-2512, Nov. 2006. /14/ Z. He, Z. Peng, P. Eles, P. Rosinger, and B. M. Al-Hashimi. "Ther-mal-aware SoC test scheduling with test set partitioning and interleaving". International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), 2006, pp. 477-485 /15/ S. Gunther, F. Binns, D. M. Carmen, and J. C. Hall. "Managing the impact of increasing microprocessor power consumption". Intel Technology Journal. 2001. /16/ R. Mahajan. "Thermal management of CPUs: A perspective on trends, needs and opportunities". Keynote presentation at the 8th Int'l Workshop on THERMal INvestigations of ICs and Systems (THERMINIC). 2002. /17/ K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan. "Temperature-aware microarchitecture: Modeling and implementation". ACM Trans. Architecture and Code Optimization (TACO). Vol. 1, No. 1. pp. 94-125, 2004. /18/ C. Shi and R. Kapur. "How power-aware test improves reliability and yield". EE Times, Sept. 15, 2004. http://www.eetimes.com/ showArticle.jhtml?articlelD=47208594. /19/ Z. He. System-on-Chip Test scheduling with Defect-Probabili-ty and Temperature Considerations, Licentiate thesis, Dept. of Computer Science, Linkoping University, No. 1313, 2007. /20/ S. Gerstendorfer and H. J. Wunderlich. "Minimized power consumption for scan-based BIST," in Proe. IEEE International Test Conference, 1999, pp. 77-84. /21/ J. Saxena, K. M. Butler, and L. Whetsel, "An analysis of power reduction techniques in scan testing," in Proc. IEEE International Test Conference, 2001, pp. 670-677. /22/ P. Rosinger, B. M. Al-Hashimi, and N. Nicolici, "Power profile manipulation: A new approach for reducing test application time under power constraints," IEEE Trans. CAD, vol. 21, no. 10, pp. 1217-1225, 2002. /23/ P. Flores, J. 
/23/ P. Flores, J. Costa, H. Neto, J. Monteiro and J. Marques-Silva, "Assignment and reordering of incompletely specified pattern sequences targeting minimum power dissipation," Proc. International Conference on VLSI Design, 1999, pp. 37-41.
/24/ P. Girard, C. Landrault, S. Pravossoudovitch and D. Severac, "Reducing power consumption during test application by test vector ordering," Proc. IEEE International Symposium on Circuits and Systems, 1998, pp. 296-299.
/25/ W. Huang, S. Ghosh, K. Sankaranarayanan, K. Skadron and M. R. Stan, "HotSpot: Thermal modeling for CMOS VLSI systems," IEEE Trans. on Component Packaging and Manufacturing Technology, 2005.
/26/ K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan and D. Tarjan, "Temperature-aware microarchitecture," Proc. International Symposium on Computer Architecture, 2003, pp. 2-13.
/27/ W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh and S. Velusamy, "Compact thermal modeling for temperature-aware design," Proc. Design Automation Conference (DAC), 2004, pp. 878-883.
/28/ B. S. Baker, E. G. Coffman Jr. and R. L. Rivest, "Orthogonal packings in two dimensions," SIAM Journal on Computing, Vol. 9, No. 4, pp. 846-855, 1980.
/29/ H. Dyckhoff, "A typology of cutting and packing problems," European Journal of Operational Research, Vol. 44, No. 2, pp. 145-159, 1990.
/30/ N. Lesh, J. Marks, A. McMahon and M. Mitzenmacher, "Exhaustive approaches to 2D rectangular perfect packings," Elsevier Information Processing Letters, Vol. 90, No. 1, pp. 7-14, Apr. 2004.
/31/ Z. He, Z. Peng and P. Eles, "A heuristic for thermal-safe SoC test scheduling," Proc. International Test Conference (ITC), 2007.
/32/ S. K. Goel and E. J. Marinissen, "Control-aware test architecture design for modular SoC testing," Proc. European Test Workshop (ETW), 2003, pp. 57-62.

Zebo Peng, Zhiyuan He and Petru Eles
Embedded Systems Laboratory
Linköping University, Sweden
zpe@ida.liu.se

Prispelo (Arrived): 15.07.2007   Sprejeto (Accepted): 01.09.2007

UDK 621.3:(53+54+621+66), ISSN 0352-9045   Informacije MIDEM 37(2007)4, Ljubljana

DESIGN & TEST OF SYSTEM-IN-PACKAGE

P. Cauvet1, S. Bernard2 and M. Renovell2
1NXP Semiconductors, Caen Cedex 9, France
2LIRMM, University of Montpellier, Montpellier, France

INVITED PAPER
MIDEM 2007 CONFERENCE - WORKSHOP ON ELECTRONIC TESTING
12.09.2007 - 14.09.2007, Bled, Slovenia

Key words: system-in-package, SiP testing, SiP yield

Abstract: System-in-Package (SiP) is now becoming a significant technology in the semiconductor industry. In this talk, the basic SiP concepts are first discussed, showing the difference between SiP and SoC, illustrated by some examples drawn from real-life cases. The specific challenges are then considered from the testing point of view, focusing on the assembly yield and the defect level of the packaged SiP.

Načrtovanje in testiranje sistemov v enem ohišju

Ključne besede: sistem v ohišju, testiranje SiP, izkoristek SiP

Izvleček: Izdelava sistemov v ohišju (SiP) dandanes predstavlja eno od pomembnih tehnologij polprevodniške industrije. V prispevku najprej opišemo osnovne koncepte SiP, kjer predstavimo razliko med SiP in SoC na nekaj primerih iz prakse. Še posebej predstavimo posebne izzive s stališča testiranja in se osredotočimo na izkoristek in gostoto defektov montaže SiP.
1 Introduction

Around the year 2000, mobile phone applications induced a paradigm shift in multi-chip packaging: the SiP has now become the fastest growing area in the packaging domain due to its associated system integration benefits. In mobile phone applications, the system integrators have to face short product life cycles, and they soon recognized that integrating existing and available ICs in a SiP, where possible, is easier than reinventing new ICs from scratch.

The International Technology Roadmap for Semiconductors, published by the Semiconductor Industry Association, defines a system-in-package (SiP) as any combination of semiconductors, passives, and interconnects integrated into a single package /1/. As a matter of fact, the definition can be even broader: a SiP can combine different die technologies and applications with active and passive components to form a complete system or sub-system, where the embedded components are interconnected by wire-bond, flip-chip, stacked-die technology, or any combination of the above. A SoC is created from a single piece of substrate, i.e., a single die; the single die is fabricated in a single process technology with a single level of interconnections from the die to the package pads or to the interposer. A SiP, in contrast, is created from several different dies, i.e., multiple parts; these dies can come from a broad mix of multiple process technologies, such as CMOS, GaAs, or BiCMOS, with multiple levels of interconnections from die/component to die/component, or from die/component to the package pads or to the interposer. One could say that everything is 'single' for a System-on-Chip, while everything is 'multiple' for a System-in-Package.

Figure 1 shows an example SiP for a global system for mobile communications (GSM) application, where multiple dies, components, and leadframe connections are embedded in a single package. The multiple dies as well as the multiple levels of interconnections are clearly visible in this example. In recent years, many types of SiP have been developed, differing by the type of carrier or interposer used for holding the bare dies (or components) and the type of interconnections used for connecting the components. The carrier or interposer can be a leadframe, an organic laminate, or silicon-based, as illustrated in Figure 2. Another possibility is to stack components on top of each other, called stacked dies; stacked dies are represented in Figure 3. SiP offers a unique advantage over SoC in its ability to integrate not only any type of semiconductor technology and passive components into a single package, but also micro-electromechanical systems (MEMS) with circuitry, to provide a fully functional system.

Fig. 1: Multiple dies, components and interconnections in one package

Fig. 2: Examples of carrier style: a) leadframe, b) laminate, c) silicon-based

Fig. 3: Example of stacked components /2/

2 SiP Challenges

The fabrication and test flow of a standard SiP is more complex than the SoC flow. A SiP is basically the assembly of n dies (die #1 to die #n), and possibly some passive components, using a carrier or interposer. The resulting SiP can be a very sophisticated and expensive product. In addition to these costs, it is important to note that typically a defective SiP cannot be repaired.
All these arguments lead to the following statement: the SiP process is economically viable only if the associated yield YSiP of the packaged SiP is high enough.

The 'production' yield of the packaged SiP can be defined as the number of correct SiPs (Ac) divided by the total number of assembled SiPs (A):

YSiP = Ac / A

In order to optimize the yield YSiP of the packaged SiP, it is obviously necessary to minimize the number of defective parts. The origins of these defective SiPs are numerous and diverse; among them are a defective substrate, an incorrect placement or mounting of the dies and components, a defective soldering or wire-bonding of the dies and components, stress on the dies during assembly, defective dies, etc. Consequently, for an n-die SiP, the yield can be expressed as follows:

YSiP = 100 × [p1 × p2 × ... × pn] × pS × pA

where pi is the probability that die #i is defect-free, pS is the probability of the substrate being defect-free, and pA is the probability of the assembly process being defect-free.

The above equation demonstrates the cumulative effect of the different defect levels. The substrate used as a carrier is generally made in a mature technology with a high level of quality, whereas the assembly process and the quality of the mounted dies are particularly critical. Indeed, SiPs are only viable if the quality of the assembly process is sufficiently high. In the IC fabrication context, yields (YIC) of around 75% are quite common. But the SiP assembly context is totally different, because the yield YA associated with the assembly process must be very high; a viable assembly yield for SiP is typically around 99%. Moreover, an acceptable assembly yield requires that every die in the SiP exhibits a very low defect level. In other words, only high-quality components can be used in the SiP assembly process. Consequently, the bare dies used in the SiP assembly process must exhibit the same, or better, quality level than a packaged IC. This is known as the Known Good Die (KGD) concept, which can be stated as follows:

KGD: a bare die with the same, or better, quality after wafer test than its packaged and 'final tested' equivalent.

The majority of the challenges in achieving the KGD quality lie in the testing of the mixed-signal and RF blocks. Under the assumption that only high-quality components (KGD) are used in the assembly process, the test process of the packaged SiP must focus on the faults that originate from the assembly process itself. Therefore, testing a SiP with KGD is a combination of functional test at the system level with structural test, and more precisely 'defect-oriented test', at the die and interconnect level.
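To make the cumulative effect of this equation concrete, the following sketch evaluates YSiP for a hypothetical four-die SiP; the probabilities are illustrative assumptions, not data from the paper.

```python
# Cumulative yield Y_SiP = 100 x [p1 x ... x pn] x pS x pA evaluated for
# hypothetical probabilities (none of these values are from the paper).

def sip_yield(die_probs, p_substrate, p_assembly):
    y = 100.0 * p_substrate * p_assembly
    for p in die_probs:
        y *= p
    return y

# Four dies at ordinary wafer-test quality (99% defect-free each):
print(sip_yield([0.99] * 4, p_substrate=0.999, p_assembly=0.99))   # ~95.0
# The same SiP built from KGD-grade dies (99.9% defect-free each):
print(sip_yield([0.999] * 4, p_substrate=0.999, p_assembly=0.99))  # ~98.5
```

Even with a near-perfect substrate and a 99% assembly yield, the die qualities multiply, which is why KGD-grade bare dies are a precondition for an economically viable SiP.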
3 Bare-die testing

Two major factors have to be considered by the manufacturers in bare-die testing: test escapes of the die, and infant mortality in the final package, at the system level. Meeting the KGD target for high-volume markets represents a big challenge for the industry, because high pin counts, high speeds and high frequencies can be handled more easily with a packaged stand-alone device than at the wafer level /3/. In this section we present several solutions to help achieve this target, successively based on advanced probing techniques, alternative test methods, and reliability screening.

3.1 Advanced probing techniques

The so-called 'cantilever' probe card technology has been used for a long time and still represents the biggest market share in this segment /3/. However, the development of very complex devices, combined with the expansion of multi-site (parallel) testing, pushes the fabrication of probe cards towards the limits of the technology, while the size and the pitch of the pads regularly decrease, pulling the probe cards towards novel, but expensive, technologies /4/. On top of those requirements, the growth of chip-scale and wafer-level packages puts further demands on the probes, which contact solder bumps instead of planar pads. Some solutions able to push the mechanical limits forward are now available. Among them, MEMS-based implementations of probe cards are now starting to replace the traditional macroscopic technologies /5/. An example of a MEMS-based probe card is shown in Figure 4. Another solution for KGD wafer testing is also emerging, which consists of replacing the traditional probe card by a non-contact interface /7/ to avoid any scrubbing of the bond pads.

Fig. 4: View of probe tips in MEMS technology /6/

From an electrical perspective, KGD cannot be achieved for most RF and high-speed mixed-signal ICs. So far, analog ICs are mainly tested against their specified parameters. This strategy has proved to be effective, but it places a lot of requirements on the test environment, including the ATE, the test board, and the probe card. Indeed, testing RF and high-speed mixed-signal ICs represents a big challenge, because the propagation of the signal along the path may be disturbed by parasitic elements. The integrity of the RF signals can be guaranteed only with short, impedance-matched connections between the source and the load. Compared to the traditional cantilever probe card, the 'membrane' technology can solve many problems by reducing the distance between the pads and the tuning components. Micro-strip transmission lines are designed on a flexible dielectric material in order to connect the test electronics of the ATE to the DUT /8/, /9/, /10/. This technology already offers a number of significant advantages for high-performance wafer test, from both the electrical and the mechanical perspective.

Another approach to KGD for RF/analog products relies on alternative test methods. A representative example is given by a technique that consists of ramping the power supply and observing the corresponding quiescent current signatures /11/. Other approaches propose to re-use some low-speed or digital internal resources of the DUT, and to add some DfT features in order to keep RF signals from having to leave the DUT /12/, /13/. The combination of such methods with some structural testing techniques will help reach a very high defect coverage for analog, mixed-signal and RF chips.

3.2 Reliability screening

Burn-in testing is the traditional method for eliminating infant-mortality defects. It is done by applying abnormally high voltages and elevated temperatures to the device, usually above the limits specified in the data sheet for normal operation. Burn-in testing is an effective method, but it is too costly for high-volume ICs for the low-cost consumer and mobile markets. Novel reliability screens need to be developed that can be applied at the wafer level and that may fulfill the targets without burn-in testing.
In recent years, diverse alternative methods to reduce the infant mortality of the dies were developed and published, such as IDDq /14/, high-voltage stress /15/, /16/, or statistics-based methods /17/. Screening methods are not unique, and the trend is to couple them in order to achieve a reliability level that fulfills the requirements for KGD.

4 System test

System test at the SiP level can be considered in two ways: functional system test, and access methods.

4.1 Functional system test

In a traditional system test, the application specifications of the system are tested and the overall functionality is checked. The biggest advantage of this test method is the good correlation at the system level between the measurement results of the SiP supplier and those of the SiP customer (end-integrator). Also, the required quality level can be reached in a very short time. However, this approach suffers from many drawbacks, such as a complex and expensive test setup, long test times, and a lack of diagnosis capabilities.

Enhanced solutions have been proposed in recent years, mainly driven by wireless communication applications. Basically, we consider a system made of a transmitter and a receiver, as shown in Figure 5. In this example, we consider a SiP made of three dies: digital plus mixed-signal circuitry, an RF transceiver including a low-noise amplifier (LNA), and finally a power amplifier (PA). Other elements, such as switches or filters, can also be placed on the substrate. In this typical architecture, two paths are considered: the transmitter (Tx) path and the receiver (Rx) path. To measure the system performance, the test is split into these two paths, the receiver path being characterized by the bit error rate (BER) and the transmitter path by the error vector magnitude (EVM). In practice, a receiver is tested using sources able to generate digitally-modulated RF signals, and a transmitter is tested using a demodulator and a digitizer.

Fig. 5: A typical transceiver system

An original method is proposed in /18/, where the signals propagated through the analog paths are used to test the digital circuitry. Loop-back techniques are also increasingly proposed in the literature. Most of these techniques are combined with alternative test methods, in order to reduce the test time and to be more predictable. Several solutions were recently published /19/, /20/, /21/.

As previously discussed, testing bare dies after assembly is a critical phase in achieving an economically viable SiP and providing some diagnostic capabilities. The test consists of two complementary steps: structural testing of the interconnections between dies, and structural and/or functional testing of the dies themselves. The main challenge is to access these dies from the primary I/Os of the SiP. The total number of effective pins of the embedded dies is generally much higher than the number of I/Os of the package. Moreover, in contrast to a SoC, where it is possible to add some DfT between IPs to improve controllability and observability, the only active circuitry available for test in the SiP are the connected active dies. Consequently, improving testability places requirements on the bare dies used for the SiP and calls for the definition of a specific SiP Test Access Port (TAP).
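As an illustration of the transmitter-path figure of merit, the sketch below computes the EVM of a measured symbol stream against its ideal constellation; the QPSK constellation and the Gaussian impairment model are assumptions made for the example, not a setup described in the paper.

```python
import cmath, math, random

# EVM sketch: RMS error vector normalized to the RMS of the ideal symbols.
QPSK = [cmath.rect(1.0, math.pi / 4 + k * math.pi / 2) for k in range(4)]

def evm_percent(ideal, measured):
    err = sum(abs(m - i) ** 2 for i, m in zip(ideal, measured))
    ref = sum(abs(i) ** 2 for i in ideal)
    return 100.0 * math.sqrt(err / ref)

random.seed(0)
ideal = [random.choice(QPSK) for _ in range(1000)]
measured = [s + complex(random.gauss(0, 0.03), random.gauss(0, 0.03))
            for s in ideal]                 # impairments modeled as noise
print(f"EVM = {evm_percent(ideal, measured):.2f} %")   # about 4 %
```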
4.2 SiP Test Access Port

The SiP context imposes some specific constraints on the TAP. This SiP-TAP must afford several features, mainly the access for die and interconnection tests, SiP test enabling at the system level as it would be for a SoC, and additional recursive test procedures during the assembly phase.

Fig. 6: Conceptual view of an example of SiP

Taking all the requirements into consideration, the SiP TAP controller must have two configurations: one during the recursive test and the other for the end-user test. Following the ordered assembly strategy, the first die will integrate the SiP TAP controller and, as a result, the ID code of the SiP will be the ID code of this first die. Figure 7 shows the conceptual view of the multi-mode SiP TAP with switches and multiplexers to implement the star or the ring configuration. The 'star' configuration allows direct access to each die to facilitate recursive testing during the assembly. The 'ring' configuration is designed such that the end-user cannot detect the presence of several dies, either for identification (there is only one ID code) or for boundary scan test. Thus far, no SiP TAP standard exists, but architectures have been proposed based on the IEEE 1149.1 standard /22/ or the IEEE 1500 standard /23/.

Fig. 7: Multi-configuration TAP

4.3 Interconnections

There are two types of interconnections: interconnections between dies, and interconnections between a die and the SiP bond pads. The test method for the interconnections is equivalent in both cases, but the access issues are obviously different. Concerning digital interconnections, the interconnection test is performed through boundary scan (IEEE 1149.1) with the external test mode. Similarly to a SoC with internal IPs, we face the problem of accessing the inputs and the outputs of the internal dies. In fact, a SiP with four times fewer external pads than internal pins of the embedded dies is very common. Consequently, we rely on boundary scan facilities for testing the internal dies. By activating the bypass function in the dies, it is possible to reduce the length of the scan chain. However, at-speed testing requires using techniques such as compression, DfT, and BIST. Unfortunately, in the SiP context, no additional active silicon is available, and so no additional circuitry can be implemented. Consequently, either BIST or DfT should already exist in the die itself. Obviously, there is very little chance of having this DfT facility available in the hardware of one of the other dies, because the design of each die is completely independent. A solution consists of using the software or programmable capabilities available on the other digital dies to implement a fully configurable DfT. Another method uses a transparent mode of the other dies to directly control and observe from the primary I/Os of the package. Unfortunately, we might find a SiP configuration where none of these techniques can be applied. In this case, the only solution for accessing the specific internal pin is to add a direct physical connection to the SiP I/O pins, while attempting to meet all the associated requirements in terms of signal integrity. In the specific case of a memory die, the access problem is critical, since these embedded memories are generally already packaged. These Package-on-Package (PoP) or Package-in-Package (PiP) configurations have no BIST capabilities, and thus the BIST has to be implemented in another digital core for application to the embedded memory.
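A classical way to exercise the digital interconnections through the boundary-scan external test mode is to drive every net with a unique bit sequence, so that shorts and opens produce distinguishable captured codes. The counting-sequence generator below is an illustration of this idea, not the exact procedure mandated by IEEE 1149.1.

```python
import math

# Counting-sequence interconnect test: net i is driven with the binary code
# of i, one bit per test vector. Any two shorted nets then receive different
# values in at least one vector; the width is chosen so that the all-0 and
# all-1 codes (useless against stuck nets) are never assigned.

def counting_patterns(num_nets):
    width = max(1, math.ceil(math.log2(num_nets + 2)))
    return [[(code >> b) & 1 for code in range(1, num_nets + 1)]
            for b in range(width)]

for vector in counting_patterns(6):   # 3 vectors give 6 nets unique codes
    print(vector)
# A captured code differing from the driven one points to an open; two nets
# capturing identical codes while driven differently indicate a short.
```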
4.4 Analog, RF and MEMS components

For the test of analog, mixed-signal, or RF dies, the two most significant challenges are the cost reduction of the required test equipment, and the test of the embedded dies, because of the difficulty of accessing these dies after SiP assembly. From the point of view of the test engineer, the possibility of assembling heterogeneous components might be a testing nightmare, since the test equipment has to be able to address the whole set of testing requirements in all domains: digital, RF, analog, etc., resulting in unacceptable test costs (ATE options, test time, etc.). The functional tests are required to achieve a satisfactory test quality and to give some diagnostic capabilities at the die level. Even if all the tests previously performed at the wafer level for each die are not necessarily required after assembly, the price of the test equipment and the very long test sequences usually make the test cost prohibitive. As a result, specific approaches must be considered to reduce the testing time and the test equipment cost.

A common approach is to move some or all of the tester functions onto the chip itself. Based on this idea, several BIST techniques have been proposed where signals are internally generated and/or analyzed /24/, /25/, /26/, /27/, /28/. However, the generation of pure analog stimuli and/or accurate analog signal processing to evaluate the system response remains the main roadblock.

Another proposed approach is based on indirect test techniques. The fundamental idea is to replace the difficult direct measurements by easier indirect measurements, provided that a correlation exists between what is measured and what is expected from the direct measurements. This approach looks promising for testing RF systems, for example with techniques using artificial neural networks /35/. Other techniques consist of transforming the signal to be measured into a signal that is easier to measure with ATE. For example, a timing measurement is easier for ATE than a precise analog level evaluation, so one solution is to convert an analog signal on-chip into a proportional timing delay. Another possible solution consists of using DfT techniques to internally transform the analog signals into digital signals that are made controllable and observable from the chip I/Os /29/, /25/. As a result, only digital signals are externally handled, by less expensive 'digital' test equipment (a low-cost tester, for example). These techniques are limited by the accuracy of the conversion of the analog signal. A similar approach attempts to avoid the problem of conversion accuracy by assuming that several digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) are already available, to obtain a fully digital test /30/.
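A minimal sketch of the indirect-test idea: an easily measured signature is correlated, on a set of fully characterized training devices, with the expensive direct measurement, and the fitted mapping then predicts the specification for new devices. The linear least-squares model and the synthetic data are illustrative assumptions; published approaches use richer regressors such as artificial neural networks /35/.

```python
import random

# Indirect test: predict an expensive direct measurement (a gain in dB) from
# a cheap signature (e.g. a supply-current reading). Synthetic data only.
random.seed(1)
train = []
for _ in range(50):                       # 50 characterized training devices
    process = random.gauss(0.0, 1.0)      # hidden process variation
    signature = 2.0 + 0.1 * process + random.gauss(0, 0.005)  # cheap test
    gain_db = 30.0 + 1.5 * process + random.gauss(0, 0.05)    # direct test
    train.append((signature, gain_db))

# Fit gain ~ a * signature + b by ordinary least squares.
n = len(train)
sx = sum(s for s, _ in train); sy = sum(g for _, g in train)
sxx = sum(s * s for s, _ in train); sxy = sum(s * g for s, g in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

new_signature = 2.08                      # production: only the cheap test
print(f"predicted gain: {a * new_signature + b:.2f} dB")   # close to 31.2
```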
All SiPs including RF, low-frequency analog and/or mixed-signal devices can benefit from the above techniques; micro-electro-mechanical systems (MEMS) represent the extreme case of heterogeneous systems. Indeed, in a typical MEMS we can find accelerometers, pressure sensors, temperature or humidity sensors, micro-fluidic systems, bio-MEMS, etc. The first problem in MEMS testing begins with the required test equipment. MEMS are generally dedicated to generating or actuating non-electrical signals. Consequently, the test equipment should allow generation and measurement using sound, light, pressure, motion, or even fluidics. Because of its price, the difficulty of implementing it, and the very long associated testing times, the use of this type of equipment for production test (especially at the wafer level) is rarely an option /31/. In a production test environment, only fully electrical signals are actually viable.

In this context, two approaches are likely: performing an indirect structural or functional test on an electrical signal that is an image of the physical signal associated with the MEMS under test, or implementing some DfT circuitry allowing one to convert the physical signal associated with the MEMS into an electrical signal /32/, /33/. Another major challenge in MEMS testing is due to the significant influence of the package. Indeed, the MEMS characteristics depend on the properties and the quality of the package used. As a result, the cost of MEMS testing can be prohibitive /34/. For MEMS integration into a SiP, the classical problems of MEMS testing are exacerbated. From the package standpoint, the SiP concept poses new challenges. For monolithic MEMS in CMOS technology, direct integration of the bare MEMS onto the passive substrate is conceivable. For more complex MEMS, the bare die can be flipped onto the passive substrate. Achieving a perfect etching and sealing of the cavity, and guaranteeing the cavity quality during the life of the system, represent the new challenges. In this context, one solution consists of adding an additional, simple MEMS into the cavity to monitor the cavity characteristics, as illustrated in Figure 8.

Fig. 8: Cavity monitoring thanks to an additional sensor (MEMS)

Considering the access to MEMS in the SiP, both for smart MEMS with significant digital processing and for simple analog sensors, the problem is equivalent to that of digital and mixed-signal dies. The solutions are thus similar to those described earlier, according to the nature of the electrical signal to be accessed.

5 Conclusion

A system-in-package (SiP) is a packaged device comprised of two or more embedded bare dies and, in most cases, passive components. This technology has found many applications in recent years, providing system or sub-system solutions in a single package, using various types of carriers and interconnect technologies. In this paper, we described several emerging solutions to achieve the KGD target, based on advanced probing, alternative methods, and enhanced screening techniques. The test at the system level was also addressed from two different - but complementary - viewpoints: functional test and test access. Currently, SiP is moving towards ever more sophisticated packaging technologies, which will require new test solutions. The trend towards more functionality combined with more communication features for emergent applications, such as healthcare, smart lighting, or ambient computing, drives the integration of a large variety of sensors and actuators. Consequently, very heterogeneous SiP implementations will be developed, posing new test challenges.

Acknowledgement

This work has been carried out in ISyTest (Joint LIRMM-NXP Institute), under the umbrella of the European MEDEA+ Project "Nanotest".

References

/1/ SIA, "The International Technology Roadmap for Semiconductors: 2005 Edition - Assembly & Packaging," Semiconductor Industry Association, San Jose, CA, http://www.itrs.net/Links/2005ITRS/AP2005.pdf.
/2/ Die Product Consortium, http://www.dieproducts.org
/3/ W. R. Mann, F. L. Taber, P. W. Seitzer and J. J. Broz, "The Leading Edge of Production Wafer Probe Test Technology," Proc. IEEE Int. Test Conf., 2004, pp. 1168-1195.
/4/ SIA, "The International Technology Roadmap for Semiconductors: 2004 Update," Semiconductor Industry Association, San Jose, CA, http://public.itrs.net.
/5/ M. D. Cooke and D. Wood, "Development of a Simple Microsystems Membrane Probe Card," Proc. IEEE Symp. on Design, Test, Integration and Packaging of MEMS and MOEMS, 2005, pp. 399-404.
/6/ Advanced Micro Silicon Technology, www.swtest.org/swtw_library/2000proc/PDF/S14_Kim.pdf
/7/ C. Sellathamby, M. Reja, L. Fu, B. Bai, E. Reid, S. Slupsky, I. Filanovsky and K. Iniewski, "Non-contact Wafer Probe Using Wireless Probe Cards," Proc. IEEE Int. Test Conf., 2005, paper 18.3.
/8/ B. Leslie and F. Matta, "Wafer-level Testing with a Membrane Probe," IEEE Design and Test of Computers, Vol. 6, Issue 1, 1989, pp. 10-17.
/9/ J. Leung, M. Zargari, B. A. Wooley and S. S. Wong, "Active Substrate Membrane Probe Card," Int. Electron Devices Meeting, 1995, pp. 709-712.
/10/ S. Wartenberg, "Six-gigahertz Equivalent Circuit Model of an RF Membrane Probe Card," IEEE Transactions on Instrumentation and Measurement, Vol. 55, No. 3, 2006, pp. 989-994.
/11/ J. Pineda de Gyvez, G. Gronthoud and R. Amine, "Vdd Ramp Testing for RF Circuits," Proc. IEEE Int. Test Conf., 2003, pp. 651-658.
/12/ S. S. Akbay and A. Chatterjee, "Feature Extraction Based Built-in Alternate Test of RF Components Using a Noise Reference," Proc. IEEE VLSI Test Symp., 2004, pp. 273-278.
/13/ A. Halder and A. Chatterjee, "Specification Based Digital Compatible Built-in Test of Embedded Analog Circuits," Proc. IEEE Asian Test Symp., 2001, pp. 344-349.
/14/ R. Arnold, "Test Methods Used to Produce Highly Reliable Known Good Die (KGD)," Proc. IEEE Int. Conf. on Multichip Modules and High Density Packaging, 1998, pp. 374-382.
/15/ R. Kawahara, O. Nakayama and T. Kurasawa, "The Effectiveness of IDDQ and High Voltage Stress for Burn-in Elimination (CMOS production)," IEEE International Workshop on IDDQ Testing, 1996, pp. 9-13.
/16/ T. Barrette, V. Bhide, K. De, M. Stover and E. Sugasawara, "Evaluation of Early Failure Screening Methods (ASICs)," IEEE International Workshop on IDDQ Testing, 1996, pp. 14-17.
/17/ A. D. Singh, P. Nigh and C. M. Krishna, "Screening for Known Good Die (KGD) Based on Defect Clustering: An Experimental Study," Proc. IEEE Int. Test Conf., 1997, pp. 362-369.
/18/ S. Ozev, A. Orailoglu and I. Bayraktaroglu, "Seamless Test of Digital Components in Mixed-signal Paths," IEEE Design and Test of Computers, Vol. 21, Issue 1, 2004, pp. 44-55.
/19/ G. Srinivasan, A. Chatterjee and F. Taenzler, "Alternate Loop-back Diagnostic Tests for Wafer-level Diagnosis of Modern Wireless Transceivers Using Spectral Signatures," Proc. IEEE VLSI Test Symp., 2006.
/20/ D. Lupea, U. Pursche and H.-J. Jentschel, "RF BIST: Loopback Spectral Signature Analysis," Proc. IEEE Design, Automation and Test in Europe, 2003, pp. 478-483.
/21/ J. Dabrowski, "BiST Model for IC RF-transceiver Front-end," Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, 2003, pp. 295-302.
/22/ F. de Jong and A. Biewenga, "SiP-TAP: JTAG for SiP," Proc. IEEE Int. Test Conf., 2006, pp. 389-395.
/23/ D. Appello, P. Bernardi, M. Grosso and M. S.
Reorda "System-in-package Testing: Problems and Solution", IEEE Design and Test of Computers, 2006, Vol. 23, No. 3, pp. 203-211. /24/ M. Toner and G. Roberts "A BIST Scheme for an SNR Test of a Sigma-delta ADC", 1993, Proc. IEEE Int. Test Conf., pp. 805-814. /25/ M. J. Ohletz "Hybrid Built-in Self-test (HBIST) for Mixed Analog/ Digital Integrated Circuits", Proc. IEEE European Test Conf., 1991, pp.307-316. /26/ S. Sunter and N. Nagi "A Simplified Polynomial-fitting Algorithm for DAC and ADC BIST", Proc. IEEE Int. Test Conf., 1997, pp.389-395. /27/ F. Azals, S. Bernard, Y. Bertrand, and M. Renovell "Towards an ADC BIST Scheme Using the Histogram Test Technique", Proc. IEEE European Test Workshop, 2000, pp. 53-58. /28/ F. Azais, S. Bernard, Y. Bertrand, and M. Renovell "Implementation of a Linear Histogram BIST for ADCs", Proc. IEEE Design, Automation and Test in Europe, 2001, pp.590 - 595. /29/ N. Nagi, A. Chatterjee, and J. Abraham "A Signature Analyzer for Analog and Mixed-Signal Circuits", Proc. IEEE Int. Conf. on Computer Design, 1994, pp.284-287. /30/ V.Kerzerho, PCauvet, S.Bernard, F.Azais, M.Comte "Analogue network of converters: a DfT technique to test a complete set of ADCs and DACs embedded in a complex SiP or SOC", Proc IEEE European Test Symposium, 2006, pp 159-164. /31 / S. Mir, L. Rufer, and B. Courtois "On-chip Testing of Embedded Silicon Transducers", Proc. IEEE Int. Conf. on VLSI Design, 2004, pp. 463-472. /32/ Analog Devices, (fttp://www.analog.com), 2007. /33/ H. V. Allen, S. C. Terry, and D. W. DeBruin "Accelerometer Systems with Self-testable Features", Proc. IEEE Sensors and Actuators, 1989, pp. 153-161. /34/ B. Chariot, S. Mir, F. Parrain, and B. Courtois "Electrically Induced Stimuli for MEMS Self-test", 2001, Proc. IEEE VLSI Test Symp., pp.60-66. /35/ S. Ellouz, P. Gamand, C. Kelma, B. Vandewiele, B. Allard "Combining Internal Probing with Artificial Neural Networks for Optimal RFIC Testing", Proc. IEEE Int. Test Conf., 2006, 9 pages. P. Cauvet, NXP Semiconductors, 2 Esplanade Anton Philips, BP20000, 14906 Caen Cedex 9, France philippe. cauvet@nxp. com, bernard@lirmm. fr S. Bernard2 and M. Renovell LIRMM, University of Montpellier / CNRS - 161 rue Ada, 34392 Montpellier, France renovell@lirmm. fr Prispelo (Arrived): 15.07.2007 Sprejeto (Accepted): 01.09.2007 234 UDK621.3.'(53+54+621 +66), ISSN0352-9045 Informacije MIDEM 37(2007)3, Ljubljana DEBUG AND DIAGNOSIS: MASTERING THE LIFE CYCLE OF NANO-SCALE SYSTEMS ON CHIP Hans-Joachim Wunderlich, Melanie Elm, Stefan Holst Institut für Technische Informatik, Universität Stuttgart, Stuttgart, Germany INVITED PAPER MIDEM 2007 CONFERENCE - WORKSHOP ON ELECTRONIC TESTING 12.09. 2007 - 14.09. 2007, Bled, Slovenia Key words: Diagnosis, Debug, Embedded Test Abstract: Rising design complexity and shrinking structures pose new challenges for debug and diagnosis. Finding bugs and defects quickly during the whole life cycle of a product is crucial for time to market, time to volume and improved product quality. Debug of design errors and diagnosis of defects have many common aspects. In this paper we give an overview of state of the art algorithms, which tackle both tasks, and present an adaptive approach to design debug and logic diagnosis. Special design for diagnosis is needed to maintain visibility of internal states and diagnosabillty of deeply embedded cores. This article discusses current approaches to design for diagnosis to support all debug tasks from first silicon to the system level. 
Diagnoza in odkrivanje napak: obvladovanje življenjske dobe nanosistemov na čipu

Ključne besede: diagnoza, odkrivanje napak, vgrajeni test

Izvleček: Povečevanje kompleksnosti in vse manjše strukture na čipu stalno postavljajo nove izzive pri testiranju in odkrivanju napak. Hitro odkrivanje napak in defektov tekom življenjske dobe izdelka je pomembno pri uvajanju izdelka na trg, povečevanju proizvodnje in izboljševanju kvalitete izdelka. V prispevku podajamo opis trenutnega stanja sodobnih algoritmov za odkrivanje napak in diagnozo defektov. Zavzemamo se za poseben pristop k načrtovanju, ki omogoča diagnozo in vidljivost tako notranjih stanj kot globoko vgrajenih sredic. Na ta način olajšamo odkrivanje napak od prvega silicija do sistemskih nivojev.

1 Introduction

For the economic success of a product, three factors are crucial: fast time to market, low cost per unit and high product quality. The relevance of debug and diagnosis for optimizing these factors is obvious. In each phase of a product's life cycle, defects and bugs have to be found quickly, and special equipment and automatisms are necessary to achieve this goal. In the development phase of a product, debug is the essential means to find and locate bugs; in the production and support phases, the appropriate means is diagnosis.

Debug is the process of locating logical and functional flaws in specifications, hardware and software. As logical and functional flaws remain the main cause of today's design respins, verification is turning into a critical bottleneck with the increasing complexity of Systems on Chip (SoC) /1/, /2/. Despite the efforts spent on advanced verification and validation techniques, the percentage of designs with functional errors has increased between the years 2002 and 2004 /3/.

Diagnosis is the process of locating faults in a physical chip at various levels, down to real defects. Numerous parasitic and timing effects may show up in the first silicon /4/ and have to be located and eliminated to enable a fast yield ramp-up. But even after successful production, diagnostic techniques have to be applied to returns to further improve the product quality and to learn for future products.

Design verification and diagnosis of microelectronic circuits have long been viewed as separate tasks with individual challenges and techniques. However, in recent years more and more attention is paid to the interaction of the individual design steps in verification, diagnosis in production, and field return analysis. Since diagnosis and debug have the common objective of achieving a high diagnostic resolution, improving the accessibility of internal signals and cores helps with all aspects of verification, debug and diagnosis. Techniques which were formerly employed for test and afterwards discovered for diagnosis - like the scan design method and test point insertion - are now reused for design validation /5/, /6/. Additionally, it becomes more and more obvious that yield ramping starts with design for manufacturability /7/, /8/, /9/; thus many precautions have to be taken on the designer's side to enable debugging and diagnosis.

Taking a look at the progress of nanometer technology, designs have to be more robust due to the increased variations of nano-scaled silicon.
Automated maintenance and built-in self-repair become strong requirements to achieve high reliability, but employing these, designs get both hard to test and hard to diagnose. Future work will have to overcome these challenges.

In this paper we give an outline of recent research challenges and developments in the area of debug and diagnosis. In the following chapter, debug and diagnosis tasks throughout the life cycle of a SoC are discussed. In chapter 3, algorithms are presented to locate faulty structures in circuits and defects on chips; in chapter 4, methods are discussed to improve the accessibility of the internal states of devices.

2 Life cycle debug and diagnosis

Designing and manufacturing is an error-prone process. Some of the errors are due to misconceptions and human mistakes, others are unavoidable due to unknown conditions. Even under the assumption of no human mistakes, there is a strong necessity for quality control and improvement during the complete life cycle of the system. This section discusses tasks, challenges and requirements for tackling faults occurring during design, manufacturing and operation.

2.1 Specification

Functional errors caused by the designer are mostly due to an inconsistent or incomplete specification. After specification, we have functional simulation models for all major system components. But with growing system complexity, the simulation time for the complete system grows to an extent where simulation covers only a small portion of the design space. Despite all efforts to find functional errors as early as possible, functional errors may then show up later in prototypes and during system-level debug. Increased accessibility introduced by design for diagnosis helps tracking down functional errors during system-level debug. Though system-level debug itself is beyond the scope of this article, we devote a paragraph to trace buffers in chapter 4.

2.2 Implementation

Implementation is the design of hardware according to the available models or specification. The implementation has to be verified in order to avoid, respectively find, design errors, which are defined as deviations of the implementation from the specification. Estimates today are that more than 70% of the total design time is spent on verification /1/, /2/. Standardized debug methods and algorithms have to be developed to decrease verification and thereby design time. Today, assertion-based verification is used and supported by commercial tools. Most often, this task relies on simulation and becomes very expensive. Again, some design errors may only show up during emulation, in prototypes, or even during mass production, where then an increased accessibility introduced by design for diagnosis during the implementation phase has to help tracking down the design errors.

The variety of design errors is infinite; nevertheless, the most common design flaws can be identified as violated timing constraints or altered logic functions due to manual optimizations. Design debug is the task of finding those flaws in a design. Though design errors are defined relative to the specification, design debug does not depend on a fault-free specification to track down deviations. To gain a deeper understanding of today's debug techniques, a different view on the problem is helpful: a design fault is the logic function of the smallest circuit part which needs to be rectified. This view bridges debug and diagnosis, where fault modeling is applied to describe possible misbehaviors of a circuit, caused by design errors or defects.

One way to describe design errors are conditional stuck-at faults, as proposed in /10/. Conditional stuck-at faults are stuck-at faults with an additional activation condition. If multiple conditional stuck-at faults are assigned to a single line or to multiple lines, then an arbitrary combinational faulty behavior can be described. Figure 1 shows how to model a design error where an AND gate is exchanged by an OR gate with the help of a conditional stuck-at fault: forcing the output of the specified AND gate to 1 whenever the condition A+B holds reproduces exactly the behavior of the faulty OR gate.

Fig. 1: Example of a conditional stuck-at fault: an AND gate in the specification is replaced by an OR gate in the faulty netlist; the error is modeled as a conditional stuck-at-1 (cs@1) at the AND output with condition(cs@1) = A+B

Among all the approaches to describe malfunctioning designs by fault models, there is no fault model matching every possible design fault. The last step in implementation is the final layout of a design. Layouts are verified by extracting a netlist and debugging the extracted netlist with the same design debug methods used before.
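A minimal sketch of how the conditional stuck-at fault of Figure 1 reproduces the design error: the small evaluator below checks, over all input combinations, that the specification gate with the injected cs@1 behaves exactly like the faulty OR gate. It is an illustration of the fault model, not the modeling framework of /10/.

```python
# Conditional stuck-at fault from Figure 1: specification gate AND, faulty
# netlist gate OR; cs@1 with condition A+B on the AND output reproduces the
# faulty behavior on every input combination.

def spec(a, b):                 # specified gate
    return a & b

def faulty(a, b):               # erroneous implementation
    return a | b

def spec_with_cs_at_1(a, b):    # specification plus conditional stuck-at-1
    condition = a | b           # condition(cs@1) = A + B
    return 1 if condition else spec(a, b)

for a in (0, 1):
    for b in (0, 1):
        assert spec_with_cs_at_1(a, b) == faulty(a, b)
print("cs@1 with condition A+B models the AND-to-OR design error")
```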
One way to describe design errors are conditional stuck-at faults as proposed in /10/. Conditional stuck-at faults are stuck-at faults with an additional activation condition. If multiple conditional stuck-at faults are assigned to a single line or a multiple line, then an arbitrary combinational faulty behavior can be described. Figure 1 amplifies how to model a design error, where an AND gate is exchanged by an OR gate with the help of a conditional stuck-at fault. Specification: Faulty net list: A B & A B >1 Model: A B & cs@l condition(cs@l) = A+B Fig. 1: Example of a conditional stuck-at fault Among all the approaches to describe malfunctioning designs by fault models, there is no fault model matching every possible design fault. The last step in implementation is the final layout of a design. Layouts are verified by extracting a netilst and debugging the extracted netilst with the same design debug methods used before. 2.3 Prototyping With the masks from the implementation phase, first chips are produced. Due to various unknown effects, not all of the prototypes produced will work properly. This problem 236 H.-J. Wunderlich, M. Elm, S. Hoist: Debug and Diagnosis: Mastering the Life Cycle of Nano-scale Systems on Chip Informacije MIDEM 37(2007)4, str. 235-243 is getting more severe with shrinking structures, as now systematic and random variations are increasing. In consequence, the actual behavior of physical chips gets more and more difficult to predict and simulate /11/, and additionally it is harder to decide whether a device works within a given specification or not /12/. A physical disorder which leads to a behavior different from the implementation is called a defect. Systematic defects must be identified and avoided by altering the design or the process parameters. The new behavior of the internal signals due to a present defect is called a fault. In recent technologies, the complexity and variety of possible faulty behaviors is increasing, and fault models cannot reflect reality any more. The increasing variations in nano-scale silicon lead to complex defect mechanisms, hence the actual behavior of faulty chips and designs becomes not only difficult to predict but also difficult to model /13/, /14/, /15/. Rising variations for instance lead to lower signalto-noise ratios, to complex defect mechanisms and indeterministic behavior. Stuck-at, delay, bridging and statistical fault models are used today in commercial tools. However there are strong efforts towards a fault model independent diagnosis. Defects can also be expressed In conditional stuckat faults as long as they result in a combinational malfunction. Figure 2 shows an example of a bridge or short. However like in design debug, a fault model which can describe all possible defect mechanisms is unknown. Fault models must be determined by the diagnosis algorithm itself to describe the defect mechanisms. As an additional precondition for diagnostic algorithms, a high diagnostic resolution has to be provided to find exact defect mechanisms to guide the physical inspection accurately. Faulty circuit: Model: A -- A -- "h Condition: B=1 s@1 2.4 Manufacturing "Time-to-volume" and "time-to-market" are essential for the economic success of a product. "Yield ramping" is a traditional application area of diagnosis as it is used to find yield limlters. Modern manufacturing processes strongly interact with the design characteristics. This necessitates yield learning for each new design. 
2.4 Manufacturing

"Time-to-volume" and "time-to-market" are essential for the economic success of a product. "Yield ramping" is a traditional application area of diagnosis, as it is used to find yield limiters. Modern manufacturing processes strongly interact with the design characteristics. This necessitates yield learning for each new design. Adapting process and product requires analysis of the root causes of failures and outliers /16/, /17/. The extracted knowledge is used to support yield ramping and yield learning in advanced process technologies by improving design for manufacturability /16/.

Prior to expensive diagnosis and physical failure analysis, spot defects must be ruled out by volume diagnosis. In volume diagnosis, test data of a large number of failing chips are recorded and analyzed to find yield-limiting systematic defects and design issues. Diagnostic data from a single chip is not sufficient, since systematic problems need to be differentiated from sporadic random defects. First attempts to establish standards in volume diagnosis have been made /18/. Research in this area is quite mature; nevertheless, with growing design complexity new problems arise again. Complex designs need more patterns to test, and testing time is a crucial cost factor. Additionally, in modern designs many cores are deeply embedded, and test access is a severe problem. The test solution developed to address this issue is built-in self-test (BIST). BIST reduces traffic and helps cut testing time, and many chips can be tested in parallel on one tester. However, classic BIST infrastructures may limit the visibility from outside, and gathering diagnostic data may become more difficult. Often, only very limited diagnostic information is available, such as the number of the first failing pattern.

2.5 Support

Even after successful manufacturing, diagnostic techniques are needed to detect and locate defective modules /19/ before repair. As customer satisfaction and warranties are a strong economic factor, the diagnosis infrastructure of a silicon product facilitating diagnosis in the field is also of great importance.

On a chip, there can be faults in combinational logic, in scan chains or in the clock tree. Finding possible defect locations in random logic based on the observed behavior of the chip is called logic diagnosis. Logic diagnosis together with scan chain diagnosis and interconnect (bus, network) diagnosis forms precision diagnosis.

3 Logic debug and diagnosis

Logic debug and diagnosis is concerned with finding the most reasonable root causes within a random logic network that explain the failing flip-flops of this circuit as well as possible. The circuit can either be a design containing errors or a core on a chip with defects. The only difference is that the root causes are induced by different defect or error mechanisms. The traditional way to tackle these root causes is first to create simple fault and error models to reduce the complexity of the problem, and then to develop special debug and diagnosis algorithms for each of these models. Due to the rising complexity of possible defect mechanisms, new approaches are currently being explored which are not restricted to a specific fault or error model. Such algorithms are based on the observation that simple, local defect mechanisms are more reasonable root causes than complex, distributed ones. An easy metric for reasonableness is provided by the number of conditional stuck-at faults needed to explain all failures. This observation holds for both design errors and physical defects.
Therefore, fault model independent approaches are suitable for both design debug and logic diagnosis.

Design debug and logic diagnosis share the goal of not only deriving possible root causes but also keeping the number of suspects as low as possible: the lower the number of returned suspects, the higher the achieved diagnostic resolution. The applied test set itself determines the achievable resolution by the number of faults which cannot be distinguished any further /20/, /21/, /22/. The effectiveness of pattern response analysis algorithms is evaluated by comparing the achieved diagnostic resolution to the resolution of the test set.

Pattern response analysis algorithms are divided into cause-effect and effect-cause approaches. These two fundamental paradigms will be discussed in the next two subsections. Most debug and diagnosis methods employ at least one of these approaches, and some even combine pattern analysis with diagnostic ATPG to provide maximum diagnostic resolution. This concept is called adaptive diagnosis and is covered in the third subsection.

3.1 Cause-Effect Analysis

In cause-effect analysis, a fault model is chosen to enumerate all possible root causes in a circuit. Fault simulation is performed on each fault in the model, and the behavior is matched with the failing responses observed /23/, /24/. To reduce simulation time, the erroneous output for each fault and each pattern is stored in a dictionary /25/, but depending on the complexity of the chosen fault model and the size of the circuit, such a dictionary may explode in size. Significant research effort has been spent on reducing the size of fault dictionaries /26/, /27/. The size can be reduced by omitting the erroneous outputs and storing only pass/fail information for each fault-pattern pair, or by limiting the diagnostic resolution of the dictionary and performing fault simulation for each case to distinguish the remaining candidates /28/. Dictionary-based cause-effect approaches today can handle industrial-sized designs /29/, but the main drawback, the dependency on simplistic fault models like stuck-at or bridges, remains. However, some advanced methods still use cause-effect analysis as a final stage in the diagnosis process to improve diagnostic resolution.
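To make the dictionary idea concrete, the following sketch (a toy illustration, not from the paper; the three-net circuit, the fault list and the pass/fail encoding are invented) builds a pass/fail dictionary by fault simulation and then matches an observed failing response against it.

```python
from itertools import product

# Toy combinational circuit: two outputs over inputs a, b, c.
def circuit(a, b, c, stuck=None):
    """Evaluate the circuit; 'stuck' optionally pins one net to a
    fixed value, e.g. ('n1', 0) for n1 stuck-at-0."""
    nets = {'n1': a & b}
    if stuck and stuck[0] == 'n1':
        nets['n1'] = stuck[1]
    nets['o1'] = nets['n1'] | c
    nets['o2'] = nets['n1'] ^ c
    if stuck and stuck[0] in ('o1', 'o2'):
        nets[stuck[0]] = stuck[1]
    return nets['o1'], nets['o2']

patterns = list(product((0, 1), repeat=3))
faults = [(net, v) for net in ('n1', 'o1', 'o2') for v in (0, 1)]

# Pass/fail dictionary: for each fault, one bit per pattern
# (1 = the faulty machine differs from the fault-free machine).
dictionary = {
    f: tuple(int(circuit(*p, stuck=f) != circuit(*p)) for p in patterns)
    for f in faults
}

# Diagnosis: observe a failing device and look up matching faults.
observed = tuple(int(circuit(*p, stuck=('n1', 1)) != circuit(*p))
                 for p in patterns)
suspects = [f for f, sig in dictionary.items() if sig == observed]
print("suspects:", suspects)   # contains ('n1', 1)
```

Note that even this tiny example stores one bit per fault-pattern pair; with millions of faults and thousands of patterns, this is exactly the size explosion discussed above.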
3.2 Effect-Cause Analysis

In effect-cause analysis, possible defect locations are derived directly from the observed failing outputs by taking the logic structure of the circuit into account /30/, /31/. This approach does not depend on the enumeration of all possible faults; thus it can be used to implement fault model independent diagnosis. As mentioned above, such algorithms assume a certain locality of the root cause. The simplest effect-cause algorithms rely on the strongest locality possible: the so-called single fault assumption or single fix condition. This assumption states that there is a single signal within the circuit whose value needs to be altered to explain all failing patterns. Based on this, algorithms were proposed which intersect the input cones of failing outputs /32/ or backtrace critical paths from failing outputs to focus on delay faults /33/. After finding such a signal in an erroneous design, its logic behavior can be extracted and rectified /34/.

The 'Single Location At a Time' (SLAT) approach introduced in /35/, /36/ relaxes the single fault assumption. This approach determines, for each pattern, single stuck-at faults that can explain the failing response by fault simulation. Those explaining faults can be different for each failing pattern and are used to derive more complex faults. Hence, SLAT is a fault model independent approach which merely uses the stuck-at fault model in fault simulation to localize the suspicious region of the circuit. The main drawback of the SLAT paradigm is the fact that information for fault location is only extracted from patterns which fulfill the single fix condition. All the other patterns, failing as well as passing ones, are not taken into account. To overcome this limitation, many algorithms work in two passes: first, a fast effect-cause analysis like SLAT is performed to constrain the circuit region where possible culprits may be located; second, for each of the possible fault sites, a cause-effect simulation is performed to identify those faults which match the actually observed behavior /23/, /24/.

3.3 Adaptive Diagnosis

There is no concise test set which provides the best resolution for every possible faulty behavior. Since the maximum achievable diagnostic resolution is determined by the test set, many approaches already employ diagnostic or focused ATPG to distinguish remaining suspects or to extract the defective behavior completely /23/, /34/. By integrating pattern generation more tightly into the whole diagnosis process, fault localization becomes even more powerful. This general idea of alternating pattern analysis and pattern generation steps is called adaptive diagnosis /37/. Here, faulty and fault-free responses are used to guide the automatic generation of new patterns for increasing the resolution. A pattern analysis step extracts information from the responses of the device under diagnosis (DUD) and accumulates it in a knowledge base. This knowledge in turn guides an automatic test pattern generator (ATPG) to generate relevant patterns for achieving high diagnostic resolution. The loop ends when an acceptable diagnostic resolution is reached (Fig. 3). The definition of the exact abort criterion depends on the number and confidence levels of the fault candidates.

Fig. 3: Adaptive diagnosis flow

One way to implement such an adaptive diagnosis flow is by a generalization of the SLAT paradigm. Where SLAT only considers perfect matches for each pattern, a measure can be defined which quantifies how well a stuck-at signal explains a response of the circuit under diagnosis /38/. Let FM(f) be a fault machine, i.e. the circuit with stuck-at fault f injected. For each test pattern t ∈ T, the evidence e(f, t) = (Δσt, Διt, Δτt, Δγt) is defined as a tuple of numbers, where Δσt is the number of failing outputs f can explain, Διt is the number of additional failures f induces, Δτt is the number of failing outputs not explained by f (see Fig. 4), and Δγt is the minimum of Δσt and Διt. A failing pattern t which is completely explained by a stuck-at signal f will lead to the evidence e(f, t) = (Δσt > 0, 0, 0, 0). So the SLAT approach is only a special case in this notation. Note that Δσt will then be maximal among all evidences for t.

Fig. 4: Definition of the evidence e(f, t) = (Δσt, Διt, Δτt, Δγt)

The evidence of a fault f and a test set T is simply the sum of all evidences for f:

e(f, T) = (σT, ιT, τT, γT), with σT = Σ(t∈T) Δσt, ιT = Σ(t∈T) Διt, τT = Σ(t∈T) Δτt and γT = Σ(t∈T) Δγt.

If Δσt was maximal for a stuck-at signal f and each t ∈ T, σT is also maximal.
In addition, a candidate is more suspicious if it causes fewer additional failures in places where the observed response shows the correct values. So the ranking is derived by sorting the evidences first by σT (descending) and then by ιT (ascending). Table 1 provides an example of such a ranking.

Table 1: A ranking with f1 as the best candidate.

stuck-at sig. | σT | ιT | τT | γT
f1 | 42 | 0 | 0 | 0
f2 | 42 | 35 | 0 | 0
f3 | 42 | 35 | 0 | 0
f4 | 42 | 35 | 0 | 0
f5 | 42 | 38 | 0 | 0
f6 | 23 | 22 | 19 | 0
f7 | 23 | 23 | 19 | 0

This ranking shows that f1 is the only stuck-at signal which can explain every observed failure and induces no additional ones. Hence, all pattern responses analyzed so far can be explained by this single stuck-at fault in the circuit under diagnosis. The failures of a stuck-at signal such as f2 are a proper superset of the observed failures, because σT is maximal and ιT is positive. Moreover, since γT is 0 for this stuck-at signal, f2 explains all failing patterns t ∈ Tf ⊂ T completely (Δσt maximal, Διt = 0) but not every passing pattern (Δσt = Δτt = 0 and Διt > 0 for some t ∈ Tp ⊂ T). This leads to the only possible conclusion that f2 can explain all the responses as a conditional stuck-at fault.

Table 2 shows suspect evidences for some classic fault models. If ιT, τT and γT are all zero, a single stuck-at fault explains the DUD behavior completely. With ιT = γT = 0, such a stuck-at fault explains a subset of all fails, but some other faulty behavior is present in the DUD. If τT and γT are zero, a faulty value on a single signal line under some patterns T' ⊂ T provides a complete explanation. With only γT = 0, a faulty value on the corresponding single signal line explains only a part of the DUD behavior. If only τT is zero, the suspect fails are a superset of the DUD fails. If all suspects show positive values in all components ιT, τT, γT, all simplistic fault models would fail to explain the DUD behavior.

Table 2: Fault models and evidence forms for e(f, T) with maximal σT.

classic model | ιT | τT | γT
single stuck-at | 0 | 0 | 0
stuck-at, multiple fault sites | 0 | >0 | 0
single conditional stuck-at | >0 | 0 | 0
cond. stuck-at, multiple fault sites | >0 | >0 | 0
delay fault, i.e. long paths fail | >0 | 0 | >0

By simple iteration over the ranking, pairs of suspects fa, fb with equal evidences e(fa, T) = e(fb, T) are identified. In Table 1, the stuck-at signals f2, f3 and f4 are not distinguished yet. To improve the ranking, fault-distinguishing patterns are generated /20/, /21/ and applied to the circuit. During the analysis of these responses, different values will be added to the evidences under consideration and the ranking will improve. However, as this may also introduce other sets of equal evidences, the approach iterates until all remaining pairs of equal evidences cannot be distinguished by diagnostic ATPG. To reduce the number of suspects and the region under consideration further, diagnostic pattern generation algorithms which exploit layout data have to be employed /23/. This generalization of SLAT provides a consistent, single-pass adaptive diagnosis algorithm which extracts evidence from every pattern, and the diagnostic results are very encouraging /38/. The consideration of every pattern is important especially if only limited failure information is available.
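As an illustration of the evidence-based ranking, the following sketch computes e(f, T) for all single stuck-at candidates of a small circuit and ranks them by σT and ιT. Only the evidence definitions follow the text; the three-input circuit, the fault sites and the injected defect are invented for the example.

```python
from itertools import product

def good(a, b, c):                       # fault-free machine, two outputs
    n1 = a & b
    return (n1 | c, n1 ^ c)

def fault_machine(a, b, c, site, value):
    """Circuit with a single stuck-at fault injected at one site."""
    n1 = a & b
    if site == 'n1':
        n1 = value
    o1, o2 = n1 | c, n1 ^ c
    if site == 'o1': o1 = value
    if site == 'o2': o2 = value
    return (o1, o2)

def evidence(site, value, patterns, observed):
    """Accumulate e(f,T) = (sigma, iota, tau, gamma) over all patterns."""
    sigma = iota = tau = gamma = 0
    for t, dud_resp in zip(patterns, observed):
        g, fm = good(*t), fault_machine(*t, site, value)
        ds = di = dt = 0
        for go, fo, do in zip(g, fm, dud_resp):
            if do != go and fo == do: ds += 1  # explained failing output
            if do == go and fo != go: di += 1  # additional failure induced
            if do != go and fo != do: dt += 1  # failing output unexplained
        sigma += ds; iota += di; tau += dt; gamma += min(ds, di)
    return (sigma, iota, tau, gamma)

patterns = list(product((0, 1), repeat=3))
# Device under diagnosis: behaves like n1 stuck-at-1.
observed = [fault_machine(*t, 'n1', 1) for t in patterns]

ranking = sorted(((evidence(s, v, patterns, observed), (s, v))
                  for s in ('n1', 'o1', 'o2') for v in (0, 1)),
                 key=lambda e: (-e[0][0], e[0][1]))  # sigma desc, iota asc
for ev, f in ranking:
    print(f, ev)   # best candidate ('n1', 1) has iota = tau = gamma = 0
```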
4 Design for debug and diagnosis

Diagnostic capabilities are needed during the whole life cycle of a system /9/. Besides the techniques and algorithms to find the actual locations of faults or defects, there is the need to provide access to the internal states of a device and to record and evaluate diagnostic data. These tasks are fulfilled by Design for Debug and Diagnosis (DDD). Two major problems have to be overcome by DDD: 1) diagnostic data may reach an enormous bandwidth, due to the requirement of high diagnostic resolution; 2) the diagnosis or debug of so-called hard-to-detect faults requires long observation periods.

Generally, the problem complexity can be reduced by the same means as applied for design for testability: scan design to provide access to internal states, and compression and compaction to reduce the data volume. To guarantee the correctness of a scanned-out diagnosis response, the diagnostic equipment itself has to be fault-free. The diagnosis of shift registers and scan chains is nowadays quite mature /39/, /40/, /41/, /42/. Figure 5 shows the embedded test equipment reused for diagnosis and a schematic of volume diagnosis reusing the multi-site test structure. Compaction of diagnosis responses, on the other hand, is and will remain a major research topic.

Fig. 5: Multi-Site Test

4.1 Compaction Techniques

Special precautions are necessary to gain more valuable information while keeping the traffic as low as possible. As detailed knowledge of the diagnosis responses is not available, compaction techniques have to be applied to reduce the number of necessary tester channels on the outputs of the circuit. Compactors can be classified according to different properties. The simplest classification separates time and space compactors. Space compactors reduce the number of output channels of a circuit by employing parity trees; thus space compactors preserve the length of a test response. Time compactors reduce the length of a response vector by compacting several shift-out cycles, employing memory cells. Combinations consisting of a space compaction stage with an attached time compaction stage can also be employed to reduce both response length and width.

Compaction may discard valuable information for diagnosis and may reduce the diagnostic resolution. Unknown values in the response vectors, caused by buses or uninitialized logic, may additionally cause fault cancellation and fault aliasing after compaction. Special compactors were proposed to preserve diagnostic resolution and to provide X-tolerance. Parity check matrices of error correcting codes were employed to construct space compactors able to tolerate a certain amount of X-states and able to detect and locate a certain amount of errors. The first approach implementing this was proposed in /43/. Nowadays, a large variety of extensions to this approach and similar approaches is available. A popular representative is the X-Compact compactor proposed in /44/. Besides the pure employment of coding theory, the interaction of compactor design with ATPG was proposed. I-Compact, for instance /45/, first employs coding techniques to gain X-tolerance and further enhances it by storing all possible X-positions in addition to the fault-free response vectors calculated during ATPG. By reusing the parity check matrix, this can be done very space-efficiently.
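The following sketch illustrates the basic behavior of parity-tree space compaction and the X-masking problem that motivates X-tolerant compactors such as X-Compact /44/. The observation matrix and signal values are invented for the example; real compactors derive the matrix from the parity check matrix of an error correcting code.

```python
# Space compaction: n scan-out channels are XORed down to fewer
# compactor outputs. An unknown (X) value poisons every parity
# output it feeds, which can cancel the effect of a real error.

X = None  # unknown value from a bus conflict or uninitialized memory

def xor(bits):
    if any(b is X for b in bits):
        return X                      # an X masks the whole parity tree
    return sum(bits) & 1

# Each compactor output observes a distinct subset of the 6 channels
# (rows of a parity-check-like matrix, here chosen by hand).
OBSERVE = [(0, 1, 2), (2, 3, 4), (4, 5, 0)]

def compact(slice_):                  # one shift-out cycle, 6 -> 3 bits
    return tuple(xor([slice_[i] for i in group]) for group in OBSERVE)

good  = [0, 1, 1, 0, 0, 1]
bad   = [0, 1, 0, 0, 0, 1]            # error on channel 2
noisy = [0, 1, 0, X, 0, 1]            # same error plus an X on channel 3

print(compact(good))    # (0, 1, 1) reference signature
print(compact(bad))     # (1, 0, 1) channel 2 flips both outputs it feeds
print(compact(noisy))   # (1, X, 1) the X hides one erroneous output
```

Because each channel feeds a unique set of parity outputs, the pattern of flipped outputs points back to the erroneous channel; the X, however, erases part of that pattern, which is precisely the fault cancellation discussed above.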
A different approach uses ATPG to determine scan chains which have to be switched off by a selection logic at scan-out. This can be used, on the one hand, to enable error propagation for any error /46/ and, in more recent work, to allow for X-tolerant compaction /47/, /48/. The different compactor designs are already integrated in commercial tools. Despite all the enhancements in compaction, the problem of error masking is not solved completely and will increase with growing circuits. The application of coding theory and the interaction with ATPG will reach their limits with growing circuits and the demand for fault model independent or adaptive diagnosis.

4.2 Trace Buffers

Contrary to the scan-design method in prototypes, a different approach is often applied for system-level debug and silicon debug (fault localization before destructive probing). Trace buffers are an on-chip instrumentation supporting at-speed sampling and a low-bandwidth connection to external debug software, which for instance uses a JTAG interface /49/. This approach was influenced by software debugging used in embedded systems /50/. Trace buffers can be classified into special purpose trace buffers designed for a particular architecture, e.g. /51/, and generic trace buffers applicable to any SoC /52/. In contrast to scan chains, trace buffers monitor only a subset of the internal signals. They are implemented on chip using the available memory to store failing pattern responses. This is area-efficient on the one hand, but affects diagnostic and debug capabilities on the other hand, as the buffer's size limits the observation window. In consequence, the window might be too small to locate faults manifesting themselves only after a long execution time. One of the most recent approaches to overcome this problem was proposed in /53/. In cases where the debug experiment can be repeated, cyclic debugging is employed to zoom into the interesting intervals.
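A trace buffer can be approximated in software by a ring buffer over a selected signal subset. The sketch below is purely illustrative (signal names, the buffer depth and the readout interface are invented, not taken from /49/-/53/) and shows how the buffer depth limits the observation window, which is the core limitation discussed above.

```python
from collections import deque

class TraceBuffer:
    """Minimal model of an on-chip trace buffer: samples a fixed
    subset of signals every cycle into a ring buffer, so only the
    last 'depth' cycles are observable after a trigger."""
    def __init__(self, watched, depth):
        self.watched = watched             # names of the traced signals
        self.buffer = deque(maxlen=depth)  # ring buffer limits the window

    def sample(self, cycle, state):
        self.buffer.append((cycle, {s: state[s] for s in self.watched}))

    def dump(self):                        # e.g. read out via JTAG
        return list(self.buffer)

# Simulate 1000 cycles of some design state; trace 2 of 3 signals
# with an 8-entry buffer: an error occurring at cycle 5 has long
# been overwritten by the time the trace is read out.
tb = TraceBuffer(watched=('req', 'ack'), depth=8)
for cycle in range(1000):
    state = {'req': cycle & 1, 'ack': (cycle >> 1) & 1, 'data': cycle % 7}
    tb.sample(cycle, state)
print(tb.dump())   # only cycles 992..999 remain observable
```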
5 Conclusion

Today's challenges in diagnosis and debug can be seen in two different areas. First, shrinking structures may cause unpredictable circuit behavior. This fact requires diagnosis algorithms and test pattern generation independent of an underlying fault model to enable reliable test and diagnosis. In this paper, an overview of existing methods meeting these requirements was presented. Second, the growing complexity of circuits makes access to and transfer of internal states for debug and diagnosis more complicated. Basically, test equipment like scan chains and compactors can be reused to overcome this problem, but special care has to be taken to keep a high diagnostic resolution. Here we gave an overview of compaction approaches and debug facilities aiming at this issue.

6 Acknowledgement

This work has been funded by the DFG under contract WU 245/4-1.

References

/1/ K. C. Chen, "Assertion-based verification for SoC designs," in Proceedings 5th International Conference on ASIC, Vol. 1, 2003, pp. 12-15.
/2/ R. Klein and T. Piekarz, "Accelerating functional simulation for processor based designs," Mentor Graphics Corporation, white paper, 2005.
/3/ T. Fitzpatrick, "Realizing advanced functional verification with Questa," Mentor Graphics Corporation, white paper, 2005.
/4/ K. Roy, T. M. Mak, and K.-T. T. Cheng, "Test consideration for nanometer-scale CMOS circuits," IEEE Design & Test of Computers, vol. 23, no. 2, pp. 128-136, 2006.
/5/ K.-T. Cheng, S.-Y. Huang, and W.-J. Dai, "Fault emulation: A new methodology for fault grading," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 10, pp. 1487-1495, 1999.
/6/ R. Ludewig, T. Hollstein, F. Schütz, and M. Glesner, "Architecture for testing and debugging of system-on-chip components," in IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS) 2004, Stará Lesná, Slovakia, April 18-21, 2004, pp. 91-98.
/7/ M. Riley, N. Chelstrom, M. Genden, and S. Sawamura, "Debug of the CELL processor: Moving the lab into silicon," in Proceedings IEEE International Test Conference 2006, Santa Clara, CA, USA, October 24-26, 2006, p. 26.1.
/8/ T. Arnaout, G. Bartsch, and H.-J. Wunderlich, "Some common aspects of design validation, debug and diagnosis," in Third IEEE International Workshop on Electronic Design, Test and Applications (DELTA 2006), 17-19 January 2006, Kuala Lumpur, Malaysia, 2006, pp. 3-10.
/9/ H.-J. Wunderlich, "From embedded test to embedded diagnosis," in Proceedings European Test Symposium, Tallinn, Estonia, 2005, pp. 216-221.
/10/ O. E. Cornelia and V. K. Agarwal, Conditional Stuck-at Fault Model for PLA Test Generation. VLSI Design Laboratory, McGill University, 1989.
/11/ J. W. McPherson, "Reliability challenges for 45nm and beyond," in Proceedings of the 43rd Design Automation Conference, DAC 2006, San Francisco, CA, USA, July 24-28, 2006, pp. 176-181.
/12/ S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in Proceedings of the Design Automation Conference 2003, Anaheim, USA, June 2-6, 2003, pp. 338-342.
/13/ A. Krstic, L.-C. Wang, K.-T. Cheng, J.-J. Liou, and M. S. Abadir, "Delay defect diagnosis based upon statistical timing models - the first step," in 2003 Design, Automation and Test in Europe Conference and Exposition (DATE 2003), 3-7 March 2003, Munich, Germany, pp. 10328-10335.
/14/ C. L. Henderson and J. M. Soden, "Signature analysis for IC diagnosis and failure analysis," in Proceedings IEEE International Test Conference 1997, Washington, DC, USA, November 3-5, 1997, pp. 310-318.
/15/ D. B. Lavo, B. Chess, T. Larrabee, and I. Hartanto, "Probabilistic mixed-model fault diagnosis," in Proceedings IEEE International Test Conference 1998, Washington, DC, USA, October 18-22, 1998, pp. 1084-1093.
/16/ C. Hora, R. Segers, S. Eichenberger, and M. Lousberg, "An effective diagnosis method to support yield improvement," in Proceedings IEEE International Test Conference 2002, Baltimore, MD, USA, October 7-10, 2002, pp. 260-269.
/17/ C. Hora, R. Segers, S. Eichenberger, and M. Lousberg, "On a statistical fault diagnosis approach enabling fast yield ramp-up," in IEEE European Test Workshop 2002, Corfu, Greece, May 26-29, 2002.
/18/ A. Leininger, A. Khoche, M. Fischer, N. Tamarapalli, W.-T. Cheng, R. Klingenberg, and W. Yang, "The next step in volume scan diagnosis: Standard fail data format," in Asian Test Symposium 2006, Nov. 2006, Fukuoka, Japan, pp. 360-368.
/19/ M. Abramovici, J. Emmert, and C. Stroud, "Roving STARs: an integrated approach to on-line testing, diagnosis, and fault tolerance for FPGAs in adaptive computing systems," in Proceedings of the Third NASA/DoD Workshop on Evolvable Hardware, July 12-14, 2001, Long Beach, CA, USA, pp. 73-92.
/20/ A. G. Veneris, R. Chang, M. S. Abadir, and M.
Amiri, "Fault equivalence and diagnostic test generation using atpg." In Proceedings IEEE International Symposium on Circuits and Systems, 2004, 2004, pp. 221-224. /21/ T. Bartensteln, "Fault distinguishing pattern generation." In Proceedings IEEE International Test Conference 2000, Atlantic City, NJ, USA, October 2000, 2000, pp. 820-828. /22/ N. K. Bhatti and R. S. Blanton, "Diagnostic test generation for arbitrary faults." In Proceedings IEEE International Test Conference 2006, Santaclara, CA, USA, October24-26, 2006, 2006, p. 19.2. /23/ R. Desinenl, O. Poku, and R. D. S. Blanton, "A logic diagnosis methodology for Improved localization and extraction of accurate defect behavior," In Proceedings IEEE International Test Conference 2006, Santaclara, CA, USA, October24-26, 2006, 2006, p. 12.3. /24/ M. E. Amyeen, D. Nayak, and S. Venkataraman, "Improving precision using mixed-level fault diagnosis," in Proceedings IEEE International Test Conference 2006, Santa Clara, CA, USA, October 24-26, 2006, 2006, p. 22.3. /25/ I. Pomeranz and S. M. Reddy, "On the generation of small dictionaries for fault location," in IEEE/ACM International Conference on Computer-Alded Design, ICCAD92, November 8-12, 1992, Santaclara, CA, USA, 1992, pp. 272-279. /26/ V. Boppana, I. Hartanto, and W. K, Fuohs, "Full fault dictionary storage based on labeled tree encoding," In 14th IEEE VLSI Test Symposium (VTS'96), April 28 - May 1, 1996, Princeton, NJ, USA, 1996, pp. 174-179. /27/ B. Chess and T. Larrabee, "Creating small fault dictionaries," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 3, pp. 346-356, Mar 1999. /28/ P. G. Ryan, S. Rawat, and W. K. Fuchs, "Two-stage fault location." in Proceedings IEEE International Test Conference 1991, Test: Faster, Better, Sooner, Nashville, TN, USA, October 26-30, 1991, 1991, pp. 963-968. /29/ "Faloc reference manual 3.9," NXP Semiconductors, 2004. /30/ M. Abramovici and M. A. Breuer, "Fault diagnosis based on effect-cause analysis: An Introduction," in 17th Conference on Design Automation, June 1980, 1980, pp. 69-76. /31/ J. A. Waicukauski and E. Lindbloom, "Failure diagnosis of structured VLSI," IEEE Design & Test of Computers, vol. 6, no. 4, pp. 49-60, Aug 1989. /32/ S. Venkataraman and S. B. Drummonds, "Poirot: a logic fault diagnosis tool and its applications." in Proceedings IEEE International Test Conference 2000, Atlantic City, NJ, USA, October 2000, 2000, pp. 253-262. /33/ A. Rousset, A. Bosio, P. Girard, C. Landrault, S. Pravossoudo-vltch, and A. Vlrazel, "Derric: A tool for unified logic diagnosis." In 12th European Test Symposium (ETS 2007), 20 May 2007, Freiburg, Germany, 2007, pp. 13-20. /34/ R. Ubar, "Design error diagnosis with resynthesis in combinational circuits," Journal of Electronic Testing: Theory and Applications, vol. 19, pp. 73-82, 2003. /35/ T. Bartenstein, D. Heaberlin, L. M. Huisman, and D. Sliwlnski, "Diagnosing combinational logic designs using the single location at-a-time (SLAT) paradigm." In Proceedings IEEE International Test Conference 2001, Baltimore, MD, USA, 30 October - 1 November 2001, 2001, pp. 287-296. /36/ L. M. Huisman, "Diagnosing arbitrary defects In logic designs using single location at a time (SLAT)," IEEE Transactions on Computer-Alded Design of Integrated Circuits and Systems, vol. 23, no. 1, pp. 91 -101, January 2004. /37/ Y. Gong and S. 
Chakravarty, "On adaptive diagnostic test generation," In Proceedings IEEE International Conference on Computer-Alded Design, November 1995, 1995, p. 181. /38/ S. Hoist and H.-J. Wunderlich, "Adaptive debug and diagnosis without fault dictionaries." In 12th European Test Symposium (ETS 2007), 20 May 2007, Freiburg, Germany, 2007, pp. 7-12. /39/ J. L. Schafer, F. A. Policastrl, and R. J. McNulty, "Partnersrls for Improved shift register diagnostics." in Digest of Papers. 10th Anniversary IEEE VLSI Test Symposium, 7-9 April 1992, Atlantic City, NJ, USA, 1992, pp. 198-201. /40/ S. Kundu, "On diagnosis of faults In a scan-chain." in Digest of Papers., Eleventh Annual IEEE VLSI Test Symposium, 6-8 April 1993, Atlantic City, NJ, USA, 1993, pp. 303-308. /41/ J. C.-M. Li, "Diagnosis of multiple hold-time and setuptime faults in scan chains." IEEE Transactions on Computers, vol. 54, no. 11, pp. 1467-1472, 2005. /42/ J.-S. Yang and S.-Y. Huang, "Quick scan chain diagnosis using signal profiling." in Proceedings IEEE International Conference on Computer Design: VLSI in Computers and Processors, ICCD 2005, 2-5 Oct. 2005, 2005, pp. 157-160. /43/ K. K. Saluja and M. Karpovsky, "Testing computer hard-ware through data compression in space and time." in Proceedings IEEE International Test Conference (ITC), 1983, 1983, p. 8389. /44/ S. Mitra and K. S. Kim, "X-compact: an efficient response compaction technique for test cost reduction," In Proceedings on International Test Conference, 7-10 Oct. 2002, 2002, pp. 311-320. /45/ J. Patel, S. Lumetta, and S. Reddy, "Application of salujakar-povsky compactors to test responses with many unknowns." In Proceedings on 21 st VLSITest Symposium, 27 April-1 May 2003, 2003, pp. 107-112. /46/ A. Morosov, K. Chakrabarty, M. Gossel, and B. Bhattacharya, "Design of parameterizable error-propagating space compactors for response observation." in Proceedings on 19th IEEE VLSI Test Symposium, VTS 2001, 29 April-3 May 2001, Marina Del Rey, CA, USA, 2001, pp. 48-53. 242 H.-J. Wunderlich, M. Elm, S. Hoist: Debug and Diagnosis: Mastering the Life Cycle of Nano-scale Systems on Chip Informacije MIDEM 37(2007)4, str. 235-243 /47/ T. Clouqueur, H. Fujiwara, and K. Saluja, "A class of linear space compactors for enhanced diagnostic." 2005, pp. 260-265. /48/ W.-T. Cheng, M. Kassab, G. Mrugalski, N. Mukherjee, J. Rajski, and J. Tyszer, "X-press compactor for "lOOOx reduction of test data." in IEEE International Test Conference, ITC '06, Oct. 2006, Santa Clara, CA, USA, 2006, pp. 1-10. /49/ "1149.1-1990, IEEE standard test access port and boundary -scan architecture." IEEE Computer Society, 1990. /50/ C. MacNamee and D. Heffernan, "Emerging on-ship debugging techniques for real-time embedded systems." Computing & Control Engineering Journal, vol. 11, no. 6, pp. 295-303, 2000. /51/ H. Vranken, "Debug facilities in the trimediacpu64 architecture." in Proceedings of European Test Workshop 1999, 25-28 May 1999, Constance, Germany, 1999, pp. 76-81. /52/ A. Hopkins and K. McDonald-Maier, "Debug support strategy for systems-on-chips with multiple processor cores." IEEE Transactions on Computers, vol. 55, no. 2, pp. 174-184, 2006. /53/ E. Anisand N. Nicolici, "Low cost debug architecture using lossy compression for silicon debug." in Proceedings 2007 Design, Automation and Test in Europe (DATE '07), 16-20 April 2007, Nice, France, 2007, pp. 225-230. 
Hans-Joachim Wunderlich, Melanie Elm, Stefan Holst
Institut für Technische Informatik, Universität Stuttgart, Pfaffenwaldring 47, D-70569 Stuttgart, Germany
email: {wu, elm, holst}@iti.uni-stuttgart.de

Prispelo (Arrived): 15.07.2007   Sprejeto (Accepted): 01.09.2007

43rd International Conference on Microelectronics, Devices and Materials - MIDEM 2007
CONFERENCE REPORT
12.09.2007 - 14.09.2007, Hotel ASTORIA, Bled, Slovenia

The forty-third International Conference on Microelectronics, Devices and Materials - MIDEM 2007 continues the successful tradition of the international MIDEM conferences organized every year by MIDEM - the Society for Microelectronics, Electronic Components and Materials. At the conference, 45 regular and 6 invited lectures were presented in five sessions and in a workshop on Electronic Testing. The latest achievements were presented in the following fields:

- Device physics, modelling and technology
- Thick and thin films
- Electronics
- Electronic testing
- Optoelectronics
- Integrated circuits

This year, for the tenth consecutive time, a one-day workshop was organized within the conference, this time on the topic of Electronic Testing, organized by the Computer Systems Department of the Jožef Stefan Institute. At the workshop, five invited lecturers presented some of the latest achievements in the field of testing complex electronic systems and devices in the light of reducing costs and increasing test efficiency. The workshop topics included testing, fault detection, diagnosis of nanosystems, thermal-aware testing of SoC systems, testing of systems-in-package, MEMS testing and testing of analog-digital systems.

The following invited papers were presented this year:

S. Hellebrand, C. G. Zoellin, H. J. Wunderlich, S. Ludwig, T. Coym, B. Straube, Universität Paderborn (D): Testing and monitoring nanoscale systems - Challenges and strategies for advanced quality assurance

Z. Peng, Z. He, P. Eles, Linköping University (S): Challenges and solutions for thermal-aware SoC testing

P. Cauvet, S. Bernard, M. Renovell, LIRMM (F): Design & test of system-in-package

H. J. Wunderlich, M. Elm, S. Holst, Universität Stuttgart (D): Debug and diagnosis: mastering the life cycle of nanoscale systems on chip

Paolo Prinetto, Politecnico di Torino (I): Testing of SoC Systems

Drago Strle, FE Ljubljana: MEMS Inertial Systems

Before the conference, proceedings of 25 author's sheets (approximately 400 pages) were prepared, organized similarly to previous years. Some statistics: number of participants: 60, of which 12 from abroad; number of papers in the proceedings: 51, of which 10 from abroad.

MIDEM 2007 Conference - List of Participants

No. | Surname and name | Institution | Address
1 | TRONTELJ JANEZ Jr. | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
2 | RAIC DUŠAN | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
3 | GEHRKE NICO | JOŽEF STEFAN INSTITUTE, LJ | Jamova 39, 1000 Lj.
4 | HERMAN MATIC | FEE, LJ | Tržaška 25, 1000 Lj.
5 | KRAUEL ANNE-KATHRIN | JOŽEF STEFAN INSTITUTE, LJ | Jamova 39, 1000 Lj.
6 | JOZENKOW TOMASZ | INSTITUT JOŽEF STEFAN, LJ | Jamova 39, 1000 Lj.
7 | NUCIC JANEZ | IJS, LJ | Jamova 39, 1000 Lj.
8 | SMID BLAZ | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
9 | SESEK ALEKSANDER | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
10 | PLETERSEK ANTON | IDS d.o.o. | Tržaška 25, 1000 Lj.
11 | STARAŠINIČ SLAVKO | IDS d.o.o. | Sojerjeva ulica 63, 1000 Lj.
12 | RIBNIKAR ROK | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
13 | SVIGELJ ANDREJ | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
14 | PODRZAJ JURIJ | LABORATORIJ ZA MIKROELEKTRONIKO, FE-LJ | Tržaška 25, 1000 Lj.
15 | BERGINC MARKO | FEE, LJ | Tržaška 25, 1000 Lj.
16 | LIPOVSEK BENJAMIN | FEE, LJ | Tržaška 25, 1000 Lj.
17 | NERAT MARKO | FEE, LJ | Tržaška 25, 1000 Lj.
18 | ZEMVA ANDREJ | FEE, LJ | Tržaška 25, 1000 Lj.
19 | PUHAR PRIMOŽ | LEA, d.o.o. | Finžgarjeva ulica 1 A, 4248 Lesce
20 | PAVLIN MARKO | HYB PROIZVODNJA HIBRIDNIH VEZIJ d.o.o. | Trubarjeva 7, 8310 Šentjernej
21 | GRAMC JANEZ | HYB PROIZVODNJA HIBRIDNIH VEZIJ d.o.o. | Trubarjeva 7, 8310 Šentjernej
22 | PERNE JOŽEF | ZAVOD TC SEMTO | Stegne 25, 1000 Lj.
23 | WUNDERLICH HANS-JOACHIM | UNIVERSITY OF STUTTGART | Germany
24 | UBAR RAIMUND | TALLINN UNIV. | Raja 15, 13518 Tallinn, Estonia
25 | WEGRZYN MARIUSZ | IJS, LJ | Jamova 39, 1000 Lj.
26 | DUGONIK BOGDAN | FERI MARIBOR | Smetanova 17, 2000 MB
27 | SELIGER BOGDAN | METREL d.d. | Ljubljanska 77, 1354 Horjul
28 | PENG ZEBO | ESLAB, LINKOPING UNIVERSITY | Linkoping, SE 58183, Sweden
29 | RENOVELL MICHEL | LIRMM | 34395, France
30 | MRAK PETER | GORENJE d.d. | Velenje
31 | TOPIČ MARKO | FEE, LJ | Tržaška 25, 1000 Lj.
32 | MOZEK MATEJ | FEE, LJ | Tržaška 25, 1000 Lj.
33 | ALJANCIC UROS | FEE, LJ | Tržaška 25, 1000 Lj.
34 | BELAVIC DARKO | HIPOT-RR d.o.o. | Trubarjeva 7, 8310 Šentjernej
35 | SANTO ZARNIK MARINA | HIPOT-RR d.o.o. | Trubarjeva 7, 8310 Šentjernej
36 | CVIKL BRUNO | FAKULTETA ZA GRADBENIŠTVO, UM | Maribor
37 | BILICHUK SERHIY | CHERNIVTSI NATIONAL UNIVERSITY | 58012, Ukraine
38 | GORLEY PETER | CHERNIVTSI NATIONAL UNIVERSITY | 58012, Ukraine
39 | BILJANOVIC PETAR | FER, Zagreb | Unska 3, 10000 Zagreb, Croatia
40 | JUNKAR ITA | IJS, LJ | Jamova 39, 1000 Lj.
41 | DRNOVŠEK BOŠTJAN | FEE, LJ | Tržaška 25, 1000 Lj.
42 | VUKADINOVIČ MIŠO | HYB PROIZVODNJA HIBRIDNIH VEZIJ d.o.o. | Trubarjeva 7, 8310 Šentjernej
43 | DRMOTA ANA | KOLEKTOR NANOTESLA INSTITUT | Stegne 29, 1000 Lj.
44 | SVILIČIČ BORIS | MARITIME FACULTY | Studenska 2, Rijeka, Croatia
45 | DEVADZE SERGEI | TALLINN UNIVERSITY OF TECHNOLOGY | Raja 15, 13515 Tallinn, Estonia
46 | ANHEIER WALTER | UNIVERSITY OF BREMEN | 25338 Bremen, Germany
47 | BRENKUŠ JURAJ | FEI STU | Ilkovičova 3, 81219 Bratislava, Slovakia
48 | PIRC MATIJA | FEE, LJ | Tržaška 25, 1000 Lj.
49 | KURNIK JURIJ | FEE, LJ | Tržaška 25, 1000 Lj.
50 | HROVAT MARKO | IJS, LJ | Jamova 39, 1000 Lj.
51 | RESNIK DRAGO | FEE, LJ | Tržaška 25, 1000 Lj.
52 | KOROSAK DEAN | FAKULTETA ZA GRADBENIŠTVO, UM | Smetanova 17, 2000 MB
53 | KOŽELJ MATJAŽ | IJS, LJ | Jamova 39, 1000 Lj.
54 | AMON SLAVKO | FEE, LJ | Tržaška 25, 1000 Lj.
55 | PENIC SAMO | FEE, LJ | Tržaška 25, 1000 Lj.
56 | VRTACNIK DANILO | FEE, LJ | Tržaška 25, 1000 Lj.
57 | NOVAK ONDREJ | CTU, PRAGUE |