TESTING AND MONITORING NANOSCALE SYSTEMS - CHALLENGES AND STRATEGIES FOR ADVANCED QUALITY ASSURANCE*

Sybille Hellebrand(1), Christian G. Zoellin(2), Hans-Joachim Wunderlich(2), Stefan Ludwig(3), Torsten Coym(3), Bernd Straube(3)

(1) University of Paderborn, Germany
(2) University of Stuttgart, Germany
(3) Fraunhofer IIS-EAS Dresden, Germany

INVITED PAPER
MIDEM 2007 CONFERENCE - WORKSHOP ON ELECTRONIC TESTING
12.09.2007 - 14.09.2007, Bled, Slovenia

Key words: nanoelectronic systems, soft errors, robust design, testing for quality assurance, single event upset

Abstract: The increased number of fabrication defects, the spatial and temporal variability of parameters, and the growing impact of soft errors in nanoelectronic systems require a paradigm shift in design, verification and test. A robust design becomes mandatory to ensure dependable systems and acceptable yields. Design robustness, however, invalidates many traditional approaches to testing and implies enormous challenges. The RealTest Project addresses these problems for nanoscale CMOS and targets unified design and test strategies that support both a robust design and a coordinated quality assurance after manufacturing and during the lifetime of a system. The paper first gives a short overview of the research activities within the project and then focuses on a first result concerning soft errors in combinational logic. It is shown that common electrical models for particle strikes in random logic have underestimated the effects on system behavior. The refined model developed within the RealTest Project predicts about twice as many single event upsets (SEUs) caused by particle strikes as traditional models.

* This work has been supported by the DFG grant "RealTest".

1 Introduction

Continuously shrinking feature sizes offer a high potential for integrating more and more functionality into a single chip. However, technology scaling also comes along with completely new challenges for design and test. As in the past, manufacturing defects are still a major problem, and efficient test and diagnosis procedures are needed to detect and sort out failing devices. While "random" or "spot" defects, such as shorts or opens, have been the major concern so far, the scenario has changed in the nanoscale era.
The increasing variability of transistors, the degradation of devices, and the increasing susceptibility to transient faults during system operation lead to massive reliability problems /5, 39/. One major reason for static parameter variations is sub-wavelength lithography. For nanoscale fabrication processes the wavelength used for lithography is greater than the size of the structures to be patterned. As in pictures with a low resolution, the resulting structures do not have exactly the intended contours. Even if resolution enhancement techniques (RET) such as optical proximity correction (OPC) are applied, these effects cannot be fully compensated /24/. A second source of static variability is the extremely small number of dopant atoms in the channel of a transistor. Although the concentration of dopant atoms in the channel remains more or less constant, the decreasing channel lengths lead to an exponential decrease of the number of dopant atoms with successive technology generations, and below 50 nm only tens of atoms are left. This implies that the "Law of Large Numbers" is no longer valid and disturbances in a few atoms already result in different electrical characteristics of the transistors, such as different threshold voltages. This phenomenon is also referred to as "random dopant fluctuations" (a short calculation at the end of this section illustrates the effect). Finally, the varying power density in different components of a system is a reason for dynamic parameter variations. Extremely high switching activity in certain areas, e.g. the ALU in a microprocessor, may cause "hot spots", which in turn may result in voltage droops and supply voltage variations. During the lifetime of a chip, aging and degradation of devices can produce new permanent faults, which stay in the system. Transient faults or "soft errors", which affect the system operation for a short time and then disappear again, can be caused by alpha-particles emitted from the packaging material or by cosmic radiation. Traditionally, soft errors have only been considered for memories, because the more aggressive design rules for SRAM and DRAM arrays made them more susceptible to particle strikes. Meanwhile, a saturation of the soft error rate (SER) in memories can be observed, while the vulnerability of combinational logic and latches is increasing /2, 13/. To cope with these inevitable problems, a "robust" design will become mandatory not only for safety critical applications but also for standard products. On the one hand, a shift from deterministic to statistical design is necessary to deal with parameter variations /5, 39/. On the other hand, fault tolerance and soft error mitigation techniques are necessary to compensate a certain amount of errors /2, 13, 29, 34/. However, the changing design paradigms also require a paradigm shift in test. As "robust" systems are designed to compensate faults to a certain extent, it is no longer sufficient to classify chips into passing and failing chips. Instead, additional information about the remaining robustness of passing chips is required ("quality binning"). Furthermore, the "acceptable" behavior of a system may vary within a certain range, which is possibly application specific (e.g. with respect to accuracy or speed). Consequently, test development cannot be based only on classical measures such as fault coverage; instead, tests have to verify that modules fulfill their specifications including robustness properties.
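As referenced above, the breakdown of the "Law of Large Numbers" for dopant atoms can be made concrete with a small calculation. The sketch below assumes Poisson-distributed dopant counts, a common first-order assumption that is not made explicit in the text: the relative parameter spread then grows as 1/sqrt(N) when the mean number N of dopant atoms in the channel shrinks.

import math

def dopant_statistics(n_mean):
    """Illustrative Poisson model of random dopant fluctuations: with an
    average of n_mean dopant atoms in the channel, the standard deviation
    is sqrt(n_mean), so the relative spread grows as the channel shrinks."""
    sigma = math.sqrt(n_mean)          # Poisson standard deviation
    relative_spread = sigma / n_mean   # = 1 / sqrt(n_mean)
    return sigma, relative_spread

# A long-channel device with ~10,000 dopants vs. a nanoscale one with ~50:
for n in (10_000, 1_000, 50):
    sigma, rel = dopant_statistics(n)
    print(f"mean dopants: {n:6d}  sigma: {sigma:6.1f}  relative spread: {rel:5.1%}")

With ten thousand dopants the spread is about 1%; with only 50 atoms it reaches roughly 14%, which is why threshold voltages of nominally identical transistors begin to differ noticeably.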
Additional problems arise because traditional observables such as IDDQ are no longer reliable failure indicators.

2 The RealTest Project

The problems explained above are addressed by the RealTest Project, which targets unified design and test strategies supporting both a robust design and efficient procedures for manufacturing test as well as online test and fault tolerance. The project is a joint initiative of the Universities of Freiburg (Bernd Becker, Ilia Polian), Stuttgart (Hans-Joachim Wunderlich), and Paderborn (Sybille Hellebrand), and the Fraunhofer Institute of Integrated System Design and Design Automation Dresden (Bernd Straube) /4/. It is funded by the German Research Foundation (DFG) and receives industrial support from Infineon Technologies, Neubiberg, and NXP Semiconductors, Hamburg. In detail, the research focus is on the following topics:

- fault modeling,
- state monitoring in complex systems,
- testing fault tolerant nanoscale systems,
- modeling, verification and test of acceptable behavior.

The research activities are strongly dependent on each other. To design, for example, a robust system which can compensate disturbances during system operation, a detailed analysis of possible defect and error mechanisms is indispensable. This analysis must take into account statistical variations of the circuit parameters and provide a statistical characterization of the resulting behavior. Depending on the results, the appropriate design and fault tolerance strategies can be selected. Particular attention must be paid to flip-flops and latches, as they are becoming the dominating components in random logic and are extremely vulnerable. As the known techniques for hardening flip-flops and latches are very costly, new efficient techniques for state monitoring are needed. The design strategy and the data obtained by the initial defect and error analysis determine the constraints for the test of the system. The cost for test and design can be reduced if it is possible to identify critical and non-critical faults depending on the application. For example, a fault in a DVD player resulting in only a few faulty pixels at certain times is tolerable for the user and need not be considered. A precise and application specific model of the acceptable behavior of the system is the basis for this step. A short outline of the specific problems dealt with in each topic is given in the following subsections.

2.1 Fault modeling

Defects, soft errors and parameter variations in future technologies cannot be accurately characterized by existing fault models. To be able to deal with the complex physical phenomena responsible for the circuit behavior, new fault models must be developed comprising, in particular, statistical profiles of circuit parameters and conditions for fault detection. This work is based on techniques for inductive fault analysis, which extract the behavior of defective layouts via the electrical level to higher levels of abstraction /18/. As classical approaches for inductive fault analysis do not take into account spatial and temporal variabilities, they must be extended accordingly. A first result concerning soft errors in combinational logic will be described in Section 3.

2.2 State monitoring in complex systems

The percentage of flip-flops in logic components is rapidly growing, for example due to massive pipelining or speculative computing based on large register files.
In particular, fault tolerant architectures rely on redundant structures and also work with an increased number of memory elements. Already today, circuits with more than a million flip-flops can be found both in data dominated and in control dominated designs /21/. Flip-flops are particularly susceptible to hard and soft errors, and, as will be analyzed in more detail in Section 3, soft errors in the combinational logic also propagate to the system flip-flops with a higher probability than assumed so far. An additional problem appears in power aware designs, where clock gating is used to keep the system state for longer periods of time. Similarly to the contents of memory arrays, the system state is then exposed to disturbances over longer time spans. Ensuring a correct system state is thus a problem of major importance. However, while online testing and monitoring of memory arrays is already state of the art, respective techniques for logic circuitry are still in their infancy. Here the goal is to investigate monitoring techniques and reconfiguration strategies which are suitable for both manufacturing and online test. In particular, new and robust hardware structures for scan chains are under development. As in memory arrays, the key issue is not to harden each single memory element but to partition the flip-flops into appropriate subsets, which can be monitored with the help of failure characteristics /14/.

2.3 Testing fault tolerant nanoscale systems

On the one hand, robust design styles contradict traditional design for testability rules, as they decrease the observability of faults. On the other hand, fault masking helps to increase yield. Consequently, a "go/no-go" test result is no longer satisfactory; instead, information about the remaining robustness in the presence of faults is needed for quality binning. As classical fault tolerant architectures such as triple modular redundancy (TMR) are very costly to implement, they are still restricted to safety critical applications /36/. For other systems, less hardware intensive solutions are of particular interest. The research activities within the project therefore focus on self-checking designs, which are able to detect errors and initiate a recovery phase once an error has happened /33/. Typically, self-checking systems aim to achieve the totally self-checking (TSC) goal, i.e. to detect an error when it results in a wrong output for the first time. Strongly fault secure circuits, for example, achieve the TSC goal by guaranteeing for each fault either the existence of a test pattern or fault free operation, even in the case of fault accumulation /40/. Design guidelines for strongly fault secure circuits are already given in /40/; more advanced techniques are described in /22/. In principle, tools for automatic test pattern generation (ATPG) can be used both to verify the self-checking properties of the design and to generate test patterns for manufacturing test. Clearly, an ATPG tool can verify the existence of test patterns, and checking fault free operation in the presence of faults corresponds to the known problem of redundancy identification. However, there are several challenges which are not yet addressed in state of the art tools. To deal with fault accumulation, the tools must be able to handle multiple faults efficiently. Furthermore, self-checking designs usually work with input and output encoding, and test patterns for online checking must be in the input code and result in a circuit response outside the output code.
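To make the last constraint concrete, the following minimal sketch (a hypothetical toy circuit, not taken from the project) pairs a functional block with an independently implemented parity predictor; the output code is a simple even-parity code. The exhaustive loop plays the role of ATPG and searches for an online test pattern whose faulty response leaves the output code.

from itertools import product

# Toy functional block with an independently implemented parity predictor.
# fault="g1/0" injects a stuck-at-0 on the internal net g1 (hypothetical).
def toy_self_checking(a, b, c, fault=None):
    g1 = a & b
    if fault == "g1/0":
        g1 = 0
    y = g1 ^ c            # functional output
    p = (a & b) ^ c       # parity predictor, implemented as a separate cone
    return y, p

def in_output_code(y, p):
    return p == y         # even parity over the single output bit

# Exhaustive "ATPG": find patterns whose faulty response leaves the code.
for a, b, c in product((0, 1), repeat=3):
    good = toy_self_checking(a, b, c)
    bad = toy_self_checking(a, b, c, fault="g1/0")
    if in_output_code(*good) and not in_output_code(*bad):
        print(f"pattern {(a, b, c)} detects g1/0 online: response {bad} is off-code")

In a real self-checking design the inputs would additionally be constrained to the input code; here all input combinations are assumed valid for brevity.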
Generating such patterns requires ATPG with respective constraints. For manufacturing test, the fault model may differ from that for online checking. The interaction between both fault models must be analyzed, and a test set must be determined which can detect not only manufacturing defects but also reduced self-checking properties.

2.4 Modeling, verification and test of acceptable behavior

As mentioned above, the behavior of nanoscale systems may be "acceptable" within a certain range, which is possibly application specific (e.g. with respect to accuracy or speed). This observation has been exploited in /6/ to introduce the concept of error tolerant design. Within the framework of the RealTest Project a more general approach is followed to develop metrics for "acceptable behavior", taking into account aspects of both offline and online testing. Along with the development of respective metrics and their integration into ATPG tools, an important issue is to provide means for estimating the impact of hard or soft errors. The "severity" of a soft error in a sequential circuit can, for example, be measured by the number of clock cycles the system needs to return to a fault free state /12/. The respective classification of soft errors in /12/ is based on a temporary stuck-at fault model for soft errors and an efficient estimation of the error probability P_err associated with each fault. P_err reflects the probability that a soft error causes an erroneous output or system state. It can also be used as a guideline for selective hardening of circuit nodes /30/.

3 Single event transients - an underestimated problem

As soft errors in random logic are a key challenge in nanoscale systems, special emphasis has been placed within the RealTest Project on modeling the effects of particle strikes in combinational logic /15/. The results of this work have shown that soft errors in random logic are still an underestimated problem. In particular, it has been shown that in the majority of investigated cases soft errors remain in the system about twice as often as predicted by traditional approaches. For a better understanding of these results, the differences between traditional modeling and the refined approach from /15/ are pointed out in more detail in the sequel.

A particle strike in combinational logic can cause a glitch in the output voltage of a logic gate /8/. Usually such a "single event transient" (SET) only leads to a system failure if it can propagate to a register and turn into a single event upset (SEU) there. As a precondition, propagation paths must be sensitized in the logic, and the glitch must arrive at the register during a latch window /23, 31/. This is illustrated for a small example in Figure 1.

Fig. 1: Logical and latch window masking.

If the particle strike at the AND gate produces a glitch at the output, this glitch can only be propagated through the OR gate for w = 0. The glitch at the output of the OR gate is not latched in the flip-flop, because it has disappeared before the next rising edge of the clock. In addition, depending on the amplitude of a glitch, its propagation can also be prevented by electrical masking /9/. Overall, it is particularly important not only to predict the occurrence of an SET but also to accurately characterize its expected shape. State of the art device simulators allow a precise characterization of SETs, but they are also highly computationally intensive /10/.
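Before turning to the electrical models, the latch window condition illustrated in Figure 1 can be quantified with a common first-order estimate (an illustration with assumed numbers, not a result from /15/): a glitch arriving at a uniformly random time is captured if and only if it overlaps the setup/hold window around a clock edge.

def latching_probability(glitch_width, t_setup, t_hold, t_clk):
    """First-order estimate of latch window masking: the probability that a
    glitch of the given width overlaps the setup/hold window of a flip-flop
    is roughly (glitch_width + window) / clock period, clipped to [0, 1]."""
    window = t_setup + t_hold
    return min(1.0, max(0.0, (glitch_width + window) / t_clk))

# Example: a 150 ps glitch, 30 ps setup, 20 ps hold, 1 ns clock period
print(latching_probability(150e-12, 30e-12, 20e-12, 1e-9))  # -> 0.2

Such rough estimates already show why higher clock frequencies latch more SETs; an accurate prediction, however, requires the expected glitch shape, which is the subject of the following discussion.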
In many cases circuit level techniques offer a good compromise between accuracy and computational cost /3, 20, 25, 32, 35/. They can also be combined with device level analysis into mixed level approaches /9, 10/.

3.1 Refined electrical modeling for particle strikes

Most circuit level approaches model the effect of a particle strike with the help of a transient current source, as shown in Figure 2. A common approximation of the current waveform I(t) is the double exponential function in equation (1) /28/. Here τ_a is the collection time constant of the pn-junction, and τ_b denotes the time constant for establishing the electron-hole track.

Fig. 2: Transient current model.

    I(t) = I_0 \left( e^{-t/\tau_a} - e^{-t/\tau_b} \right)    (1)

An alternative model is given by formula (2), with parameters Q, τ and K, where Q is the collected charge, τ is a pulse-shaping parameter and K is a constant /11/.

    I(t) = K \frac{Q}{\tau} \sqrt{t/\tau} \, e^{-t/\tau}    (2)
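As a minimal illustration, the traditional transient current source of equation (1) can be sampled directly; the parameter values below are illustrative placeholders, not technology data from the paper.

import math

def i_set(t, i0, tau_a, tau_b):
    """Double exponential SET current, equation (1): tau_a is the collection
    time constant of the pn-junction, tau_b the time constant for
    establishing the electron-hole track (tau_b < tau_a)."""
    return i0 * (math.exp(-t / tau_a) - math.exp(-t / tau_b))

# Sample the pulse over 500 ps with assumed constants
i0, tau_a, tau_b = 1.0e-3, 150e-12, 20e-12   # ~1 mA scale, hypothetical
for step in range(6):
    t = step * 100e-12
    print(f"t = {t * 1e12:5.0f} ps  I = {i_set(t, i0, tau_a, tau_b) * 1e6:8.2f} uA")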
Both models assume a constant voltage V across the pn-junction and do not consider the interdependence between charge collection and the change in voltage over time. This simplification is appropriate for modeling strikes at a significant distance from a pn-junction, where charge is collected by diffusion. However, if an alpha-particle or a heavy ion generated by a neutron strike crosses a pn-junction, this leads to a "funneling" process, which was first described by Hsieh for alpha-particle strikes /16/. Here, charge collection by drift is the dominating phenomenon, and this process depends on the electric field strength and thus on the voltage. Among several models for charge collection by drift, Hu's model has been selected as the basis for the work in /15/, because it is also valid for variable field strength /17, 27, 28/. Hu only considers alpha-particle strikes, but it has been shown by device simulations that ions crossing a pn-junction lead to similar effects /37/. For the sake of simplicity, in the following explanations it is assumed that the particle strikes the pn-junction at an angle of 90°, and the discussion is restricted to NMOS without loss of generality.

The particle strike in Figure 3 generates a track of free electron-hole pairs, which disturbs the depletion zone. The electrons from the track drift to the drain/source region, while the holes drift into the substrate, generating an electric field. The depletion zone is gradually regenerated in the regions where no holes are left over. The funneling process is finished when all the holes have drifted out of the original depletion zone. To model the current flow, Hu assumes an ideal voltage source V as depicted in Figure 3.

Fig. 3: Funneling process.

In addition to V, the drift current I_drift(t) is determined by the diode potential U_d of the pn-junction, the voltage U_dpl(t) across the depletion zone, the resistance R_t of the electron-hole track, and the resistance R_s of the substrate. With G = (R_t + R_s)^{-1}, the curve I_drift(t) is given by equation (3).

    I_{drift}(t) = G \left( V + U_d - U_{dpl}(t) \right)    (3)

To determine the voltage U_dpl(t), Hu assumes that the charge carrier density is equal to the density N_sub of acceptors in the substrate. However, Juhnke has shown by device simulation that this approximation may not be precise enough /19/. Exploiting the condition of quasi-neutrality in semiconductors, Juhnke derives an improved model with equation (4) for U_dpl(t).

    U_{dpl}(t) = \frac{K \sqrt{V + U_d}}{N_{ehp}} \int_0^t I_{drift}(t') \, dt'    (4)

The parameter N_ehp is the line density of the electron-hole pairs along the track, which depends on the energy of the particle strike. K is a technology dependent parameter mainly determined by the mobilities of the electrons and holes and by the density of acceptors in the substrate. Inserting (4) into (3) provides equation (5) for I_drift(t).

    I_{drift}(t) = G \left( V + U_d - \frac{K \sqrt{V + U_d}}{N_{ehp}} \int_0^t I_{drift}(t') \, dt' \right)    (5)

For constant voltage V this equation has a closed form solution, and Juhnke's model can be summarized by formula (6).

    I_{drift}(t) = G (V + U_d) \exp \left( - \frac{G K \sqrt{V + U_d}}{N_{ehp}} \, t \right)    (6)

As observed in /15/, the assumption of a constant voltage is only necessary to derive a closed form solution for I_drift(t). The term V + U_d in equation (5) can therefore be replaced by a variable voltage U(t), which provides equation (7).

    I_{drift}(t) = G \left( U(t) - \frac{K \sqrt{U(t)}}{N_{ehp}} \int_0^t I_{drift}(t') \, dt' \right)    (7)

With C(t) = N_{ehp} / (K \sqrt{U(t)}), equation (7) can be rewritten as formula (8), which suggests an interpretation as a series connection of a capacitance and a conductance. Since the capacitance C(t) depends on U(t), the model is referred to as the UGC model.

    I_{drift}(t) = G \left( U(t) - \frac{1}{C(t)} \int_0^t I_{drift}(t') \, dt' \right)    (8)

State-of-the-art circuit simulators based on advanced description languages such as VHDL-AMS allow the implementation of arbitrary two-terminal networks. Thus, it is not necessary to solve equation (7) analytically; it can be passed directly to the simulator for numerical analysis. A symmetric analysis can be carried out for PMOS devices, but then the network must be connected with opposite polarity and the technology parameter K must be adapted.

First experiments reported in /15/ have shown that the traditional transient current model (based on equation (6)) and the UGC model provide significantly different results. Analyzing, for example, the behavior of a transistor after an alpha-particle strike of 1 MeV, the glitches in the drain voltage predicted by the UGC model have a smaller amplitude but a longer duration. To justify this different view on single event transients, the UGC model has been validated by comparing it to the device level analysis of an NMOS transistor reported in /9/. As shown in /15/, both the device level simulations and the circuit level simulations using the UGC model yield smaller amplitudes and longer durations than traditional circuit level simulations based on a transient current source.
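The numerical treatment suggested above can be sketched in a few lines: equation (7) is integrated with a forward Euler scheme and coupled to a crude node model (a pull-up conductance to VDD plus a node capacitance) instead of a full VHDL-AMS simulator. All constants are assumed, order-of-magnitude placeholders, not the calibrated technology parameters used in /15/.

VDD   = 1.2      # supply voltage [V]
U_D   = 0.7      # diode potential of the pn-junction [V]
G     = 2e-3     # conductance G = 1/(R_t + R_s) [S]
K     = 1.4      # technology parameter of eq. (4), assumed scale
N_EHP = 1e-13    # line density of electron-hole pairs, assumed scale
G_PU  = 1e-3     # pull-up conductance restoring the struck node [S]
C_N   = 1e-14    # node capacitance [F]

dt, steps = 1e-12, 1000            # 1 ps steps over a 1 ns window
u_node, q_coll = VDD, 0.0          # struck node voltage, collected charge

for step in range(steps):
    u = max(u_node + U_D, 0.0)     # voltage U(t) across the junction
    # Equation (7): drift current, with the already collected charge fed back
    i_drift = max(G * (u - (K * u ** 0.5 / N_EHP) * q_coll), 0.0)
    q_coll += i_drift * dt         # integrate the collected charge
    # Node equation: the pull-up restores the node, the strike discharges it
    u_node += (G_PU * (VDD - u_node) - i_drift) * dt / C_N
    if step % 200 == 0:
        print(f"t = {step * dt * 1e12:4.0f} ps   U_node = {u_node:6.3f} V")

The sketch only shows how equation (7) lends itself to direct numerical evaluation; quantitative glitch shapes require the calibrated simulator setup of /15/.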
3.2 Gate level modeling and simulation results

The impact of the UGC model on SEU prediction can be two-fold. On the one hand, smaller amplitudes may increase electrical masking, but on the other hand the longer duration of glitches is likely to increase the probability of propagation through the circuit. In order to analyze the impact of the UGC model in more detail, in /15/ the gate level behavior in the presence of SETs has been extracted using standard techniques as described in /1/. The circuit level parameters were based on a 130 nm process, and for each gate full parasitic information was taken into account during extraction. In this way a gate library was created and used to synthesize a set of finite state machine benchmarks with the SIS synthesis tool /26, 38/. Their characteristics are summarized in Table 1: the columns show the names of the finite state machines, the number of states, the number of primary inputs and outputs, the number of flip-flops and the number of gates after state minimization, state encoding and logic minimization, as well as the minimum cycle times in picoseconds.

Table 1: Characteristics of the FSM examples.

FSM       States   PI   PO   FF   Gates   tc [ps]
bbara         10    4    2    8      90       670
dk14           7    3    5    3     145       993
dk16          27    2    3    5     409      2068
ex5            9    2    2    2      18       348
ex6            8    5    8    3     123       928
fetch         26    9   15    9     210       697
keyb          19    7    2    8     333       905
lion           4    2    1    2      20       308
mc             4    3    5    9      50       381
nucpwr        29   13   27    5     271       568
s1            20    8    6    8     199      1159
sand          32   11    9   21     928      1186
scf          122   27   56   24    1280      1668
shiftreg       8    1    1    4      16       209
styr          30    9   10    5     767      2677
sync          52   19    7   33     529      1403
train11       11    2    1    2      15      1211

For the simulation at the gate level with a state of the art event driven simulator, the properties of the library cells were mapped to VHDL behavioral descriptions. To model electrical masking at the gate level, the observations reported in /7/ were exploited: electrical masking is most pronounced in the first two logic levels after the struck node; beyond these, electrical masking effects can be neglected and strictly Boolean behavior can be assumed.

To quantify the impact of the UGC model, the following simulation flow is reported in /15/. The behavior of a finite state machine is monitored during a given number of cycles with a random input sequence. To compare the UGC model with the common model based on a transient current source, three copies of the finite state machine are simulated under exactly the same conditions. In each clock cycle a random SET is injected into the combinational logic of the finite state machine: an SET characterized by the UGC model in the first copy and an SET characterized by a transient current source in the second copy. For comparison, the third copy simulates the fault free case. If the SET does not propagate to a flip-flop in either copy, the next SET is injected in the next cycle. Otherwise, a checkpoint for the simulation of the good machine is generated, and the simulation is continued until a fault free state is reached again. In this way it can be determined how long the fault effects remain in the system, which can be used as a measure of the "severity" of the faults /12/. If the fault effects remain in the system for longer than a given limit, the analysis is stopped to save simulation time. After the states of both copies agree with the good machine, or the analysis of fault effects has been stopped, the checkpoint for the simulation of the good machine is restored, and simulation continues with the injection of the next SET.

For the first series of experiments in /15/, a clock of maximum frequency was assumed while monitoring the finite state machine for 10 million SET injections. The results showed that once an SET manifested itself as an SEU in the system, the average time for the SEU to stay in the system was similar for the UGC and the traditional transient current model. However, comparing the number of occurrences of SEUs showed significantly different results for the two models. To simplify the discussion of the results, in the following let t_UGC denote the number of cycles an SET remains in the system when the simulation is based on the UGC model, and let t_trans represent the same number for the transient current model. Furthermore, the number of SETs with t_UGC > k is denoted by n(t_UGC > k), and the number of SETs with t_trans > k is denoted by n(t_trans > k). In particular, a value of t_UGC or t_trans larger than zero means that the SET has been propagated to one or more registers, consequently causing an SEU.
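The simulation flow described above can be summarized in pseudocode-like Python; the fsm object with its step, random_inputs and random_gate methods is a hypothetical interface standing in for the event driven gate level simulator.

import random

MAX_TRACKED = 1000   # stop tracking a fault effect after this many cycles

def track_persistence(fsm, good, faulty, seed, limit=MAX_TRACKED):
    """Count cycles until the faulty copy re-converges with the good machine,
    replaying both from the checkpoint with the same random input sequence."""
    rng, cycles = random.Random(seed), 0
    while faulty != good and cycles < limit:
        inputs = fsm.random_inputs(rng)
        good, faulty = fsm.step(good, inputs), fsm.step(faulty, inputs)
        cycles += 1
    return cycles

def run_campaign(fsm, n_injections, seed=42):
    rng = random.Random(seed)
    persistence = {"ugc": [], "trans": []}
    state = fsm.reset_state()
    for _ in range(n_injections):
        inputs, site = fsm.random_inputs(rng), fsm.random_gate(rng)
        good = fsm.step(state, inputs)               # checkpointed good machine
        track_seed = rng.random()                    # same inputs for both models
        for model in ("ugc", "trans"):
            faulty = fsm.step(state, inputs, set_model=model, site=site)
            persistence[model].append(
                track_persistence(fsm, good, faulty, track_seed))
        state = good                                 # restore the checkpoint
    return persistence

def ratio(p, k):
    """Severity ratio n(t_UGC > k) / n(t_trans > k), cf. Figure 4 below."""
    return sum(t > k for t in p["ugc"]) / max(1, sum(t > k for t in p["trans"]))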
In sequential circuits an SEU can sometimes be tolerated if it remains in the system only for a few clock cycles and the system recovers quickly to fault free operation /12/. But if it repeatedly propagates through the next state logic and stays in the system for many cycles, then the risk of a severe system failure increases considerably. Thus, it is also particularly important to compare the results for the number of SEUs staying in the system for more than a tolerable number of cycles. Figure 4 compares n(t_UGC > 0) and n(t_trans > 0) as well as n(t_UGC > 20) and n(t_trans > 20). For each circuit, the left bar shows the ratio n(t_UGC > 0)/n(t_trans > 0), and the right bar represents the ratio n(t_UGC > 20)/n(t_trans > 20). For some circuits no SEUs stayed in the system for more than 20 cycles under either model; here the respective bars are omitted.

Fig. 4: Comparing the ratios n(t_UGC > 0)/n(t_trans > 0) and n(t_UGC > 20)/n(t_trans > 20) for maximum frequency.

It can be observed that the major trend is a factor of two between the UGC model and the transient current source model. This implies that the more realistic prediction by the UGC model results in twice as many (severe) SEUs as a prediction by the traditional transient current model. The detailed results in /15/ show that there are also some cases where the transient current source model predicts longer times for the SEUs to stay in the system. In these cases the smaller amplitudes predicted by the UGC model result in electrical masking. But five to ten times more often the longer duration of glitches is the dominating effect. Although the probability for an SET to be latched in a flip-flop increases with the operating frequency, these trends have also been confirmed for simulations based on different clock frequencies /15/.

4 Conclusions

The increasing variability of parameters and the increasing vulnerability to defects, degradation, and transient faults require a paradigm shift in the design and test of nanoscale systems. A robust and fault tolerant system design becomes mandatory also for non-critical applications, and testing has to characterize not only the functionality but also the robustness of a system. The RealTest Project addresses these problems by developing unified design and test strategies supporting both a robust design and efficient test procedures for manufacturing test as well as online test and fault tolerance. First results concerning the susceptibility of random logic to soft errors have shown that the effects of SETs have been underestimated so far. Simulations at gate level based on a refined electrical model for SETs have revealed about twice as many critical effects as simulations based on a traditional model.

References

/1/ L. Anghel, R. Leveugle, P. Vanhauwaert, "Evaluation of SET and SEU at Multiple Abstraction Levels," Proc. 11th IEEE Int. On-Line Testing Symposium (IOLTS'05), Saint Raphael, France, pp. 309-312, 2005.
/2/ R. Baumann, "Soft Errors in Advanced Computer Systems," IEEE Design & Test of Computers, Vol. 22, No. 3, pp. 258-266, 2005.
/3/ M. Baze, et al., "An SEU analysis approach for error propagation in digital VLSI CMOS ASICs," IEEE Trans. on Nuclear Science, Vol. 42, No. 6, Part 1, pp. 1863-1869, 1995.
/4/ B. Becker, et al., "DFG Projekt RealTest - Test und Zuverlässigkeit nanoelektronischer Systeme (DFG Project RealTest - Test and Reliability of Nano-Electronic Systems)," it - Information Technology, Vol. 48, No. 5, pp. 304-311, 2006.
/5/ S. Borkar, "Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation," IEEE Micro, Vol. 25, No. 6, pp. 10-16, Nov./Dec. 2005.
Borkar, "Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation", IEEE Micro, Vol. 25, No. 6, pp. 10-16, Nov./Dec. 2005. /6/ M, A. Breuer, S. K. Gupta, and T M. Mak, "Defect and Error Tolerance in the Presence of Massive Numbers of Defects", IEEE Design &Test, Vol. 21, No. 3, pp. 216-227, May-June 2004 /7/ H. Cha, et al., "A gate-level simulation environment for alpha-partiole-induoed transient faults," IEEE Trans, on Computers, Vol. 45, No. 11, pp. 1248-1256, 1996. /8/ P. Dodd and L. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics," IEEE Trans, on Nuclear Science, Vol. 50, No. 3, pp. 583-602, 2003. /9/ P. Dodd, eta!., "Production and propagation of single-event transients in high-speed digital logic ICs," IEEE Trans, on Nuclear Science, Vol. 51, No. 6 Part 2, pp. 3278-3284, 2004. /10/ P. Dodd, "Physics-based simulation of single-event effects," IEEE Trans, on Device and Materials Reliability Vol. 5, No. 3, pp. 343-357, 2005. /11/ L. Freeman, "Critical charge calculations for a bipolar SRAM array," IBM Journal of Research and Development, Vol. 40, No. 1, pp. 119-129, 1996. /12/ J. Hayes, I. Polian, and B, Becker, "An Analysis Framework for Transient Error Tolerance", Proc. 25'" IEEE VLSI Test Symp. (VTS'07), Berkeley CA, USA, pp. 249-255, 2007 /13/ P. Hazucha, et ai., "Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-/ spl mu/m to 90-nm generation," Technical Digest IEEE Int. Electron Devices Meeting 2003 (IEDM'03), pp. 21-526, 2003. /14/ S. Hellebrand et al., "Efficient online and offline testing of embedded DRAMs", IEEE Trans, on Computers, Vol. 51, No. 7, pp. 801-809, July 2002. /15/ S. Hellebrand, et al., "A Refined Electrical Model for Particle Strikes and its Impact on SEU prediction", Proc. 22"^ IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems (DFT07), Rome, Italy September 26-28, 2007. /16/ C. Hsleh and P. Murley, "A field-funneling effect on the collection of alpha-particle-generated carriers in silicon devices," IEEE Electron Device Letters, Vol. 2, No. 4, pp. 103-105, 1981. /17/ C. Hu, "Alpha-particle-induced field and enhanced collection of carriers," IEEE Electron Device Letters, Vol. 3, No. 2, pp. 31-34, 1982. /18/ A. Jee and F. J. Ferguson, "Carafe: An Inductive Fault Analysis Tool for CMOS VLSI Circuits", Proc. 11th IEEE VLSI Test Symp., pp. 92-98, 1993 /19/ T. Juhnke, "Die Soft-Error-Rate von Submikrometer-CMOS-Logik-schaitungen," PhD Thesis, Technical University of Beriin, 2003. /20/ N. Kaul, B, Bhuva, and S. Kerns, "Simulation of SEU transients in CMOS ICs," IEEE Trans, on Nuclear Science,Vol. 38, No. 6 Part 1, pp. 1514-1520, 1991. /21 / R. Kuppusw/amy et al., "Full hold-scan systems in microprocessors: Cost/benefit analysis", Intel Technology Journal, 8(1), pp. 63-72, Feb. 2004. /22/ R K. Lala, "Self-Checking and Fault-Tolerant Digital Design", Morgan Kaufmann Publishers, San Francisco, 2001 /23/ P. Liden, et al., "On latching probability of particle induced transients in combinational networks," Digest of Papers 24th Int. Symp. on Fault-Tolerant Computing 1994 (FTCS-24), pp. 340-349, 1994. /24/ L. W. Liebmann, "Layout impact of resolution enhancement techniques: impediment or opportunity?", Proc. Int. Symposium on Physical Design 2003 (ISPD '03), Monterey CA, USA, pp. 110-117, 2003. /25/ A. Maheshwari, I. Koren, and W. Burieson, "Accurate estimation of soft error rate (SER) in VLSI circuits," Proc. 
/26/ K. McElvain, "IWLS'93 Benchmark Set: Version 4.0," distributed as part of the IWLS'93 benchmark distribution, available at http://www.cbl.ncsu.edu:16080/benchmarks/LGSynth93/
/27/ F. McLean and T. Oldham, "Charge funneling in N- and P-type Si substrates," IEEE Trans. on Nuclear Science, Vol. 29, No. 6, pp. 2018-2023, 1982.
/28/ G. Messenger, "Collection of charge on junction nodes from ion tracks," IEEE Trans. on Nuclear Science, Vol. 29, No. 6, pp. 2024-2031, 1982.
/29/ S. Mitra, et al., "Logic soft errors in sub-65 nm technologies design and CAD challenges," Proc. 42nd Design Automation Conference (DAC'05), pp. 2-4, 2005.
/30/ K. Mohanram and N. A. Touba, "Cost-effective approach for reducing soft error failure rate in logic circuits," Proc. IEEE Int. Test Conference (ITC'03), Charlotte, NC, USA, pp. 893-901, Sept./Oct. 2003.
/31/ S. Mukherjee, et al., "A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor," Proc. 36th Annual IEEE/ACM Int. Symposium on Microarchitecture (MICRO-36), pp. 29-40, 2003.
/32/ H. Nguyen and Y. Yagil, "A systematic approach to SER estimation and solutions," Proc. 41st Annual IEEE Int. Reliability Physics Symposium, pp. 60-70, 2003.
/33/ M. Nicolaidis and Y. Zorian, "On-Line Testing for VLSI - A Compendium of Approaches," Journal of Electronic Testing: Theory and Applications (JETTA), Vol. 12, No. 1-2, pp. 7-20, February/April 1998.
/34/ M. Nicolaidis, "Design for Soft Error Mitigation," IEEE Trans. on Device and Materials Reliability, Vol. 5, No. 3, pp. 405-418, 2005.
/35/ M. Omana, et al., "A model for transient fault propagation in combinatorial logic," Proc. 9th IEEE On-Line Testing Symposium (IOLTS'03), pp. 111-115, 2003.
/36/ D. K. Pradhan, "Fault Tolerant Computer System Design," Prentice Hall, Upper Saddle River, NJ, USA, 1996.
/37/ P. Roche, et al., "Determination of key parameters for SEU occurrence using 3-D full cell SRAM simulations," IEEE Trans. on Nuclear Science, Vol. 46, pp. 1354-1362, 1999.
/38/ E. M. Sentovich, et al., "SIS: A System for Sequential Circuit Synthesis," Memorandum No. UCB/ERL M92/41, Electronics Research Laboratory, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720, 1992.
/39/ S. K. Shukla and R. I. Bahar (Eds.), "Nano, Quantum and Molecular Computing - Implications to High Level Design and Validation," Kluwer Academic Publishers, Boston, Dordrecht, London, 2004.
/40/ J. E. Smith and G. Metze, "Strongly Fault Secure Logic Networks," IEEE Trans. on Computers, Vol. C-27, No. 6, pp. 491-499, June 1978.

Sybille Hellebrand
University of Paderborn, Germany

Christian G. Zoellin, Hans-Joachim Wunderlich
University of Stuttgart, Germany

Stefan Ludwig, Torsten Coym, Bernd Straube
Fraunhofer IIS-EAS Dresden, Germany

Prispelo (Arrived): 15.07.2007
Sprejeto (Accepted): 01.09.2007