ISSN 0352-9045
UDK 621.3:(53+54+621+66)(05)(497.1)=00

Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
Vol. 45, No. 2 (154), Ljubljana, June 2015

Published quarterly (March, June, September, December) by the Society for Microelectronics, Electronic Components and Materials - MIDEM. Copyright © 2014. All rights reserved.

Editor in Chief
Marko Topič, University of Ljubljana (UL), Faculty of Electrical Engineering, Slovenia

Editor of Electronic Edition
Kristijan Brecl, UL, Faculty of Electrical Engineering, Slovenia

Associate Editors
Vanja Ambrožič, UL, Faculty of Electrical Engineering, Slovenia
Slavko Amon, UL, Faculty of Electrical Engineering, Slovenia
Danjela Kuščer Hrovatin, Jožef Stefan Institute, Slovenia
Matjaž Vidmar, UL, Faculty of Electrical Engineering, Slovenia
Andrej Žemva, UL, Faculty of Electrical Engineering, Slovenia

Editorial Board
Mohamed Akil, ESIEE PARIS, France
Giuseppe Buja, University of Padova, Italy
Gian-Franco Dalla Betta, University of Trento, Italy
Martyn Fice, University College London, United Kingdom
Ciprian Iliescu, Institute of Bioengineering and Nanotechnology, A*STAR, Singapore
Malgorzata Jakubowska, Warsaw University of Technology, Poland
Marc Lethiecq, University of Tours, France
Teresa Orlowska-Kowalska, Wroclaw University of Technology, Poland
Luca Palmieri, University of Padova, Italy

International Advisory Board
Janez Trontelj, UL, Faculty of Electrical Engineering, Slovenia - Chairman
Cor Claeys, IMEC, Leuven, Belgium
Denis Donlagic, University of Maribor, Faculty of Elec. Eng. and Computer Science, Slovenia
Zvonko Fazarinc, CIS, Stanford University, Stanford, USA
Leszek J. Golonka, Technical University Wroclaw, Wroclaw, Poland
Jean-Marie Haussonne, EIC-LUSAC, Octeville, France
Barbara Malič, Jožef Stefan Institute, Slovenia
Miran Mozetič, Jožef Stefan Institute, Slovenia
Stane Pejovnik, UL, Faculty of Chemistry and Chemical Technology, Slovenia
Giorgio Pignatel, University of Perugia, Italy
Giovanni Soncini, University of Trento, Trento, Italy
Iztok Šorli, MIKROIKS d.o.o., Ljubljana, Slovenia
Hong Wang, Xi'an Jiaotong University, China

Headquarters
Uredništvo Informacije MIDEM, MIDEM pri MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenia
T. +386 (0)1 513 37 68, F. +386 (0)1 513 37 71, E. info@midem-drustvo.si, www.midem-drustvo.si

The annual subscription rate is 100 EUR; a separate issue is 25 EUR. MIDEM members and Society sponsors receive current issues for free. The Scientific Council for Technical Sciences of the Slovenian Research Agency has recognized Informacije MIDEM as a scientific journal for microelectronics, electronic components and materials. Publishing of the Journal is cofinanced by the Slovenian Book Agency and by Society sponsors. Scientific and professional papers published in the journal are indexed and abstracted in the COBISS and INSPEC databases. The Journal is indexed by ISI® for Sci Search®, Research Alert® and Materials Science Citation Index™.
According to the opinion of the Ministry of Information No. 23/300-92, the journal Informacije MIDEM is classified as a product of informative character.

Design: Snežana Madic Lešnik; Printed by: Biro M, Ljubljana; Circulation: 1000 issues; Slovenia Taxe Perçue (postage paid at post office 1102 Ljubljana)

Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials, Vol. 45, No. 2 (2015)

Content

Original scientific papers

M. E. Başak, F. Kaçar: A New Fully Integrated High Frequency Full-Wave Rectifier Realization
K. Górecki, J. Zarębski, D. Bisewski: An Influence of the Selected Factors on the Transient Thermal Impedance Model of Power MOSFET
P. B. Petrović: Electronically Controllable Current-Mode True RMS to DC Converter
B. Wang, Y. Zhuang, X. Li: A Novel Dual Ports Antenna for Handheld RFID Reader Applications
S. M. Djurić, N. M. Djurić, M. S. Damnjanović: The Optimal Useful Measurement Range of an Inductive Displacement Sensor
V. Sklyarov, I. Skliarova, A. Rjabov, A. Sudnitson: Zynq-based System for Extracting Sorted Subsets from Large Data Sets
A. A. Demidov, O. A. Kalashnikov, A. Y. Nikiforov, A. S. Tararaksin, V. A. Telets: Radiation Behavior and Test Specifics of A-D and D-A Converters
Á. Bűrmen, H. Habal: Computing Worst-Case Performance and Yield of Analog Integrated Circuits by Means of Mesh Adaptive Direct Search

Announcement and Call for Papers: 51st International Conference on Microelectronics, Devices and Materials with the Workshop on Terahertz and Microwave Systems

Front page: The sensor element, detecting normal displacement (S. Djurić et al.)

Original scientific paper
Informacije MIDEM, Journal of Microelectronics, Electronic Components and Materials, Vol. 45, No. 2 (2015), 101 - 109

A New Fully Integrated High Frequency Full-Wave Rectifier Realization

Muhammed Emin Başak1, Firat Kaçar2
1 Yildiz Technical University, Faculty of Naval Architecture and Maritime, Istanbul, Turkey
2 Istanbul University, Dept. of Electrical and Electronics, Istanbul, Turkey

Abstract: In this paper, a new fully integrated high frequency precise full-wave rectifier which consists of a floating current source (FCS) and four complementary MOS transistors is presented. The presented circuit has appropriate zero crossing performance, good linearity and a low component count, and can be adapted to modern IC technologies. It is also suitable for monolithic integrated implementation. Rectifier performance is simulated based on 0.18 µm CMOS technology.
The proposed full-wave rectifier circuit operates at frequencies above 1 GHz, accepts an input range from -300 mV to 300 mV, and consumes 825 µW. LTSPICE simulation results of the circuit are presented which verify the workability of the proposed circuit. Noise analysis is also performed. The equivalent output noise of the voltage-mode rectifier at 100 MHz is found to be 7.13 nV/√Hz. The circuit also exhibits good temperature stability. The presented circuit does not require any passive component; therefore it is suitable for integrated circuit implementation. The proposed circuit exhibits high frequency operation and lower power consumption, and has the simplest structure compared to all other available works.

Keywords: CMOS; full-wave rectifier; high frequency; floating current source; precision rectifier

* Corresponding Author's e-mail: fkacar@istanbul.edu.tr

1 Introduction

Rectification is an essential and demanding aspect of signal processing in instrumentation, measurement and control. Rectifiers have a variety of applications such as: signal processing, signal-polarity detectors, amplitude modulated signal detectors, AC voltmeters and ammeters, watt meters, RF demodulators, function fitting error measurements, RMS to DC conversions, sample and hold circuits, peak value detectors, and clipper circuits.

Generally, rectifier circuits are built with diodes; nevertheless, diodes cannot rectify incoming signals whose amplitudes are less than their threshold voltages. For this reason, voltage-mode rectifiers containing active elements based on operational amplifiers (op-amps), diodes and resistors have to be used. However, because of the finite slew-rate and the significant distortion caused by diode commutation during the zero crossing of the input signal, these circuits operate well only at low frequencies [1 - 5]. This is a small signal transient problem which cannot be solved by high slew-rate op-amps [6]. This problem has been overcome by the use of the current mode technique [7-18], thanks to its higher operating frequency, wider bandwidth, larger dynamic range, and lower offset value at the zero crossing area compared with voltage mode counterparts. However, some proposed rectifier circuits which were improved by the use of current conveyors (CCs) need either grounded or ungrounded resistors, and some of them are limited at high frequency.

The circuit presented in [7] uses a current differencing transconductance amplifier (CDTA) and two diodes at an operating frequency of 5 MHz. The CDTA-based precision full-wave rectifier described in [8] exhibits good performance at a frequency of 5 MHz. The circuits suggested in [9-10], operating at frequencies up to 100 kHz, utilize one current conveyor, one voltage conveyor, two diodes and grounded resistors. The circuit proposed in [11] employs two differential difference current conveyors (DDCC), but it operates only up to a few MHz.
The circuit presented in [12], built from common-mode two-cell winner-takes-all (WTA) circuits and consisting of 21 transistors and two current sources, can rectify signals at frequencies above 70 MHz. A precision full-wave rectifier circuit based on a single second generation current conveyor (CCCII-) is reported in [13]. It employs a CCCII- with three outputs, two CMOS transistors, and an ungrounded resistor, and has an operating frequency of 100 kHz. The circuit presented in [14] employs three current controlled conveyors and five resistors and was tested at a frequency of 100 kHz. The circuit reported in [15] utilizes two current conveyors and three NMOS transistors, and its operating frequency is up to 100 MHz. The circuit proposed in [16], employing a dual-X current conveyor and three NMOS transistors, has been successfully tested by applying a sinusoidal input voltage with a frequency of 250 kHz. The rectifiers reported in [17-19] have been realized entirely with CMOS transistors, but they are half-wave rectifiers. The circuit proposed in [20], operating at a few MHz, is based on a current conveyor and a current mirror.

The realization of full-wave rectifiers based on operational transconductance amplifier (OTA) circuits is proposed in [21-27]. However, a large number of active and passive components are used in these rectifiers, and they have not shown good performance at higher frequencies. In [24], OTAs are the only active elements of the full-wave rectifier, but the circuits have been tested only at lower frequencies. A three-output operational transconductance amplifier with two complementary MOS transistors and a grounded resistor is used to realize non-inverting and inverting full-wave precision rectifiers in [25]. It rectifies signals at frequencies up to 200 MHz. The circuit presented in [26] is more suitable for IC implementation than previous OTA-based circuits, and its operation is confirmed at frequencies up to 200 MHz.
This circuit consists of a dual-output OTA, junction diodes, and a MOS resistor. Another rectifier circuit, presented in [27], uses an OTA, four CMOS diodes, and a MOS resistor in its realization, providing an operating frequency up to 300 MHz as well as good temperature stability.

Table 1 presents the comparison of the proposed precision full-wave rectifier with other designs. As seen in Table 1, the proposed full-wave rectifier is superior to the previously proposed full-wave rectifiers in terms of power consumption, number of components, and operating frequency.

Table 1: Comparison of the various rectifiers in literature

Article | DC Supply Voltage | Technology | Power Consumption | Operating Frequency | Components | Year
Proposed | ±2.4 V | 0.18 µm | 825 µW | 1 GHz | 8 x MOSFET + 2 x current sources | -
[3] | ±1 V | - | - | 100 kHz | OPA1 + OPA2 + 2 x diodes + 3 x resistors | 2007
[4] | ±1 V | - | - | 1 MHz | AD817 x 2 + AD633 x 3 + AD711 + resistor | 2010
[5] | ±1 V | - | - | 1 MHz | AD817 x 2 + AD633 x 3 + AD711 + resistor | 2011
[7] | ±1 V | - | - | 5 MHz | CDTA + 2 x Schottky diodes | 2010
[9] | - | - | - | 500 kHz | Current conveyor + voltage conveyor | 2010
[10] | ±1 V | - | - | 1 MHz | 2 x CCII + 2 diodes or CCII + VC + 2 diodes | 2011
[11] | ±2.5 V | 0.5 µm | - | 1 MHz | 2 x DDCCI | 2011
[13] | ±2.5 V | - | - | 5 kHz | CCCII + 2 x CMOS + resistor | 2007
[15] | ±1.25 V | 0.25 µm | - | 10 MHz | 23 MOSFET | 2006
[16] | ±1.25 V | 0.25 µm | - | 1 MHz | DXCCII (20 CMOS) + 3 x NMOS | 2008
[19] | - | - | - | 200 MHz | 26 CMOS + 1 current supply | 2006
[20] | ±1.5 V | 0.5 µm | - | 10 MHz | 33 MOSFET | 2007
[23] | ±5 V | - | - | 10 kHz | 4 x OTA or 5 x OTA | 2007
[24] | ±5 V | 0.5 µm | - | 200 MHz | OTA (24 MOSFET) + 2 MOS + resistor | 2009
[25] | ±5 V | 0.5 µm | 7.9 mW | 300 MHz | 24 MOSFET | 2010
[27] | ±1.2 V | 0.5 µm | - | 250 MHz | 31 MOSFET | 2006

The floating current source (FCS) was first introduced as an output stage for current-mode feedback amplifiers by Arbel and Goldminz in 1992 [28].
Following that, the FCS was used as the output stage of the accurate CCII- proposed in [29-30] to perform the required current conveying action. The FCS has also been used in the realization of a fully differential voltage second generation current conveyor [31]. Then, two novel FCS-based CMOS negative second generation current conveyors (CCII-) were presented in [32].

In this paper, a new circuit for realizing a full-wave rectifier employing a floating current source, two CMOS diodes, and a MOS resistor is proposed. The proposed circuit was simulated with the LTSPICE simulator using a 0.18 µm CMOS model obtained through TSMC (Taiwan Semiconductor Manufacturing Company, Limited). The advantages of the presented structure over the previously presented rectifiers are as follows:

- The presented structure is very compact and consists of an FCS and four CMOS transistors, thus enjoying a simpler structure compared to all other available works [1-27].
- The operation of the proposed circuit is verified at frequencies up to 1 GHz, which is the highest frequency when compared with the previously published rectifiers.
- It does not require any passive component; therefore it is suitable for integrated circuit (IC) implementation.
- It provides high precision voltage rectifying.
- This rectifier has the lowest power consumption (825 µW) in comparison with the hitherto published rectifiers [1-27].

Figure 1: (a) Symbol of floating current source circuit (b) MOSFET implementation of floating current source circuit [28]

2 The Floating Current Source

The floating current source circuit can be viewed as two differential pairs connected in parallel: an NMOS pair and a PMOS pair. It is assumed that M1 - M2 (the NMOS pair) and M3 - M4 (the PMOS pair) are matched and operate in the saturation region. The symbol of the FCS and its MOSFET implementation are shown in Fig. 1 (a) and Fig. 1 (b), respectively. The FCS [28] provides two balanced output currents satisfying Kirchhoff's current law.
The equations of the output currents are given below.

$$I_{B1} + I_{o1} + I_{o2} = I_{B2} \tag{1}$$

$$I_{B2} = I_{B1} \tag{2}$$

$$I_{o1} = -I_{o2} \tag{3}$$

It is assumed that M1, M2 and M3, M4 are matched transistor pairs, so the transconductance of M1 is equal to the transconductance of M2 ($g_{m1} = g_{m2}$) and the transconductance of M3 is equal to the transconductance of M4 ($g_{m3} = g_{m4}$). The transconductances of the FCS circuit ($g_{mo1}$ and $g_{mo2}$) are then given in Equation (4). The output impedances of the FCS structure are given in Equation (5).

$$g_{mo1} = g_{mo2} = \frac{g_{m1} + g_{m3}}{2} \tag{4}$$

$$R_{o1} = R_{o2} = \left[ \frac{g_{m3}\, g_{ds3}}{g_{m3} + g_{m4}} + \frac{g_{m1}\, g_{ds1}}{g_{m1} + g_{m2}} \right]^{-1} = \frac{2}{g_{ds1} + g_{ds3}} \tag{5}$$

The two balanced output currents are given by [33]:

$$I_{o1} = -I_{o2} = \frac{V_d}{2} \left[ \sqrt{k_n I_{B1} - \frac{k_n^2 V_d^2}{4}} + \sqrt{k_p I_{B2} - \frac{k_p^2 V_d^2}{4}} \right] \tag{6}$$

where $V_d = V_1 - V_2$, and $V_1$ and $V_2$ are the voltages applied to Y1 and Y2, respectively. $k_n$ is the NMOS transconductance parameter, given by

$$k_n = \mu_n C_{ox} \frac{W}{L} \tag{7}$$

and $k_p$ is the PMOS transconductance parameter, given by

$$k_p = \mu_p C_{ox} \frac{W}{L} \tag{8}$$

where $\mu$ = carrier mobility; $C_{ox}$ = gate capacitance per unit area; $W$, $L$ = channel width and channel length of the MOS transistor, respectively; $I_{B1}$, $I_{B2}$ = bias currents.

3 Proposed Full-Wave Rectifier Circuit

The basis of the proposed full-wave rectifier is shown in Fig. 2. It is composed of three parts: an FCS, four diodes and a resistor. The FCS converts the input voltage into two currents through the terminals p and n, and the four diodes then rectify these currents. Afterwards, the resistor converts the rectified current into the output voltage. The voltage source Vb is approximately equal to the sum of the threshold voltages of D1 and D2 and keeps them ready for conduction [6]. Cross-sections of the NMOS and PMOS transistors in a p-substrate CMOS process are presented in Fig. 3. The structures of MD1 and MD2 are used to replace diodes D1 to D4 [26].
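The signal path described above (FCS, diode bridge, resistor) can be sketched behaviourally in a few lines. This is only an illustrative model, not the authors' simulation: it assumes the standard square-law differential-pair expression for the FCS output current, ideal diodes with no threshold, equal bias currents, and hypothetical parameter values (`K_N`, `K_P`, `I_B`, `R_LOAD`) that do not come from the paper. It also checks that the small-signal slope of the large-signal current agrees with the FCS transconductance $(g_{m1} + g_{m3})/2$.

```python
import math


def fcs_output_current(v_d, k_n, k_p, i_b):
    """Large-signal FCS output current I_o1 = -I_o2 for a differential input v_d.

    Assumes the square-law MOS model, matched pairs and equal bias currents
    (I_B1 = I_B2 = i_b). Valid only while both pairs stay in saturation.
    """
    def pair_term(k):
        radicand = k * i_b - (k * v_d) ** 2 / 4.0
        if radicand < 0.0:
            raise ValueError("v_d is outside the range of the square-law model")
        return math.sqrt(radicand)

    return 0.5 * v_d * (pair_term(k_n) + pair_term(k_p))


def fcs_transconductance(k_n, k_p, i_b):
    """Small-signal g_mo = (g_m1 + g_m3) / 2, with g_m = sqrt(k * I_B).

    Each transistor of a pair carries I_B / 2, so g_m = sqrt(2 * k * I_B / 2).
    """
    g_m1 = math.sqrt(k_n * i_b)
    g_m3 = math.sqrt(k_p * i_b)
    return 0.5 * (g_m1 + g_m3)


def rectifier_output(v_in, k_n, k_p, i_b, r_load):
    """Ideal full-wave rectifier: |I_o1| flows through r_load (ideal diodes)."""
    return r_load * abs(fcs_output_current(v_in, k_n, k_p, i_b))


# Hypothetical values, chosen only for illustration.
K_N, K_P, I_B, R_LOAD = 2e-3, 1e-3, 100e-6, 1e3  # A/V^2, A/V^2, A, ohm

v_small = 1e-4  # V, small enough for the linear approximation
slope = fcs_output_current(v_small, K_N, K_P, I_B) / v_small
g_mo = fcs_transconductance(K_N, K_P, I_B)
print(f"large-signal slope = {slope:.6e} S, g_mo = {g_mo:.6e} S")

for v in (-0.01, -0.005, 0.0, 0.005, 0.01):
    print(f"v_in = {v:+.3f} V -> v_out = {rectifier_output(v, K_N, K_P, I_B, R_LOAD):.6f} V")
```

The output is an even function of the input, which is the defining property of a full-wave rectifier; a real circuit departs from this sketch near the zero crossing, where diode commutation matters.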
The diodes D1 and D2 are the junction diodes formed between the p-substrate and the n+ diffusions of the drain and source regions, and D3 and D4 are the junction diodes formed between the n-well and the p+ diffusions of the drain and source regions. These diodes operate as a precision rectifier.

Figure 2: The principle of the proposed rectifier

Figure 3: Cross-section of MD1 and MD2 transistors in a p-substrate CMOS process [26].

The proposed high precision full-wave rectifier is shown in detail in Fig. 4. The FCS consists of four MOS transistors (M1-M4) and two bias currents IB1 and IB2. MD1 and MD2 are used as four diodes as described above, and MR1 and MR2 operate as a MOS resistor in the saturation region. The resistance value of the MOS resistor R can be expressed as;

P. B. Petrović; Informacije MIDEM, Vol. 45, No. 2 (2015), 117 - 124

where $V_T$ = 26 mV at 27 °C is the usual thermal voltage given by $kT/q$, $k$ = Boltzmann's constant = 1.38x10^-23 J/K, $T$ = the absolute temperature (in kelvins), and $q$ = 1.6x10^-19 C.

Generally, a MO-CCCII is a multiple-terminal active building block, as shown in Fig. 1. The port relations of the MO-CCCII can be presented by the following equation [1]:

$$i_y = 0; \quad v_x = v_y + i_x R_x; \quad i_{z+} = +i_x; \quad i_{z-} = -i_x \tag{3}$$

The bipolar realisation of the MO-CCCII is proposed in [18]. In this case, the parasitic resistance $R_x$ at the terminal x can be expressed by:

$$R_x = \frac{V_T}{2 I_B} \tag{4}$$

where $V_T$ is the thermal voltage and $I_B$ ($I_{B1}$ and $I_{B5}$ in the proposed realisation, Fig. 1) is the bias current of the conveyor, which remains tunable over several decades.

By routine analysis of the proposed RMS circuit shown in Fig. 1, and using the properties of the MO-CCCDTA and MO-CCCII, the output current at the z terminal of the MO-CCCDTA is obtained by:

$$I_z = I_{in} = I_{x3} \tag{5}$$

whereupon the output voltage at the z terminal ($V_z$) of the MO-CCCDTA equals:

$$V_z = \frac{I_{x3}}{g_{m3}} = \frac{2 V_T I_{in}}{I_r} \tag{6}$$

Figure 1 infers that $I_{B2} = I_{in}$, $I_{B3} = -I_{in}$ and $I_{B4} = I_r$. Thus, $I_1$ and $I_2$ can be obtained by:

$$I_1 = g_{m1} V_z = \begin{cases} \dfrac{I_{in}^2}{I_r} & \text{if } I_{in} > 0 \\ 0 & \text{if } I_{in} < 0 \end{cases} \qquad I_2 = g_{m2} V_z = \begin{cases} 0 & \text{if } I_{in} > 0 \\ -\dfrac{I_{in}^2}{I_r} & \text{if } I_{in} < 0 \end{cases} \tag{7}$$

The equation above suggests the output current $I_{out}$ as:

$$I_{out} = I_{x1} + I_{x2} = \frac{I_{in}^2}{I_r} = \frac{R_{x2}}{R_{in}^2} \frac{V_{in}^2}{V_{out}} \tag{8}$$

where $R_{in} = R_{x1}$. The current $I_{out}$ is then converted to the output voltage, $V_{out}$, with an implied low-pass filtering function. We can recognise that the output current-to-voltage conversion (with the second CCCII) establishes a differential equation relating the current, $I_{out}$, to the output voltage, $V_{out}$, i.e.:

$$\dot{V}_{out}(t) + \omega_0 V_{out}(t) = \frac{I_{out}(t)}{C}; \quad \omega_0 = \frac{1}{RC} \tag{9}$$

A simple way to obtain this equation is to determine the transfer function relating $I_{out}$ to $V_{out}$, and then take this back to the time domain. Equation (9) is the generic time-domain description of a low-pass filter, where the coefficient of the undifferentiated term on the LHS of the equation equals the filter cut-off frequency. Equation (8) can subsequently be combined with the above to obtain:

$$\dot{V}_{out}(t) + \omega_0 V_{out}(t) = \frac{V_{in}^2(t)}{R_1 C\, V_{out}(t)}; \quad R_1 = \frac{R_{in}^2}{R_{x2}} \tag{10}$$

We may now multiply both sides of the equation by $2V_{out}$ and make a simple observation incorporated into the final result:

$$2 V_{out}(t) \dot{V}_{out}(t) + 2 \omega_0 V_{out}^2(t) = \frac{2}{R_1 C} V_{in}^2(t) \;\Rightarrow\; \frac{d}{dt}\left( V_{out}^2(t) \right) + 2 \omega_0 V_{out}^2(t) = \frac{2}{R_1 C} V_{in}^2(t) \tag{11}$$

Equation (11) is a first-order differential equation relating $(V_{out})^2$ and $(V_{in})^2$, having the same form as (9). Therefore, the square of the output is a low-pass filtered version of the square of the input. Based on (11), we can assume that:

$$V_{out}^2(t) = \frac{2}{R_1 C} \int_0^t e^{-2\omega_0 (t - \tau)} V_{in}^2(\tau)\, d\tau \tag{12}$$

The equation above (a convolution integral) implies that, if the square root of both sides is considered, the output is the root mean square of the input voltage, where the integral is assumed to compute the mean value function. The implied filtering function is thus given by:

$$H(s) = \frac{R}{R_1} \frac{\omega_{3dB}}{s + \omega_{3dB}} \quad \text{where } \omega_{3dB} = \frac{2}{RC} \tag{13}$$

The low-pass filter performs averaging of the RMS function and needs to have a corner frequency lower than the lowest frequency of interest. For line frequency measurements, this filter is simply too large to implement on-chip, but the proposed detector requires only one capacitor on the output to implement the low-pass filter. This capacitor can be selected by the user, depending on frequency range and settling time requirements.

Low-pass filtering the square of an input sine function with a certain amplitude, frequency and phase shift ($V_{in}(t) = V\cos(\omega t + \theta)$), as suggested by (11), yields a time function $y(t)$ given by:

$$y(t) = \frac{V^2}{2} \left[ 1 + \frac{1}{\sqrt{1 + (2\omega/\omega_{3dB})^2}} \cos(2\omega t) \right] = V_{out}^2(t) \tag{14}$$

The input phase shift and the net phase shift after filtering of the second harmonic are set to zero, which simplifies the form of $y(t)$ without loss of generality. $R/R_1$ was set to unity for simplicity. If we assume that the input signal frequency is considerably higher than the filter cut-off frequency, the final output can be estimated rather well with just a few terms of a Taylor series. Accordingly, the DC component of the output voltage of the proposed circuit (the apparent output RMS value of the input) and the associated second-harmonic component of the output voltage (the peak-to-peak ripple of the output, the higher harmonic terms having rapidly decreasing magnitudes) are expressed as:

$$V_{RMS} \approx \left[ 1 - \frac{1/16}{1 + (2\omega/\omega_{3dB})^2} \right] V_{true\text{-}RMS}; \quad V_{ripple} \approx \frac{1}{\left[ 1 + (2\omega/\omega_{3dB})^2 \right]^{1/2}} V_{true\text{-}RMS} \tag{15}$$

The equation above indicates the accuracy of the proposed circuit in measuring the effective value of an input sine voltage signal. In the case of an input signal described by a Fourier series, the estimation defined by (15) gains in complexity, but it also clearly implies that it is possible to filter and single out the effective value of a signal processed in this manner.

3 Non-ideal system analysis

The effects of MO-CDTA and MO-CCCII non-idealities on the RMS detector performance are considered in this section. By considering the non-ideal MO-CCCII characteristics, equation (3) can be rewritten as:

$$i_y = 0; \quad v_x = \alpha v_y + i_x R_x; \quad i_{z+} = +\beta_p i_x; \quad i_{z-} = -\beta_n i_x \tag{16}$$

where $\alpha = 1 - \varepsilon_v$ and $\varepsilon_v$ ($|\varepsilon_v| \ll 1$) represents the voltage tracking error from the y to the x terminal, $\beta_p = 1 - \varepsilon_p$ and $\varepsilon_p$ ($|\varepsilon_p| \ll 1$) denotes the current tracking error from the x to the +z terminal, while $\beta_n = 1 - \varepsilon_n$ and $\varepsilon_n$ ($|\varepsilon_n| \ll 1$) stands for the current tracking error from the x to the -z terminal of the MO-CCCII. Given the non-idealities, the currents generated from the first and second CCCIIs (first and third circuits of the proposed realization in Fig. 1) can be defined as:

$$I_{in} = I_p = I_{B2} = \frac{\alpha_1 \beta_{p1} V_{in}}{R_{x1}}; \quad I_{B3} = -\frac{\alpha_1 \beta_{n1} V_{in}}{R_{in}}; \quad I_r = \frac{\alpha_2 \beta_{p2} V_{out}}{R_{x2}} \tag{17}$$

In practice, the deviation from the ideal performance of the proposed RMS circuits is mainly due to the non-ideal CDTA characteristics, which can be divided into two categories, i.e. parasitic gain effects and parasitic impedance effects. Fig. 3 illustrates the simplified equivalent circuit that represents the behavior of the non-ideal CDTA.

Figure 3: The equivalent circuit of the non-ideal CDTA

A practical CCCDTA device can be modelled as an ideal CCCDTA with finite parasitic resistances and capacitances, as well as non-ideal current transfer gains and a transconductance inaccuracy factor. Fig. 3 shows a more sophisticated circuit model of the non-ideal CCCDTA device, where $R_p$, $R_n$, $R_x$ and $R_z$ are the terminal parasitic resistances. $R_p$ and $R_n$ are the current-controllable parasitic resistances, whereas $R_x$ and $R_z$, the parasitic resistances connected to the terminals x and z, respectively, are typically in the range of several mega-ohms. $C_x$ and $C_z$ are the terminal parasitic capacitances from terminals x and z to ground (forming the shunt output impedances $R_z // C_z$ and $R_x // C_x$ at terminals z and x, respectively).
Typically, these parasitic capacitances are in the order of several pF. In Fig. 3, $\alpha_p$ represents the non-ideal current transfer gain from the p terminal to the z terminal of the CCCDTA, $\alpha_n$ denotes the non-ideal current transfer gain from the n terminal to the z terminal of the CCCDTA, and $\beta$ is the transconductance inaccuracy factor from the z terminal to the x terminal of the CCCDTA. The typical values of the non-ideal current transfer gains and the transconductance inaccuracy factor $\alpha_n$, $\alpha_p$ and $\beta$ range from 0.9 to 1, with an ideal value of 1.

Based on the circuit representation in Fig. 3 and the proposed RMS detector, and given the non-ideal CDTA characteristics, applying the non-ideal equivalent circuit model of the CCCDTA to the proposed circuit leads, after tedious derivations, to the following modified characteristic equations:

$$i_p = I_{B2} = \frac{\alpha_1 \beta_{p1} V_{in}}{R_{x1}}; \quad I_{B3} = -\frac{\alpha_1 \beta_{n1} V_{in}}{R_{x1}}; \quad I_r = I_{B4} = \frac{\alpha_2 \beta_{p2} V_{out}}{R_{x2}} \tag{18}$$

$$V_z = \frac{R_z \left( \alpha_p i_p - \beta g_{m3} V_z \right)}{1 + s R_z C_z} \;\Rightarrow\; V_z = \frac{\alpha_p R_z}{1 + \beta g_{m3} R_z + s R_z C_z}\, i_p; \quad g_{m3} = \frac{I_r}{2 V_T} = \frac{\alpha_2 \beta_{p2} V_{out}}{2 V_T R_{x2}} \tag{19}$$

The modified output current for the proposed RMS detector can be rewritten as:

[Equation (20): modified output currents of the detector, expressed through $\beta g_{m1} V_z$ and $\beta g_{m2} V_z$, the parasitic networks $R_{1x} // C_{1x}$ and $R_{2x} // C_{2x}$, and the load $R // C$, with $g_{m1}$ and $g_{m2}$ set by the bias currents and $2V_T$]

It follows that:

[Equations (21) and (22): the resulting relations between $V_{in}$ and $V_{out}$ for $I_{in} > 0$ and $I_{in} < 0$, and the frequency-dependent factors $k_1(s)$ and $k_2(s)$, which depend on $\alpha_1$, $\alpha_2$, $\alpha_p$, $\beta_{n1}$, $\beta_{p1}$, $\beta_{p2}$, $R_z$, $C_z$, $R_{x1}$, $R_{x2}$ and $V_T$]

The expressions above show that the deviations in the transfer current gains are mainly the result of the parasitic gains of the CDTAs. To reduce the discrepancy from the theoretical response, a high-performance CDTA with minor parasitic effects needs to be employed. However, these deviations can easily be compensated by adjusting the values of $I_{B1}$ and $I_{B5}$, respectively. The output voltage of the proposed RMS detector is defined as:

$$V_{out} = \beta g_{mi} V_z \frac{R'}{1 + s C' R'}; \quad R' = \frac{R R_{ix}}{R + R_{ix}}; \quad C' = C + C_{ix}; \quad i = 1, 2 \tag{23}$$

Given the non-ideal characteristics of the MO-CDTA and CCCIIs, the implied filtering function becomes:

$$H'(s) = k_i(s) \frac{R'}{R_1} \frac{\omega'_{3dB}}{s + \omega'_{3dB}} \quad \text{where } \omega'_{3dB} = \frac{2}{R' C'}; \quad i = 1, 2 \tag{24}$$

Equation (24) suggests that the filtering function, represented by the integral operator of equation (12), has different characteristics in comparison with the ideal situation of equation (13), especially in its behavior at higher frequencies.

4 Simulation results

To confirm the given theoretical analysis, the proposed current-mode bipolar RMS circuit in Fig. 1 was simulated using the PSpice program. The CCCDTA and CCCIIs were realized by the schematic bipolar implementations given in Fig. 2 and [18], with the transistor model parameters of PR200N (PNP) and NR200N (NPN) of the bipolar arrays ALA400 from AT&T [19]. The supply voltages were ±1.2 V and the bias currents were $I_{B1} = I_{B5}$ = 100 µA and $I_p$ = 300 µA, whereas the input voltage was within the range of 0 to 500 mV. Fig. 4 shows the waveform of the signal at the output of the circuit shown in Fig. 1 (voltage $V_{out}(t)$); the total power dissipation was 5.80 mW.
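As an independent cross-check of the time-domain behaviour reported here, the first-order relation between the squared output and the squared input, d(V_out^2)/dt + 2 ω0 V_out^2 = (2/(R1 C)) V_in^2 with ω0 = 1/(RC), can be integrated numerically. The sketch below is not the PSpice model of the circuit: it is an ideal behavioural model that assumes R/R1 = 1, uses a plain forward-Euler step, and reads the Figure 4 test conditions as a 10 mV, 100 kHz sine with R = 180 Ω and C = 5 µF (an interpretation of the figure caption). The function name `simulate_rms_detector` is hypothetical.

```python
import math


def simulate_rms_detector(amplitude, freq, r, c, r1=None, t_end=5e-3, dt=1e-7):
    """Integrate d(y)/dt + 2*w0*y = (2/(R1*C)) * v_in(t)**2 with y = v_out**2.

    Returns v_out at t_end. Ideal behavioural model: no parasitics, and
    R1 defaults to R (R/R1 = 1). Forward Euler is adequate here because
    dt is far smaller than both the signal period and R*C.
    """
    if r1 is None:
        r1 = r
    w0 = 1.0 / (r * c)      # filter cut-off of the current-to-voltage stage
    gain = 2.0 / (r1 * c)   # coefficient of v_in**2
    y = 0.0                 # squared output voltage, starts from rest
    steps = int(round(t_end / dt))
    for n in range(steps):
        t = n * dt
        v_in = amplitude * math.sin(2.0 * math.pi * freq * t)
        y += dt * (gain * v_in * v_in - 2.0 * w0 * y)
    return math.sqrt(y)


v_out = simulate_rms_detector(amplitude=10e-3, freq=100e3, r=180.0, c=5e-6)
true_rms = 10e-3 / math.sqrt(2.0)
print(f"v_out = {v_out * 1e3:.3f} mV, true RMS = {true_rms * 1e3:.3f} mV")
```

Integrating the squared quantity keeps the equation linear and avoids the division by V_out that appears when the output voltage itself is used as the state variable. With these values the averaging time constant RC/2 is 0.45 ms, so 5 ms of simulated time is long enough for the output to settle.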
The low power consumption of the proposed circuit results from the use of low-voltage current-mode and transconductance-mode integrated circuits in bipolar transistor technology. Current-mode signal processing is therefore a sensible approach to the problem under consideration. However, similar and sometimes lower power consumption can be achieved using CMOS technology instead of bipolar technology.

The output ripple is always considerably greater than the DC error; therefore, filtering out the ripple can substantially reduce the peak error without incurring the long settling-time penalty of simply increasing the averaging capacitor. The ripple of the output voltage generated in this manner is lower than in the detectors of [16, 20], and the feedback is shorter as well.

Linearity may seem like an odd property for a device that implements a function involving two very nonlinear processes: squaring and square rooting. However, an RMS-to-DC converter has a transfer function, RMS volts in to DC volts out, that should ideally be 1:1. To the extent that the input-to-output transfer function does not lie on a straight line, the part is nonlinear. Fig. 5 (a) shows the DC transfer function near zero for the proposed circuit. Taking the dynamic range as the span over which the nonlinearity stays below 1 dB, the dynamic range of the circuit proposed in this paper is around 35 dB. The proposed detector circuit offers higher linearity than the ones described in [16, 20, 21].

[Plot: V(R2:2) versus Time, 0 s to 10 ms; Temperature: 27.0]

Figure 4: Time-domain response of the proposed RMS circuit for the sine input signal (Vin(t) = 10 sin(2πft) mV, f = 100 kHz, R = 180 Ω, C = 5 µF)

Figure 5: a) DC transfer function near zero; b) Performance vs. crest factor

Crest factor is a common way of describing dynamic signal wave shapes. It is the ratio of the peak value to the RMS value of a waveform. For example, a signal with a crest factor of 4 has a peak four times its RMS value. The proposed circuit performs very well with crest factors of 4 or less, and responds with reduced accuracy to signals with higher crest factors (Fig. 5 (b)). In Fig. 5 (b), "SCR waveforms" refers to the ideally chopped sine wave. The high performance at crest factors lower than 4 can be directly attributed to the high linearity of the proposed solution.

Fig. 4 shows the result for a pure sinusoidal signal. However, as an RMS detector, the circuit should respond consistently to signals with equal power but different waveform shapes. Thus, the circuit was simulated using various input waveforms to verify the RMS power detection function. The simulated detector responses to a single-tone sinusoid, two-tone signals (with frequencies of 1 MHz and 3 MHz, and amplitudes of 100 mV and 50 mV), square waves (duty cycle = 50%) and triangle waves are given in Fig. 6. All the signals used were at 1 MHz. The relative errors were lower than 0.04 % for Pin ≥ -20 dBm. Taking the dynamic range as the span over which the nonlinearity stays below 1 dB, the dynamic range of the circuit proposed in this paper is around 36 dB. The frequency response and dynamic range of this bipolar detector were comparable and even superior to those of most diode detectors. The error in computing the effective value of the processed input voltage signal was lower than in [16, 22-26], and the proposed detector, which has a wider dynamic range, is easier to realize than those described in [21, 23, 26, 27]. Similarly, it does not require a specific compensation procedure.
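The crest factor definition and the requirement that equal-power signals give the same reading can be illustrated numerically. The following is a minimal sketch, not the simulation setup of the paper: the helper names are hypothetical, the waveforms are normalized to unit amplitude (except the two-tone case, which uses the 100 mV and 50 mV amplitudes from the text), and the relative phase of the two tones is an assumption.

```python
import math


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def crest_factor(samples):
    """Peak magnitude divided by the RMS value."""
    return max(abs(v) for v in samples) / rms(samples)


def one_period(waveform, n=10000):
    """Sample waveform(t) uniformly over one period, with t in [0, 1)."""
    return [waveform(k / n) for k in range(n)]


def sine(t):
    return math.sin(2 * math.pi * t)


def square(t):
    # 50 % duty cycle, as for the simulated square wave
    return 1.0 if t < 0.5 else -1.0


def triangle(t):
    # Symmetric triangle wave between -1 and +1
    return 4 * abs(t - math.floor(t + 0.5)) - 1


def two_tone(t):
    # 100 mV at the fundamental plus 50 mV at three times the fundamental
    # (1 MHz and 3 MHz in the text); zero relative phase is an assumption
    return 0.100 * math.sin(2 * math.pi * t) + 0.050 * math.sin(6 * math.pi * t)


if __name__ == "__main__":
    for name, wave in [("sine", sine), ("square", square),
                       ("triangle", triangle), ("two-tone", two_tone)]:
        s = one_period(wave)
        print(f"{name:9s} rms = {rms(s):.4f}  crest factor = {crest_factor(s):.3f}")
```

For the unit-amplitude cases the crest factors are the textbook values: √2 for the sine, 1 for the square wave and √3 for the triangle wave, all well below the crest factor of 4 up to which the circuit is reported to perform well.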
Figure 6: The simulated response of a single detector to various waveforms.

5 Conclusion

This paper reports on a new electronically controllable bipolar translinear RMS-to-DC converter. The proposed circuit employs two CCCIIs, one CDTA and two grounded passive elements, which is advantageous from the integration point of view. The proposed circuit ensures high precision, wide bandwidth and high accuracy. The PSpice simulation results were presented, and they agree well with the theoretical predictions.

6 Acknowledgments

The author wishes to thank the Ministry of Education and Science of the Republic of Serbia for its support of this work, provided within the projects 42009 and OI-172057.

7 References

1. R. B. Northrop, Analog Electronics Circuits, Reading, MA: Addison-Wesley, 1990.
2. P. Heavey, and C. Whitney, "RMS measuring principles in the application of protective relaying and metering", in Proc. 57th Annu. Conf. Protective Relay Eng. (2004), pp. 469-489.
3. U. Pogliana, "Precision measurement of ac voltage below 20 Hz at IEN", IEEE Trans. Instrum. Meas., vol. 46, no. 2, pp. 369-372, 1997.
4. H. Germer, "High-precision AC measurements using the Monte-Carlo method", IEEE Trans. Instrum. Meas., vol. 50, no. 2, pp. 457-460, 2001.
5. W.-K. Yoon, and M.J. Deveney, "Power measurement using the wavelet transform", IEEE Trans. Instrum. Meas., vol. 47, no. 5, pp. 1205-1210, 1998.
6. M. Novotny, and M. Sedlacek, "RMS value measurement based on classical and modified digital signal processing algorithms", Measurement, vol. 41, no. 3, pp. 236-250, 2008.
7. True RMS detector, National Semiconductor Application Note AN008474, 2002.
8. DSCA33 Isolated True RMS Input Module, AN101, Dataforth Corporation, USA, 2011.
9. High Precision, Wide-Band RMS-to-DC Converter, Analog Devices Application Note AD637, 2011.
10. J. Mulder, W. A. Serdijn, A. C. Woerd, and A. H. M.
Roermund, "Dynamic translinear RMS-DC converter", Electron Lett., vol. 32, pp. 2067-2068, 1996. 11. J. Mulder, W. A. Serdijn, and A. H. M. Roermund, "An RMS-DC converter based on the dynamic translinear principle", IEEE Solid-State Circuits, vol. 32, pp. 1146-1150, 1997. 12. W. Surakampontron and K. Kumwachara, "A dual translinear-based RMS-to-DC converter', IEEE Trans. Instrum. Meas,. vol. 47, pp. 456-464, 1999. 13. R. F. Wasseneaar, E. Seevinck, M. G. van Leeuwen, C. J. Speelman, and E. Holle, "New Techniques for High-Frequency RMS-to-DC Conversion Based on a Multifunctional V-to-I Convertor", IEEE Jour. Sol. Sta. Circ., vol. 23, no. 3, pp. 802-815, 1998. 14. V. Milanovic, M. Gaitan, E. D. Bowen, N. H. Tea, and M. E. Zaghlou, "Thermoelectric power sensors for microwave applications by commercial CMOS fabrication", IEEE Elec. Dev. Lett., vol. 18, no. 9, pp. 450-452, 1997. 15. W. Tangsrirat W, T. Dumawipata, and W. Surakam-pontorn, "Multiple-input single output current-mode multifunction filter using current differencing transconductance amplifiers", Int J Electron Commun (AEU), vol. 61, pp. 209-214, 2007. 16. P. Petrovic, "RMS Detector of Multiharmonic Signals", ETRI Journal, vol. 35, no. 3, pp. 431-438, 2013. 17. P. Petrovic, and I. Zupunski, "RMS detector of periodic, band-limited signals based on usage of DO- CCIIs", Measurement, vol. 46, no. 9, pp. 3073-3083, 2013. 18. W. Tangsrirat, "Current-tunable current-mode multifunction filter based on dual-output current-controlled conveyors", Int. J. Electron. Commun. (AEU), vol. 61, pp. 528-533, 2007. 19. D. R. Frey, "Log-domain filtering: an approach to current mode filtering", IEE Proc Circuit Devices Syst., vol. 140, pp. 406-416, 1993. 20. B. Rumberg, and D. W. Graham, "A Low-Power Magnitude Detector for Analysis of Transient-Rich Signals", IEEE Jour. Sol. Sta. Circ., vol. 47, no. 3, pp. 676-685, 2012. 21. C. Yu, C. L. Wu, S. Kshattry, Y. H. Yun, C. Y. Cha, H. Shichijo, and K. 
O Kenneth, "Compact, High Impedance and Wide Bandwidth Detectors for Characterization of Millimeter Wave Performance", IEEE Jour. Sol. Sta. Circ., vol. 47, no. 10, pp. 2335-2343, 2012.
22. Y. Zhou, and M. Y. W. Chia, "A Low-Power UltraWideband CMOS True RMS Power Detector", IEEE Trans. on Mic. The. Tec., vol. 56, no. 5, pp. 1052-1058, 2008.
23. Q. Yin, W. R. Eisenstadt, R. M. Fox, and T. Zhang, "A Translinear RMS Detector for Embedded Test Of RF ICs", IEEE Trans. Instrum. Meas., vol. 54, no. 5, pp. 1708-1714, 2005.
24. K. Kaewdang, K. Kumwachara, and W. Surakampontorn, "A translinear-based true RMS-to-DC converter using only npn BJTs", AEU-Intern. Jour. Elec. Comm., vol. 63, no. 6, pp. 472-477, 2009.
25. E. Farshidi, and H. Asiaban, "A new true RMS-to-DC converter using up-down translinear loop in CMOS technology", Analog Integrated Circuits and Signal Processing, vol. 70, no. 3, pp. 385-390, 2012.
26. J. Koton, N. Herencsar, and K. Vrba, "Current and Voltage Conveyors in Current and Voltage-Mode Precision Full-Wave Rectifiers", RADIOENGINEERING, vol. 20, no. 1, pp. 19-24, 2011.
27. G. Klahn, "True RMS power detection with high dynamic range", in Proceeding IEEE MTT-S International Microwave Symposium Digest, (1999) vol. 4, pp. 1773-1776.

Arrived: 18. 12. 2014
Accepted: 18. 02. 2015

Original scientific paper
Informacije MIDEM | Journal of Microelectronics, Electronic Components and Materials
Vol. 45, No. 2 (2015), 125 - 131

A Novel Dual Ports Antenna for Handheld RFID Reader Applications

Bo Wang, Yiqi Zhuang and Xiaoming Li
Xidian University, Xi'an 710071, China

Abstract: A compact antenna uses two ports to transmit and receive signals separately, unlike conventional handheld RFID readers, which have a single port. The proposed antenna can enhance the receive sensitivity of handheld RFID readers, since the strong transmitted signal of a single-port reader is usually highly coupled with the weak backscattered signal received from the tag.
The antenna uses a U-shaped aperture-coupled patch structure that occupies less volume and saves further space. It is fed by two T-shaped microstrip lines with rectangular stubs. The U-shaped apertures are used to excite two orthogonal modes for dual-polarized operation. The height of the air substrate is reduced to only 4 mm (0.032 wavelength) and the volume of the antenna is 80 mm × 80 mm × 6.8 mm, which makes it easy to integrate into handheld RFID readers. The measured results show a -10 dB matching band from 2.2 GHz to 2.6 GHz and a -25 dB isolation band from 2 GHz to 2.6 GHz. The isolation reaches its best value of -50 dB at 2.48 GHz. The antenna is suitable for applications in handheld RFID readers.

Keywords: handheld RFID reader antenna; two ports; high isolation

Nova dvovhodna antenna za ročne RFID bralnike

Izvleček: Kompaktna antena ima dva vhoda za ločeno pošiljanje in sprejemanje signala, kar je različno od običajnih enovhodnih RFID bralnikov. Predlagana antena omogoča večjo sprejemno občutljivost, jas je močen oddajen signal enovhodnih bralnikov večinoma močno sklopljen s šibkim bralnim signalom. U oblika antene porabi manj prostora in omogoča dva ortogonalna načina delovanja za dvopolarizirano delovanje. Napajana je z dvemi T trakastimi linijami T oblike. Višina zračnega substrata je le 4 mm (0.032 valovne dolžine), velikost 80 mmx80 mmx6.8 mm, kar omogoča enostavno integracijo v ročne RFID bralnike. Meritve izkazujejo ujetost -10 dB in izolativnost pasu -25 dB v območju 2.2 do 2.6 GHz. Najmanjša izolativnost pri 2.48 GHz je -50 dB.

Ključne besede: ročna RFID bralna antena; dva vhoda; visoka izolativnost

* Corresponding Author's e-mail: wangbo_chen@126.com

1 Introduction

Recently, the use of radio frequency identification (RFID) systems has become widespread in a variety of applications.
Furthermore, handheld RFID readers have become very popular with users, particularly in applications that involve large and heavy products, which are not easy to move. Most of the handheld RFID reader antennas reported so far have a single port, with various structures [1-7]. An RFID system consists of a tag and a reader. The reader transmits a continuous wave (CW) signal, and the tag backscatters the transmission from the reader to send back data. In a backscatter reader, the transmitted CW signal may couple directly into the receiving part of the reader and drastically degrade the receiving sensitivity. The directly coupled CW signal is much larger than the backscattered signal from the tag, and the receiving part of the reader has to detect the weak signal close to such a strong in-band interferer. Therefore, it is essential to separate the transmitting and receiving parts with dual ports to achieve high isolation between them.

Over the past years, dual-port reader antenna designs have received considerable attention. Among dual-polarized antenna designs, aperture-coupled microstrip patch antennas are the most suitable candidates for RFID applications [8-17]. Aperture coupling is preferred to other feeding mechanisms of microstrip patch antennas because of its greater design flexibility, easier fabrication and lower cost. The antenna in [8] uses a resonant annular ring slot and a T-shaped microstrip feed line to couple to the radiating patch, thus exciting dual orthogonal linearly polarized modes. A 2×2 array employing two symmetric dog-bone-shaped coupling apertures is proposed to introduce dual linearly polarized modes [9]. A common method to increase port isolation is to combine a branch line with the antenna [10-12]. In [13], the antenna is designed with a simple microstrip feed line coupled to the radiating patch, but it achieves a poor port isolation of only 20 dB.
The majority of aperture-coupled antennas address the requirement for low signal correlation by increasing the height of the air substrate to achieve high port isolation [14-17]. Since the antenna is to be used with a handheld RFID reader, its size should in general be around 100 mm in length and width, and around 10 mm in thickness [3]. Therefore, most dual-port reader antennas described in the open literature, including [8-17], are comparatively large for mounting onto a handheld RFID reader, although they are suitable for stationary readers.

The remaining sections are arranged as follows: Section 2 presents the detailed design and principle of the aperture-coupled patch antenna. The measured isolation and impedance matching of the proposed antenna are discussed in Section 3. In Section 4, the parameters of the stubs are simulated and analyzed. Finally, the conclusions are given in Section 5.

Figure 1: (a) Side view of the proposed antenna (b) configuration of the proposed antenna (c) top view of the fabricated antenna (d) bottom view of the fabricated antenna

2 Antenna structure and design

Traditional aperture-coupled antenna designs use various shapes of apertures in the ground plane. However, these aperture techniques require a thick air layer to reduce the coupling between the two feed lines, which increases the volume of the antenna and makes it inconvenient to integrate into a handheld RFID reader. This paper therefore applies a novel aperture shape to reduce the air layer. The configuration of the dual-feed aperture-coupled square patch antenna is shown in Figure 1. It consists of two FR4 substrates with a dielectric constant of 4.4 and a loss tangent of 0.02. A single-layer substrate (56 mm × 56 mm × 1.2 mm) is suspended 4 mm (0.032λ0, where λ0 is the free-space wavelength) above the double-layer substrate (80 mm × 80 mm × 1.6 mm). A square patch of 50 mm × 50 mm is etched on the top side of the single-layer substrate.
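The quoted electrical height of the air layer and the 6.8 mm overall thickness stated for the antenna can be checked with a few lines of arithmetic. This is a minimal sketch; the text does not say at which frequency 0.032λ0 was evaluated, so the 2.4 GHz and 2.45 GHz values used here are assumptions, and the function names are illustrative.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_wavelength_mm(freq_hz):
    """Free-space wavelength in millimetres."""
    return C0 / freq_hz * 1e3


def electrical_height(height_mm, freq_hz):
    """Physical height as a fraction of the free-space wavelength."""
    return height_mm / free_space_wavelength_mm(freq_hz)


# Layer thicknesses quoted in the text, in mm
PATCH_SUBSTRATE_MM = 1.2   # single-layer FR4 carrying the patch
AIR_GAP_MM = 4.0           # air substrate
FEED_SUBSTRATE_MM = 1.6    # double-layer FR4 with ground plane and feed lines

total_thickness_mm = PATCH_SUBSTRATE_MM + AIR_GAP_MM + FEED_SUBSTRATE_MM

if __name__ == "__main__":
    for f in (2.4e9, 2.45e9):  # assumed evaluation frequencies
        print(f"{f / 1e9:.2f} GHz: lambda0 = {free_space_wavelength_mm(f):.1f} mm, "
              f"air gap = {electrical_height(AIR_GAP_MM, f):.4f} lambda0")
    print(f"stack thickness = {total_thickness_mm:.1f} mm")
```

Under these assumptions the 4 mm gap corresponds to about 0.032λ0 at 2.4 GHz, and the three layers add up to the 6.8 mm overall thickness.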
The overall volume of the proposed antenna is 80 mm × 80 mm × 6.8 mm. Two 50 Ω modified T-shaped microstrip lines with a width of Wf = 3 mm and a length of Ls = 43 mm are fed by separate ports, port 1 and port 2, on the bottom side of the double-layer substrate. The ground plane with U-slots is etched on the top side. The optimized values of the stub width stub-x and length stub-y are 9 mm and 7 mm, respectively. The mathematical equations for calculating Wf and Ls are as follows:

where λg is the dielectric wavelength, λ0 is the air wavelength, εr is the effective dielectric constant and h is the substrate thickness. The substrate thickness h in this paper is 1.6 mm.

The microwave signal is transmitted or received through the feed lines. Since the electromagnetic energy travels along the feed lines, the apertures are etched above the feed lines in the ground plane to couple energy to the patch. The square patch serves as a radiator to transmit or receive signals. The feed line of port 1 excites horizontal linear polarization, while that of port 2 excites vertical linear polarization. The two orthogonal polarizations reduce the coupling between the two ports. Furthermore, this paper adds two stubs at the ends of the feed lines to improve impedance matching and isolation. The current concentrates in the stubs and thus introduces capacitive coupling to the square patch. The stubs effectively improve the isolation and impedance matching of the proposed antenna. In addition, spurious radiation from the feed lines is eliminated by the shielding of the ground plane, resulting in a very low cross-polarization level. An aperture-coupled antenna has a narrow bandwidth and poor isolation. An additional stacked patch is used to improve the bandwidth. The resonant frequency is mainly determined by the size of the square patch, and the amount of coupling depends on the aperture length.
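The 50 Ω figure for a 3 mm wide line on 1.6 mm FR4 (εr = 4.4) can be checked for plausibility with the widely used Hammerstad closed-form microstrip approximations. The sketch below is not necessarily the set of equations the authors used; it assumes zero strip thickness, uses hypothetical function names, and takes 2.45 GHz as the evaluation frequency.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def effective_permittivity(w, h, eps_r):
    """Hammerstad approximation of the effective permittivity of a microstrip."""
    u = w / h
    e_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 / u)
    if u < 1:
        # extra correction term used for narrow strips
        e_eff += (eps_r - 1) / 2 * 0.04 * (1 - u) ** 2
    return e_eff


def characteristic_impedance(w, h, eps_r):
    """Hammerstad approximation of the characteristic impedance in ohms."""
    u = w / h
    e_eff = effective_permittivity(w, h, eps_r)
    if u < 1:
        return 60 / math.sqrt(e_eff) * math.log(8 / u + u / 4)
    return 120 * math.pi / (
        math.sqrt(e_eff) * (u + 1.393 + 0.667 * math.log(u + 1.444)))


def guided_wavelength_mm(freq_hz, w, h, eps_r):
    """Wavelength on the line, lambda_g = lambda_0 / sqrt(eps_eff), in mm."""
    return C0 / freq_hz / math.sqrt(effective_permittivity(w, h, eps_r)) * 1e3


if __name__ == "__main__":
    W_F, H, EPS_R = 3.0, 1.6, 4.4  # mm, mm, FR4 dielectric constant from the text
    print(f"eps_eff  = {effective_permittivity(W_F, H, EPS_R):.3f}")
    print(f"Z0       = {characteristic_impedance(W_F, H, EPS_R):.1f} ohm")
    print(f"lambda_g = {guided_wavelength_mm(2.45e9, W_F, H, EPS_R):.1f} mm at 2.45 GHz")
```

With these inputs the approximation gives a characteristic impedance close to 50 Ω and a guided wavelength of about 67 mm at 2.45 GHz, which is consistent with the stated feed-line width.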
The use of an aperture-coupled patch antenna isolates the patch from the feed line, gives better radiation pattern symmetry owing to the apertures, and provides good impedance matching. For dual-polarization radiation, a square patch is coupled to a pair of microstrip lines through U-shaped apertures located beneath the patch, which improves the radiation characteristics of the antenna. The length and width of the aperture have been optimized for acceptable isolation and return loss in the desired band.

Figure 2: Surface current distribution of the proposed antenna on (a) feed lines (b) ground plane

To investigate the mechanism of mutual coupling between the two ports, the current distributions in the different layers under the patch were simulated with port 1 excited and port 2 terminated. The resulting surface current distributions at 2.45 GHz on the ground plane and feed lines are shown in Figure 2. Figure 2(a) shows that the microwave energy concentrates in the junction and stub of the T-shaped feed line. It can also be seen in Figure 2(a) that the surface current flows along the feed line from port 1 towards port 2 and is gradually attenuated. The current around port 2 is much weaker than that around port 1, which demonstrates excellent isolation between port 1 and port 2. Figure 2(b) shows that the currents concentrate in the region of the ground plane above the stub of the feed line, whereas the other end of the feed line, which has no stub, carries less current. It is concluded that the stub has a strong effect on increasing the currents. Figure 2(b) also shows that the current around the U-slot decreases considerably, which is attributed to impedance matching.

Figure 3: Radiating patch surface current distributions for two different phase intervals. (a) 0° of port 1 (b) 90° of port 1 (c) 0° of port 2 (d) 90° of port 2

To better understand the excitation behavior of the antenna, Figure 3 shows only the current distributions at phases of 0° and 90° for port 1 and port 2, since those at 180° and 270° are equal in magnitude and opposite in phase to those at 0° and 90°. It is clearly seen that the surface currents produce linear polarization over time and that the two ports produce orthogonal fields. Owing to the symmetrical structure of the proposed antenna, the Tx and Rx ports can be interchanged. Thus, the proposed antenna provides dual linear polarization in one structure, and the orthogonal polarization improves the isolation between the two ports.

3 Performance of aperture coupled patch antenna

Figure 4(a) shows the simulated and measured return loss of the antenna. The simulated return loss is below -10 dB over the frequency band of 2.19 GHz to 2.58 GHz, while the measured return loss bandwidth is 400 MHz, from 2.2 GHz to 2.6 GHz. Figure 4(b) shows that the measured -25 dB isolation bandwidth extends from 2 GHz to 2.6 GHz, with a minimum of -50 dB at 2.48 GHz, while the simulated bandwidth is 510 MHz. The simulated and measured peak gains are illustrated in Figure 4(c). The measured peak gain is 1.5-3.1 dBi over the frequency band of 2.4-2.48 GHz. The measured and simulated return loss, isolation and peak gain show good agreement. In the microwave band, antenna gain is not as critical, since active tags are commonly used in many applications. Figure 5 shows the measured radiation patterns at 2.45 GHz in the orthogonal XOZ (phi=0°) and YOZ (phi=90°) planes with an angular step of 20°. The radiation pattern in the YOZ plane resembles a bow-tie, whereas that in the XOZ plane is unidirectional.

4 Parameters simulation and analysis

The parameter simulation is carried out to provide antenna engineers with information for antenna design and optimization.
The lengths stub-x and stub-y of the stub are the prime parameters that determine the amount of power concentrated in the stub and coupled to the radiating patch, and thus affect the impedance matching and isolation of the proposed antenna. One physical parameter of the antenna is varied at a time, while the other is kept unchanged. For clarity, the final optimized parameters are drawn with a red line in each simulation figure. The High Frequency Structure Simulator (HFSS) software, based on the finite element method, is used in this analysis. The final optimized values are stub-x = 9 mm and stub-y = 7 mm.

Figure 4: Characteristics of the proposed antenna (a) Return loss (b) Isolation (c) Peak gain

Figure 5: Measured radiation patterns of the proposed antenna at 2.45 GHz

The dependence of the return loss and isolation on stub-x is shown in Figure 6. Figure 6(a) and Figure 6(b) show that the bandwidths of the return loss (< -10 dB) and isolation (< -25 dB) expand with decreasing length of stub-x. Figure 6(a) shows that the return loss bandwidth decreases and the resonant frequency shifts to lower frequency as the length of stub-x increases. Figure 6(b) shows that the isolation degrades dramatically as the length of stub-x increases. The minimum of the isolation with stub-x of 9 mm is -60 dB at 2.45 GHz. The stub length stub-x determines the coupling strength between the feed line and the ground. It therefore affects both the return loss and the isolation.

Figure 6: Antenna characteristics for different values of stub-x. (a) Return loss (b) Isolation

Figure 7: Antenna characteristics for different values of stub-y. (a) Return loss (b) Isolation

Figure 7 shows the effects of various dimensions of stub-y on the return loss and isolation. The variation of stub-y affects the return loss only slightly, but the isolation value severely. The bandwidths of the return loss and isolation are almost the same, but the optimized stub-y value of 7 mm gives the best isolation in 2.4 GHz-2.48 GHz.

5 Conclusion

A compact antenna with two ports is designed for handheld RFID readers to enhance the receive sensitivity. It is low cost and, with a height of 6.8 mm, easy to integrate into a handheld RFID reader. The proposed antenna presents impedance matching of -10 dB and isolation of -35 dB. The -10 dB matching band and the -25 dB isolation band cover 2.2 GHz to 2.6 GHz and 2 GHz to 2.6 GHz, respectively.

6 Acknowledgment

The paper was supported by "the Fundamental Research Funds for the Central Universities" (No. JB141107).

7 References

1. A.T. Mobashsher and R.W. Aldhaheri, "An improved uniplanar front-directional antenna for dual-band RFID readers," IEEE Antennas and Wireless Propagation Letters, vol. 11, pp. 1438-1441, 2012.
2. J.H. Bang, B.O. Chinzorig, H.S. Koh, E.J. Cha and B.C. Ahn, "A small and lightweight antenna for handheld RFID reader applications," IEEE Antennas and Wireless Propagation Letters, vol. 11, pp. 1076-1079, 2012.
3. S.X Ta, H.S. Choo and I. Park, "Planar, lightweight, circularly polarized crossed dipole antenna for handheld UHF RFID reader," Microwave and Optical Technology Letters, vol. 55, no. 8, pp. 1874-1878, August 2013.
4. W.S. Chen and Y.C. Huang, "A Novel CP Antenna for UHF RFID Handheld Reader", IEEE Antennas and Propagation Magazine, vol. 55, pp. 128-137, 2013.
5. P.V. Nikitin and K.V.S. Rao, "Compact Yagi Antenna for Handheld UHF RFID Reader", IEEE International Symposium Antennas and Propagation and CNC-USNC/URSI Radio Science Meeting, 2010.
6. H.T. Hsu and T.J. Huang, "A Koch-Shaped Log-Periodic Dipole Array (LPDA) Antenna for Universal Ultra-High-Frequency (UHF) Radio Frequency Identification (RFID) Handheld Reader", IEEE Transactions on Antennas and Propagation, vol. 61, pp. 4852-4856, 2013.
7. Y.F. Lin, H.M. Chen, C.H. Chen and C.H. Lee, "Compact shorted inverted-L antenna with circular polarisation for RFID handheld reader", Electronics Letters, vol. 49, pp. 442-444, 2013.
8. C.Y.D. Sim, C.C. Chang and J.S. Row, "Dual-feed dual-polarized patch antenna with low cross polarization and high isolation", IEEE Transaction on Antennas and Propagation, vol. 57, pp. 3321-3324, 2009.
9. S.K. Padhi, N.C. Karmakar and C.L. Law, "Dual polarized reader antenna array for RFID application", IEEE Antennas and Propagation Society International Symposium, vol. 4, pp. 265-268, 2003.
10. H.W. Son, J.N. Lee and G.Y. Choi, "Design of compact RFID reader antenna with high transmit/receive isolation", Microwave and Optical Technology Letters, vol. 48, pp. 2478-2481, 2006.
11. X.Z Lai, Z.M. Xie, Q.Q. Xie, X.L. Cen, "A dual circularly polarized RFID reader antenna with wideband isolation", Antennas and Wireless Propagation Letters, vol. 12, pp. 1630-1633, 2003.
12. Y.K. Jung and B. Lee, "Dual-Band Circularly Polarized Microstrip RFID Reader Antenna Using Metamaterial Branch-Line Coupler", IEEE Transaction on Antennas and Propagation, vol. 60, pp. 786-791, 2013.
13. M.T. Zhang, Y.B. Chen, Y.C. Jiao and F.S. Zhang, "Dual Circularly Polarized Antenna of Compact Structure for RFID Application", Journal of Electromagnetic Waves and Applications, vol. 20, pp. 1895-1902, 2006.
14. B. Li, Y.Z. Yin, Y. Zhao, Y. Ding and R. Zou, "Dual-polarised patch antenna with low cross-polarisation and high isolation for WiMAX applications", vol. 47, pp. 952-953, 2011.
15. K. Zhang, F.G. Zhu and S. Gao, "Differential-fed ultra-wideband slot-loaded patch antenna with dual orthogonal polarization", Electronics Letters, vol. 49, pp. 1591-1593, 2013.
16. C.H. Weng, H.W. Liu, C.H. Ku and C.F. Yang, "Dual circular polarisation microstrip array antenna for WLAN/WiMAX applications", Electronics Letters, vol. 46, pp. 609-611, 2010.
17. J.J. Xie, Y.Z.
Yin, J.H. Wang and X.L. Liu, "Wideband dual-polarised electromagnetic fed patch antenna with high isolation and low cross-polarisation", Electronics Letters, vol. 49, pp. 171-173, 2013. Arrived: 29. 12. 2014 Accepted: 09. 02. 2015 131 Original scientific paper /midem Journal of M Informacije | Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 2 (2015), 132 - 141 The optimal useful measurement range of an inductive displacement sensor Snezana M. Djuric, Nikola M. Djuric, Mirjana S. Damnjanovic Faculty of Technical Sciences, University of Novi Sad, Serbia Abstract: The purpose of this paper is to find the optimal useful measurement range of an inductive displacement sensor with meander type coils. The optimal useful measurement range was numerically examined using a developed model for impedance calculation. The sensor is composed of two sensor elements, with meander-type inductive coils. Each coil has five turns. With these two sensor elements, it is possible to detect normal displacement (using only one sensor element) and tangential displacement (using both sensor elements). Numerical results showed that the optimal useful measurement range was obtained when the gap of 0.23 mm was inserted in one of the coils of sensor element detecting normal displacement. Experimental results confirmed theoretical predictions. The paper demonstrates developing of a model for impedance calculation of an inductive displacement sensor. With this model, it was possible to determine numerically the optimal useful measurement range of the sensor. Keywords: inductive coils; inductance calculation; measurement range; displacement Optimalno uporabno območje induktivnega senzorja premika Izvleček: Namen članka je poiskati uporabno merilno območje induktivnega senzorja premika z meandrasta tuljavami. Optimalno področje je bilo numerično določeno z razvitim modelom za izračune impedanc. Senzor je sestavljen iz dveh senzorskih elementov z meandrasto tuljavo. 
Vsaka tuljava ima pet zavojev. S tema dvema senzorskima elementoma je mogoče zaznati običajne premike (z uporabo le enega senzorja) in tangencialne premike (pri uporabi obeh senzorjev). Numerični izračuni optimalnega merilnega območja so pri uporabi 0.23 mm reže v enem senzorju pri detekciji normalnega premika. Meritve potrjujejo teoretične izračune. V članku je predstavljen razvoj modela, ki omogoča določitev optimalnega merilnega območja senzorja.

Ključne besede: induktivne tuljave; izračuni induktivnosti; merilno območje; premik

* Corresponding Author's e-mail: snesko@uns.acis

1 Introduction

Planar inductive coil sensors have a wide range of applications. They can be applied to the inspection of printed circuit boards using the eddy-current testing (ECT) technique [1, 2, 3]. The development and comparison of different planar fluxgate magnetic sensor structures realized in PCB technology has been reported in [4]. A planar inductive sensor with a planar coil and a magnetic core can detect cracks in nonmagnetic and magnetic specimens [5]. A linear displacement sensor based on the inductive concept, using a meander coil and a pattern guide, is used to detect the displacement of the moving part of linear machines [6]. The effect of the inductive coil shape (meander, square and circle shapes with different numbers of turns) on the sensing performance of a linear displacement sensor has been analyzed in [7]. A planar inductive coil of circular shape is used in an eddy-current sensor for high-resolution displacement detection with a reduced temperature coefficient [8]. An eddy-current sensor with a rectangular sensing element, printed by ink-jet technology on a flexible substrate, has been presented for displacement applications in [9]. An inductive sensor for distance measurement employs the principle of magnetic coupling between two coplanar coils [10].
Sensors fabricated in PCB technology, with planar meander and interdigital coils in series and parallel combination, are used for measurement and monitoring of environmental parameters [11, 12].

In our previous papers [13, 14], the design, modeling and operating principle of an inductive displacement sensor with meander-type inductive coils were presented. The sensor is composed of two sensor elements. Each sensor element consists of a pair of meander coils. One sensor element detects normal displacement, whereas the other detects tangential displacement. The sensor element for normal displacement can be used independently, whereas the sensor element detecting tangential displacement is used in combination with the element detecting normal displacement.

The sensor element that detects normal displacement, with an inserted gap g in the stationary coil, is presented in Figure 1. The width of the segments in the stationary coil is w1 = 1.52 mm; in the moving (short-circuited) coil, the width of the segments is w2 = 0.51 mm. The distance between the axes of two neighboring segments is p = 1.78 mm and the number of turns is five. The gap width influences the useful measurement range of the sensor. The useful measurement range of the sensor is near y = 0 (the zero position, where the axes of the segments of the stationary coil are exactly above the axes of the segments of the moving coil). In this range, the input inductance of the sensor element detecting normal displacement is invariant with respect to tangential displacement (y-direction), so the element detects only normal displacement. The goal of this paper was to examine the optimal useful measurement range of the sensor.

Figure 1: The sensor element, detecting normal displacement, with inserted gap in the stationary coil.
2 Model of the sensor

Each sensor element can be described by its equivalent circuit, as shown in Figure 2, where R1 and R2 are the resistances of the stationary coil (Coil 1) and the moving coil (Coil 2), and L1 and L2 are the self-inductances of Coils 1 and 2, respectively [15].

Figure 2: Equivalent electrical circuit of sensor element.

The input impedance of the sensor element is equal to the input impedance of the equivalent circuit:

$$U_1 = R_1 I_1 + j\omega L_1 I_1 + j\omega M_{12} I_2, \qquad R_2 I_2 + j\omega L_2 I_2 + j\omega M_{12} I_1 = 0 \tag{2.1}$$

$$I_2 = -\frac{j\omega M_{12}}{R_2 + j\omega L_2}\, I_1 \tag{2.2}$$

$$U_1 = \left(R_1 + j\omega L_1\right) I_1 - j\omega M_{12}\,\frac{j\omega M_{12}}{R_2 + j\omega L_2}\, I_1 \tag{2.3}$$

$$Z_{IN} = R_{IN} + j\omega L_{IN} \tag{2.4}$$

where the total resistance of the impedance is

$$R_{IN} = R_1 + \frac{\omega^2 R_2 L_1 L_2 k^2}{R_2^2 + \omega^2 L_2^2} \tag{2.5}$$

and the total reactance of the impedance is

$$\omega L_{IN} = \omega L_1\, \frac{R_2^2 + \omega^2 L_2^2 \left(1 - k^2\right)}{R_2^2 + \omega^2 L_2^2} \tag{2.6}$$

The mutual position of the coils introduces magnetic coupling between them. The coupling coefficient k is

$$k = \frac{M_{12}}{\sqrt{L_1 L_2}} \tag{2.7}$$

where M12 is the mutual inductance between Coils 1 and 2 for a specific mutual position, and L1 and L2 are the self-inductances of Coils 1 and 2, respectively. The mutual inductance changes with the displacement between Coils 1 and 2. It can be assumed that the current in the conductive segments is uniformly distributed over the whole cross-section because of the relatively low working frequency (1 MHz). At this relatively low frequency, the skin and proximity effects are negligible ($\delta = \sqrt{\rho_{Cu} / (\pi f \mu_0)}$, where $\rho_{Cu} = 1.72 \times 10^{-8}\ \Omega\,\mathrm{m}$ is the electrical resistivity of copper, f is the working frequency and $\mu_0 = 4\pi \times 10^{-7}$ H/m is the permeability). The concept of partial inductance was applied to calculate the parameters L1, L2 and M12.
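The closed-form expressions 2.5 and 2.6 can be cross-checked against a direct complex evaluation of the coupled circuit, and the skin depth quoted in the text can be evaluated at the 1 MHz working frequency. The following is a minimal sketch with hypothetical function names; the coil values in the example are placeholders chosen for illustration, not the parameters of the actual sensor.

```python
import math

MU_0 = 4 * math.pi * 1e-7   # permeability of free space, H/m
RHO_CU = 1.72e-8            # resistivity of copper, ohm*m (value from the text)


def coupling_coefficient(m12, l1, l2):
    """Coupling coefficient k = M12 / sqrt(L1 * L2), equation 2.7."""
    return m12 / math.sqrt(l1 * l2)


def input_impedance(freq_hz, r1, l1, r2, l2, k):
    """Input impedance from the closed forms 2.5 (real) and 2.6 (imaginary)."""
    w = 2 * math.pi * freq_hz
    den = r2 ** 2 + (w * l2) ** 2
    r_in = r1 + w ** 2 * r2 * l1 * l2 * k ** 2 / den
    x_in = w * l1 * (r2 ** 2 + (w * l2) ** 2 * (1 - k ** 2)) / den
    return complex(r_in, x_in)


def input_impedance_direct(freq_hz, r1, l1, r2, l2, k):
    """Same impedance from equation 2.3: Z1 + (w*M12)^2 / Z2."""
    w = 2 * math.pi * freq_hz
    m12 = k * math.sqrt(l1 * l2)
    return r1 + 1j * w * l1 + (w * m12) ** 2 / (r2 + 1j * w * l2)


def skin_depth(freq_hz, rho=RHO_CU):
    """Skin depth delta = sqrt(rho / (pi * f * mu0)) in metres."""
    return math.sqrt(rho / (math.pi * freq_hz * MU_0))


if __name__ == "__main__":
    # Placeholder coil values, for illustration only
    args = dict(freq_hz=1e6, r1=0.5, l1=200e-9, r2=0.3, l2=150e-9, k=0.4)
    print("closed form:", input_impedance(**args))
    print("direct     :", input_impedance_direct(**args))
    print(f"skin depth at 1 MHz: {skin_depth(1e6) * 1e6:.0f} um")
```

Both routes give the same impedance, and with k = 0 the result reduces to R1 + jωL1, that is, the stationary coil alone. For copper at 1 MHz the skin depth evaluates to about 66 µm.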
3 Inductance calculation

3.1 Self-inductance calculation of the meander-type coil with inserted gaps

In order to numerically optimize the useful measurement range of the sensor, a mathematical model was developed that describes the influence of the gap on the self-inductance of the coil shown in Figure 3.

Figure 3: The stationary coil with inserted gap in conductive segments parallel to x-axis, lx segments.

The equivalent circuit of the coil with inserted gaps is shown in Figure 4. The gaps were inserted symmetrically in the segments, hence it was assumed that the current in all bars was equal and that the phase shift did not change.

Figure 4: Equivalent circuit of the stationary coil with inserted gap in segments parallel to x-axis, lx segments.

The input voltage UIN is

$$\underline{U}_{IN} = \underline{U}_{Rx11} + \underline{U}_{Lx11} + \underline{U}_{Rx21} + \underline{U}_{Lx21} + \dots + \underline{U}_{RxN1} + \underline{U}_{LxN1} + \underline{U}_{Ry1} + \underline{U}_{Ly1} + \underline{U}_{Ry2} + \underline{U}_{Ly2} + \dots + \underline{U}_{Ry(N-1)} + \underline{U}_{Ly(N-1)} \tag{3.1}$$

Because of the coupling between bars, (3.2) follows:

$$\underline{U}_{Rx11} + \underline{U}_{Lx11} = \big(R_{x11} + j\omega L_{x11} - j\omega M_{(11)(21)} + j\omega M_{(11)(31)} - \dots - j\omega M_{(11)(N1)} + j\omega M_{(11)(12)} - j\omega M_{(11)(22)} + \dots - j\omega M_{(11)(N2)}\big)\,\frac{\underline{I}_{IN}}{2} \tag{3.2}$$

where Rx11 is the resistance of the left bar in an lx segment, Lx11 is the partial self-inductance of the bar, M(11)(j1) and M(11)(j2) are the mutual inductances between bars, and N is the number of lx segments (N = 10). The mutual inductance between bars is positive if the current flows through the bars in the same direction and negative if it flows in opposite directions.

Summing the voltages in all segments, the input voltage UIN is

$$\begin{aligned}
\underline{U}_{IN} ={}& \big(R_{x11} + j\omega L_{x11} - j\omega M_{(11)(21)} + j\omega M_{(11)(31)} - \dots - j\omega M_{(11)(N1)} + j\omega M_{(11)(12)} - j\omega M_{(11)(22)} + \dots - j\omega M_{(11)(N2)}\big)\,\frac{\underline{I}_{IN}}{2} \\
&+ \big(R_{x21} + j\omega L_{x21} - j\omega M_{(21)(11)} - j\omega M_{(21)(31)} + \dots + j\omega M_{(21)(N1)} - j\omega M_{(21)(12)} + j\omega M_{(21)(22)} - \dots + j\omega M_{(21)(N2)}\big)\,\frac{\underline{I}_{IN}}{2} + \dots \\
&+ \big(R_{xN1} + j\omega L_{xN1} - j\omega M_{(N1)(11)} + j\omega M_{(N1)(21)} - \dots - j\omega M_{(N1)((N-1)1)} - j\omega M_{(N1)(12)} + j\omega M_{(N1)(22)} - \dots + j\omega M_{(N1)(N2)}\big)\,\frac{\underline{I}_{IN}}{2} \\
&+ \big(R_{y1} + j\omega L_{y1} + j\omega M_{(y1)(y2)} + j\omega M_{(y1)(y3)} + \dots + j\omega M_{(y1)(y(N-1))}\big)\,\underline{I}_{IN} \\
&+ \big(R_{y2} + j\omega L_{y2} + j\omega M_{(y2)(y1)} + j\omega M_{(y2)(y3)} + \dots + j\omega M_{(y2)(y(N-1))}\big)\,\underline{I}_{IN} + \dots \\
&+ \big(R_{y(N-1)} + j\omega L_{y(N-1)} + j\omega M_{(y(N-1))(y1)} + j\omega M_{(y(N-1))(y2)} + \dots + j\omega M_{(y(N-1))(y(N-2))}\big)\,\underline{I}_{IN}
\end{aligned} \tag{3.3}$$

Finally, the input impedance of the stationary coil with inserted gap (Figure 3) is obtained as

$$\underline{Z}_{IN} = \frac{\underline{U}_{IN}}{\underline{I}_{IN}} = \frac{1}{2}\,\underline{Z}_x + \underline{Z}_y \tag{3.4}$$

where Zx is the impedance of the segments parallel to the x-axis (lx segments) and Zy is the impedance of the segments parallel to the y-axis (ly segments). The impedance Zx is given by

$$\underline{Z}_x = \sum_{i=1}^{N}\Big(R_{xi1} + j\omega\Big(L_{xi1} + \sum_{\substack{j=1 \\ j \neq i}}^{N} (-1)^{i+j} M_{(i1)(j1)} + \sum_{j=1}^{N} (-1)^{i+j} M_{(i1)(j2)}\Big)\Big) \tag{3.5}$$

where Rxi1 is the resistance of the left bar in an lx segment, Lxi1 is the partial self-inductance of the bar, and M(i1)(j1) and M(i1)(j2) are the mutual inductances between bars. The impedance Zy is given as

$$\underline{Z}_y = \sum_{i=1}^{N-1}\Big(R_{yi} + j\omega L_{yi} + j\omega \sum_{\substack{j=1 \\ j \neq i}}^{N-1} M_{(yi)(yj)}\Big) \tag{3.6}$$

where Ryi is the resistance of the ly segments and M(yi)(yj) is the mutual inductance between ly segments. The loop envelops the left bars of the lx segments, yet the same result is obtained if the loop envelops the right bars of the lx segments.

The useful measurement range could be extended further by inserting more narrow gaps in each segment parallel to the x-axis. However, the number of gaps and their width follow from the chosen geometrical parameters and the limitations of the technology chosen for the sensor prototypes. The geometrical parameters of the meander coils were determined as a compromise between the value of the inductance that could be measured by an electrical interface for signal processing and the size of the meander coils. The model that describes the influence of two gaps inserted in each segment parallel to the x-axis is presented in Figure 5.

Figure 5: Equivalent circuit of the stationary coil with two inserted gaps in segments parallel to x-axis, lx segments.
It was assumed that the current intensity in all bars was equal and that the phase shift did not change. The input voltage UIN is:

$$\begin{aligned}
\underline{U}_{IN} ={}& \big(R_{x11} + j\omega L_{x11} + j\omega M_{(11)(12)} + j\omega M_{(11)(13)} - j\omega M_{(11)(21)} - j\omega M_{(11)(22)} - j\omega M_{(11)(23)} \\
&\quad + j\omega M_{(11)(31)} + j\omega M_{(11)(32)} + j\omega M_{(11)(33)} - j\omega M_{(11)(41)} - j\omega M_{(11)(42)} - j\omega M_{(11)(43)} + \dots \\
&\quad - j\omega M_{(11)(N1)} - j\omega M_{(11)(N2)} - j\omega M_{(11)(N3)}\big)\,\frac{\underline{I}_{IN}}{3} + \dots \\
&+ \big(R_{y1} + j\omega L_{y1} + j\omega M_{(y1)(y2)} + j\omega M_{(y1)(y3)} + \dots + j\omega M_{(y1)(y(N-1))} \\
&\quad + R_{y2} + j\omega L_{y2} + j\omega M_{(y2)(y1)} + j\omega M_{(y2)(y3)} + \dots + j\omega M_{(y2)(y(N-1))} + \dots \\
&\quad + R_{y(N-1)} + j\omega L_{y(N-1)} + j\omega M_{(y1)(y(N-1))} + j\omega M_{(y2)(y(N-1))} + \dots + j\omega M_{(y(N-1))(y(N-2))}\big)\,\underline{I}_{IN}
\end{aligned} \tag{3.7}$$

Finally, the input impedance of the stationary coil with two inserted gaps is obtained as:

$$\underline{Z}_{IN} = \frac{\underline{U}_{IN}}{\underline{I}_{IN}} = \frac{1}{3}\,\underline{Z}_x + \underline{Z}_y \tag{3.8}$$

where Zx is the impedance of the segments parallel to the x-axis (lx segments) and Zy is the impedance of the segments parallel to the y-axis (ly segments). The impedance Zx is given as

$$\underline{Z}_x = \sum_{i=1}^{N}\Big(R_{xi1} + j\omega L_{xi1} + j\omega \sum_{\substack{j=1 \\ j \neq i}}^{N} (-1)^{i+j} M_{(i1)(j1)} + j\omega \sum_{j=1}^{N} (-1)^{i+j} M_{(i1)(j2)} + j\omega \sum_{j=1}^{N} (-1)^{i+j} M_{(i1)(j3)}\Big) \tag{3.9}$$

or, equivalently,

$$\underline{Z}_x = \sum_{i=1}^{N}\Big(R_{xi1} + j\omega L_{xi1} + j\omega \sum_{\substack{j=1 \\ j \neq i}}^{N} (-1)^{i+j} M_{(i1)(j1)} + j\omega \sum_{j=1}^{N}\sum_{k=2}^{3} (-1)^{i+j} M_{(i1)(jk)}\Big) \tag{3.10}$$

where Rxi1 is the resistance of a bar in an lx segment, Lxi1 is the partial self-inductance of the bar, and M(i1)(j1) and M(i1)(jk) are the mutual inductances between bars. The impedance Zy is given as

$$\underline{Z}_y = \sum_{i=1}^{N-1}\Big(R_{yi} + j\omega L_{yi} + j\omega \sum_{\substack{j=1 \\ j \neq i}}^{N-1} M_{(yi)(yj)}\Big) \tag{3.11}$$

where Ryi is the resistance of the ly segments and M(yi)(yj) is the mutual inductance between ly segments.

3.2 Calculation of the self-inductances of the coils L1 and L2

The sensor was modeled using the concept of partial inductance. The planar meander-type coils were partitioned into their constituent conductive segments, as shown in Figure 6, and each segment was additionally partitioned into a certain number of filaments [13, 14]. There are 19 conductive segments in each meander coil.
Figure 6: Arrows show current flow direction in the coil.

The resistances R1 and R2 of Coils 1 and 2, respectively, are calculated by Equations 3.12 and 3.13

$$R_1 = \sum_{i=1}^{h} R_i \tag{3.12}$$

$$R_2 = \sum_{i=1}^{h} R_i \tag{3.13}$$

where h is the number of segments in a coil (h = 19) and Ri is the resistance of a segment (parallel to the x- or y-axis). The resistance Ri is given by Equation 3.14

$$R_i = \rho_{Cu}\,\frac{l}{w \cdot t} \tag{3.14}$$

where ρCu is the resistivity of copper, l is the length of the segment, w is the width of the segment and t is the thickness of the segment (copper layer).

The self-inductances of meander Coils 1 and 2 (L1 and L2) can be calculated as a sum:

$$L_1 = \sum_{i=1}^{h} L_i \pm \sum_{i=1}^{h}\sum_{\substack{j=1 \\ j \neq i}}^{h} M_{ij} \tag{3.15}$$

$$L_2 = \sum_{i=1}^{h} L_i \pm \sum_{i=1}^{h}\sum_{\substack{j=1 \\ j \neq i}}^{h} M_{ij} \tag{3.16}$$

where Li is the partial self-inductance of each straight segment (Figure 6), Mij is the mutual inductance between each pair of conductive segments (Figure 6), and h is the number of partitioned segments in the meander coils (h = 19). The mutual inductance is positive if the current vectors in segments i and j point in the same direction and negative if they point in opposite directions.

As reported in [16, 17], each partitioned segment was additionally partitioned into a certain number of elementary filaments (l1xi, l1yi, l2xj, l2yj, ...) with small, rectangular cross sections, as shown in Figure 7. This was done to achieve better precision in the calculation, because the segment separation dimensions are not larger than the cross-sectional dimensions for all geometries considered. (As previously stated, the distance between the axes of two neighboring segments is p = 1.78 mm whereas, in one of the structures, the cross-sectional dimension is 1.52 mm.) Figure 7 shows the general case of segment partitioning, with overlapping in the corners. In reality, the dimensions of the overlapping in the corners are too small to introduce significant error.

Figure 7: A part of sensor element partitioned into filaments.
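Equations 3.12 to 3.16 reduce to a plain sum of segment resistances and a signed double sum of inductances. The sketch below illustrates that bookkeeping only; it is not the authors' software. The function names, the `direction` encoding (+1 or -1 per segment along its axis) and the convention that `mutual[i][j]` holds the magnitude of Mij (zero for perpendicular segments) are assumptions made for the example.

```python
RHO_CU = 1.72e-8  # resistivity of copper, ohm*m (value given in the text)


def segment_resistance(length, width, thickness, rho=RHO_CU):
    """Eq. 3.14: R_i = rho * l / (w * t), all dimensions in metres."""
    return rho * length / (width * thickness)


def coil_resistance(segment_resistances):
    """Eqs. 3.12-3.13: the coil resistance is the sum over its h segments."""
    return sum(segment_resistances)


def coil_self_inductance(partial_l, mutual, direction):
    """Eqs. 3.15-3.16: sum of partial self-inductances plus signed mutual terms.

    partial_l[i]  -- partial self-inductance L_i of segment i
    mutual[i][j]  -- magnitude of M_ij (assumed 0 for perpendicular segments)
    direction[i]  -- +1 or -1, current direction along the segment axis
    """
    h = len(partial_l)
    total = sum(partial_l)
    for i in range(h):
        for j in range(h):
            if i != j:
                # Same direction adds, opposite direction subtracts.
                sign = 1 if direction[i] == direction[j] else -1
                total += sign * mutual[i][j]
    return total
```

For two antiparallel 10 nH segments with 2 nH mutual inductance, the double sum counts the pair twice, so the result is 20 nH - 4 nH = 16 nH.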
Filaments parallel to the x-axis are lx long and dx wide, whereas filaments parallel to the y-axis are ly long and dy wide; D is the width of the conductive segments. The number of filaments is chosen such that the dimensions of the cross section of each filament are less than a skin depth ($\delta = \sqrt{\rho_{Cu} / (\pi f \mu_0)}$, where ρCu = 1.72 × 10⁻⁸ Ωm is the electrical resistivity of copper, f is the working frequency and μ0 = 4π × 10⁻⁷ H/m is the permeability) at the highest frequency of interest [17]. Each segment was partitioned into 24 filaments to fulfill this condition and as a compromise between the complexity and the accuracy of the model.

The partial self-inductance Li is the sum of the mutual inductances between all pairs of elementary filaments within segment i:

$$L_i = \sum_{k=1}^{n}\sum_{\substack{l=1 \\ l \neq k}}^{n} M_{kl} \tag{3.17}$$

where n is the number of filaments in segment i and Mkl is the mutual partial inductance between filaments k and l within segment i, as shown in Figure 8.

Figure 8: The partial self-inductance calculation of a conductive segment i.

The mutual inductance Mij (Equations 3.15 and 3.16) is the sum of the mutual inductances between all pairs of filaments from segments i and j:

$$M_{ij} = \frac{1}{m_1 \cdot m_2}\sum_{k=1}^{m_1}\sum_{l=1}^{m_2} M_{kl} \tag{3.18}$$

where m1 and m2 are the numbers of filaments in segments i and j, respectively, and Mkl is the mutual inductance between filaments k and l from segments i and j, as shown in Figure 9. The numbers of filaments in the conductive segments are identical, n = m1 = m2 = 24.

Figure 9: The mutual inductance calculation between two segments.

3.3 The mutual inductance M12 calculation

The mutual inductance between Coils 1 and 2 (M12) is calculated in a similar manner, as presented in Figure 10.

Figure 10: Mutual inductance M12 calculation.

The mutual partial inductance is calculated between each pair of elementary filaments that belong to different coils. M12 is the sum of all mutual partial inductances between filaments from Coils 1 and 2.
Since Coil 2 physically moves with respect to Coil 1, the distance between the filaments of Coils 1 and 2 changes with the displacement. The distance between filaments is an important parameter for the mutual inductance calculation, as is their mutual position. When calculating the mutual partial inductances between filaments from different coils, different equations were applied [18], depending on the mutual position of the filaments, as can be seen in Figure 11. The formula for the mutual inductance of two parallel filaments of equal length l at distance d is

$$M = M(l, d) = \frac{\mu_0 l}{2\pi}\left[\ln\left(\frac{l}{d} + \sqrt{1 + \frac{l^2}{d^2}}\right) - \sqrt{1 + \frac{d^2}{l^2}} + \frac{d}{l}\right] \tag{3.19}$$

In the model of the sensor, Coil 2 moves with respect to Coil 1 in the y-z plane and it also rotates around the x- and y-axes. In the case of rotation, filaments in Coil 2 can be placed in any desired position. Therefore, Equations 3.20 - 3.22 [18] were applied to calculate the mutual partial inductance between filaments placed in any desired position, as shown in Figure 12. Based on this model of the sensor, in-house software was developed specifically for the resistance, inductance, and impedance calculation of the sensor. The software calculates the variation of these parameters versus displacement in the y-z plane and versus small rotations of the moving coil around the x- and y-axes.

$$\frac{M}{0.01\cos\varepsilon} = 2\left[(\mu + l)\arctan\frac{m}{R_1 + R_2} + (\nu + m)\arctan\frac{l}{R_1 + R_4} - \mu\arctan\frac{m}{R_3 + R_4} - \nu\arctan\frac{l}{R_2 + R_3}\right] - \frac{\Omega d}{\sin\varepsilon} \tag{3.20}$$

in which

$$\Omega = \arctan\frac{d^2\cos\varepsilon + (\mu + l)(\nu + m)\sin^2\varepsilon}{d R_1 \sin\varepsilon} - \arctan\frac{d^2\cos\varepsilon + (\mu + l)\,\nu\sin^2\varepsilon}{d R_2 \sin\varepsilon} + \arctan\frac{d^2\cos\varepsilon + \mu\nu\sin^2\varepsilon}{d R_3 \sin\varepsilon} - \arctan\frac{d^2\cos\varepsilon + \mu(\nu + m)\sin^2\varepsilon}{d R_4 \sin\varepsilon} \tag{3.21}$$

Figure 11: Calculation of the mutual inductance between filaments. Calculation depends on the mutual position between filaments. Parameters l, m, μ, ν, and d are given.
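For the parallel-filament case, Equation 3.19 together with the averaging of Equation 3.18 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' in-house software: the function names are hypothetical, SI units are used throughout, and the filaments of both segments are assumed to be parallel, of equal length and axially aligned, so that each pair is described only by the distance between the filament axes in the cross-section plane.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability, H/m (value given in the text)


def mutual_parallel_filaments(length, distance):
    """Eq. 3.19: mutual inductance (H) of two parallel, equal-length filaments."""
    r = length / distance
    bracket = (math.log(r + math.sqrt(1 + r * r))
               - math.sqrt(1 + 1 / (r * r))
               + 1 / r)
    return MU_0 * length / (2 * math.pi) * bracket


def mutual_between_segments(length, centers_i, centers_j):
    """Eq. 3.18: mean of the filament-pair mutual inductances.

    centers_i, centers_j -- lists of (y, z) positions of the filament axes
    of segments i and j in the cross-section plane (assumed parallel and
    axially aligned filaments of the same length).
    """
    total = 0.0
    for y1, z1 in centers_i:
        for y2, z2 in centers_j:
            total += mutual_parallel_filaments(length, math.hypot(y2 - y1, z2 - z1))
    return total / (len(centers_i) * len(centers_j))
```

As expected from Equation 3.19, the value falls as the filaments move apart, which is what makes the input inductance of the sensor element sensitive to displacement.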
The relations 3.22 give the distances R1, R2, R3, and R4 used in Equations 3.20 and 3.21:

$$\begin{aligned}
R_1^2 &= d^2 + (\mu + l)^2 + (\nu + m)^2 - 2(\mu + l)(\nu + m)\cos\varepsilon \\
R_2^2 &= d^2 + (\mu + l)^2 + \nu^2 - 2\nu(\mu + l)\cos\varepsilon \\
R_3^2 &= d^2 + \mu^2 + \nu^2 - 2\mu\nu\cos\varepsilon \\
R_4^2 &= d^2 + \mu^2 + (\nu + m)^2 - 2\mu(\nu + m)\cos\varepsilon
\end{aligned} \tag{3.22}$$

Figure 12: Two filaments placed in any desired position.

4 Results and Discussion

Simulated values of the input inductance versus y-displacement for different gap widths g = [0.15, 0.18, 0.20, 0.23, 0.25, 0.28, 0.30, 0.33, 0.36, 0.38] mm and the most critical normal distance between the coils, z = 0.1 mm, are presented in Figure 13. It can be observed in Figure 13 that invariance of the input inductance versus y-displacement near y = 0 is obtained for gaps g = 0.23 mm (red line), g = 0.25 mm (blue line), and g = 0.28 mm (green line). The input inductance changes with tangential displacement if the gap width is increased above 0.28 mm. If the gap width is decreased below 0.23 mm, the useful measurement range becomes narrower, as can be seen in Figure 13.

The variation of the input inductance versus y-displacement for gap widths g = 0.23 mm (red solid line), g = 0.25 mm (blue dashed line), and g = 0.28 mm (green dotted line) is presented in Figure 14. There is a slight variation of the input inductance in the useful measurement range (near y = 0) for gap widths g = 0.25 mm and g = 0.28 mm. Thus, the gap width g = 0.23 mm was chosen as the one giving the optimal useful measurement range. The gap width g = 0.23 mm provides invariance of the input inductance versus tangential displacement in the useful measurement range, which makes the sensor element suitable for detecting normal displacement.

The sensor element for detecting normal displacement with the gap width g = 0.23 mm was fabricated, characterized, and compared with the sensor element without the gap. Fabricated prototypes are presented in Figure 15.
Sensor prototypes were electrically tested with an HP4194A impedance analyzer at the working frequency of 1 MHz. The characterization procedure is similar to that described in [19].

Figure 13: The sensor element that detects normal displacement: Simulated values of the input inductance variation versus y-displacement for different gap widths, and the most critical normal distance between the coils z = 0.1 mm.

Figure 14: The sensor element that detects normal displacement: Simulated values of the input inductance variation versus y-displacement for gap widths g = 0.23 mm, g = 0.25 mm, and g = 0.28 mm, and the most critical normal distance between the coils z = 0.1 mm.

Figure 15: The sensor element: a) The stationary coil of sensor element that detects normal displacement without the gap, b) The stationary coil with inserted gap g = 0.23 mm, c) The stationary coil of sensor element that detects tangential displacement, and d) The moving (short-circuited) coil.

The displacement dependence of the input inductance LIN in the y-z plane for the sensor element detecting normal displacement is presented in Figure 16 without the gap and in Figure 17 with the gap g = 0.23 mm. The symmetry and periodicity of the input inductance characteristics can be observed in Figures 16 and 17. The difference between the local minimums (LINmin) and maximums (LINmax) decreases as the moving coil moves away from the stationary coil in the y-z plane. Eventually, the input inductance tends towards the self-inductance of the stationary coil. Figures 16 and 17 present the dependence of the input inductance on displacement over nearly the whole y-displacement range. However, the useful measurement range is near y = 0 for the sensor element detecting normal displacement.

Figure 16: The sensor element detecting normal displacement: The displacement dependence of the input inductance LIN in y-z plane.
Figure 17: The sensor element detecting normal displacement with inserted gap g = 0.23 mm: The displacement dependence of the input inductance LIN in y-z plane.

Figure 18: The sensor element detecting normal displacement without the gap and with the gap g = 0.23 mm: The displacement dependence of the input inductance LIN in y-z plane.

A comparison of the input inductance characteristic LIN for the sensor element detecting normal displacement without the gap and with the gap, for the most critical normal distance between the coils z = 0.1 mm, is presented in Figure 18. It can be observed that invariance of the input inductance versus y-displacement is achieved in the useful measurement range (near y = 0), as well as near other local minima.

A comparison of the useful measurement range of the sensor elements without the gap and with the gap is presented in Figure 19. The displacement step was approximately 0.0635 mm in order to analyze the useful measurement range accurately. It can be seen in Figure 19 that the input inductance of the sensor element with the inserted gap is almost invariant near y = 0 in comparison with the input inductance of the sensor element without the gap.

The smallest change in the position of the moving coil that can be detected depends on the signal processing interface. When measuring with the impedance analyzer, which can detect changes in the range of 10 µΩ, the resolution of the sensor can be estimated at 0.1 µm. Furthermore, the resolution of the sensor can be improved by adjusting the design to meet the given application.

The useful measurement range of the sensor element detecting normal displacement, the worst-
Zynq-based System for Extracting Sorted Subsets from Large Data Sets

Abstract: The paper presents a software/hardware design of a system for extracting the maximum and minimum sorted subsets from large data sets. Two methods are presented that enable a high degree of parallelism and implementation of the system in a commercial Zynq-7000 microchip based on Xilinx seventh-generation programmable logic. The methods are based on parallel and easily scalable networks and allow subsets to be extracted at a speed close to the data transfer rate. The results demonstrate a large speed-up of software/hardware solutions compared with software-only solutions.

Keywords: processing system; programmable logic; system-on-chip; sorting networks; hardware/software co-design

* Corresponding Author's e-mail: skl@ua.pt

1 Introduction

All Programmable Systems-on-Chip (APSoC) from the Zynq-7000 family [1,2] combine on the same microchip a dual-core ARM® Cortex™ MPCore™-based high-performance processing system (PS) with advanced programmable logic (PL) from the Xilinx 7 series. They may be used effectively for the design of hardware accelerators in such areas as hard real-time systems [3], image [4] and data [5] processing, satellite on-board processing [6], programmable logic controllers [7], driver assistance applications [8], wireless networks [9], and many others [2]. Interactions between the PS and PL are supported by different interfaces and other signals through over 3,000 connections [1]. The four available 32/64-bit high-performance (HP) Advanced eXtensible Interface (AXI) ports and a 64-bit AXI Accelerator Coherency Port (ACP) enable fast data exchange, with theoretical bandwidths shown in [1]. The Zynq APSoC design flow includes the development of hardware in the PL [10] (supported by available Xilinx IP cores) and software in the PS [11] for different types of applications such as standalone (bare metal) [12], running under an operating system (e.g.
Linux) [12] and combined [13]. Hardware implemented in the PL can be the same for standalone and Linux applications, but the software programs use different functions and interaction mechanisms [12]. Since bare metal projects are generally faster, we will consider them as a base, which does not exclude using the results for projects running under operating systems. The latter may benefit from available drivers and other support [12]. Since both types of projects can run in parallel in different cores [13], they may be combined if required.

© MIDEM Society V. Sklyarov et al; Informacije Midem, Vol. 45, No. 2 (2015), 142 - 152

Many electronic, environmental, medical, and biological applications need to process data streams produced by sensors and to measure external parameters within given upper and lower bounds (thresholds) [14]. Let us consider some examples. Applying the technique [15] in real-time applications requires knowledge acquisition from controlled systems (e.g. a plant). For example, signals from sensors may be filtered and analysed to prevent error conditions (see [15] for additional details). To provide a more exact and reliable conclusion, a combination of different values needs to be extracted, ordered, and analysed. Similar tasks appear in monitoring thermal radiation from volcanic products [16], in filtering and integrating information from a variety of different sources in medical applications [17], and so on. Since many systems are hard real-time, performance is important and hardware accelerators may provide significant assistance for software products.

Similar problems appear in so-called straight selection sorting (in applications where we need to find the task with the shortest deadline in scheduling algorithms [18]), and in statistical data manipulation and data mining (e.g. [19-22]). To describe one of the problems from data mining informally, let us consider an example [19] with an analogy to a shopping cart.
A basket is the set of items purchased at one time. A frequent item is an item that often occurs in a database. A frequent set of items often occur together in the same basket. A researcher can request a particular support value and find the items which occur together in a basket either a maximum or a minimum number of times within the database [19]. Similar problems appear in determining frequent inquiries on the Internet, customer transactions, credit card purchases, etc., which require processing very large volumes of data in the span of a day [19]. Fast extraction of the most frequent or the least frequent items from large sets permits data mining algorithms to be simplified and accelerated. Sorting of subsets may be involved in many known methods from this area [e.g. 20-22].

Let us consider a system that collects data produced by some measurements or copies such data from a database. Valuable assistance for the applications described above may be provided by fast extraction of the maximum and minimum sorted subsets from the set of collected data, where the maximum/minimum sorted subset contains Lmax/Lmin data items. This problem can be solved in a software only system. For example, the C function qsort permits large data sets to be sorted. After sorting is completed, the maximum and minimum subsets may easily be extracted by collecting them from the top and from the bottom of the sorted set. However, for many practical applications, such as those referenced in [18,19], the performance of the operations described above is important and software functions need to be accelerated. The paper suggests methods and high-performance implementations for solving the problem indicated above in APSoC from the Xilinx Zynq-7000 family.

The remainder of the paper is organized in five sections. Section 2 presents the proposed system architecture and describes overall functionality.
Section 3 suggests two novel methods allowing the maximum and minimum sorted subsets to be extracted from large data sets. Section 4 shows how large subsets (for which hardware resources are not sufficient) can be computed and discusses additional capabilities. Implementation in a Zynq microchip and the results of thorough evaluation and comparison of software only and software/hardware solutions, with explicit indication of the achievable accelerations, are discussed in section 5. Section 6 concludes the paper.

2 System Architecture and Functionality

The known results [2,5,12] have shown that software/hardware solutions may be significantly faster than software only solutions. Let us look at Fig. 1. Clearly, a software/hardware system is faster if Ts > Tsch, where Tsch ≤ Tsh + Th + Tc and Ts, Tsch, Tsh, Tc, Th are the time intervals required for the different modules. In highly parallel implementations, software, hardware and the interactions between hardware and software can run concurrently. For example, software may run in parallel with hardware, and operations in hardware over previously received data may be done at the same time as new data are being transferred. Thus, Tsch < Tsh + Th + Tc. This paper evaluates and compares software/hardware and software only solutions, taking into account all the involved communication overheads and paying special attention to a high level of parallelism. For instance, we would like communication and application-specific operations to be overlapped in hardware as much as possible (see Fig. 1). Note that while hardware only designs may be the fastest, the complexity of such designs is often limited by the available resources in the PL.

Figure 1: Software only and software/hardware systems

Fig. 2 presents the proposed software/hardware architecture. Extracting subsets is done in an application-specific processing block (ASP) which is entirely implemented in the PL.
We will discuss the ASP in the next section with all necessary details. There is another block in the PL called communication-specific processing (CSP) which interacts with the PS, i.e. it receives a large set of data items step by step in blocks and transfers the extracted sorted subsets. In addition, the CSP is responsible for the exchange of control signals between the PS and PL.

Figure 2: The proposed software/hardware architecture

The PS is responsible for solving the following tasks:
1. Acquiring data and saving them in either on-chip memory (OCM) or external memory, that is, DDR.
2. Forming requests to extract subsets in the PL, which is done through a set of control signals.
3. Collecting extracted subsets and storing them in OCM or external memory.
4. Verifying the results.
5. Solving exactly the same problem in software. This point is required just for experiments and comparison.
6. Computing the consumed time.

The PL is responsible for solving the following tasks:
1. Processing control signals received from the PS, which are: a request (start) to begin data processing; the source address in memory of input data (i.e. the address of the set that has to be handled); the destination address in memory of output data (i.e. the address to copy the extracted subsets); the number of blocks Q of input data transferred from the PS to PL; and the number of items in the last block Klast. The PL also forms two signals that are sent to the PS, which are: an interrupt generated as soon as the job is completed (i.e. the subsets have been extracted and copied to memory) and the number of clock cycles consumed in the PL, which is needed for experiments and comparisons.
2. Extracting subsets on requests from the PS in the highly parallel ASP.
3. Counting clock cycles consumed in the PL from receiving the request up to generating the interrupt.
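PS tasks 4 and 5 (verifying the results and solving the same problem in software) amount to a full sort followed by taking both ends of the sorted set. A minimal sketch of such a software-only reference is given below; the function name is hypothetical, and returning the maximum subset in descending order is an assumption, since the ordering convention is not stated here.

```python
def extract_sorted_subsets(data, l_max, l_min):
    """Software-only reference: sort everything, then take both ends.

    Returns (max_subset, min_subset), where max_subset holds the l_max
    largest items in descending order (an assumed convention) and
    min_subset holds the l_min smallest items in ascending order.
    """
    descending = sorted(data, reverse=True)
    ascending = descending[::-1]
    return descending[:l_max], ascending[:l_min]
```

The cost of this reference is dominated by sorting all N items even when Lmax and Lmin are small, which is the overhead the hardware accelerator is intended to avoid.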
Figure 3: Address mapping from Vivado 2014.2 block design editor

Note that for experiments and comparisons some additional signals for interactions between the PS and PL may be needed. There are some generic parameters for which hardware in the PL is statically configured (see Fig.
2). They are: K - the number of items that are handled in hardware in each block (Klast ≤ K); M - the size of each data item; Lmax - the number of items in the maximum subset; Lmin - the number of items in the minimum subset.

Selection of proper AXI ports is very important. Experiments in [23] have shown that for transferring a small number of data items (from 16 to 64 bytes) general-purpose input/output ports (GPP) are always the best. In Zynq APSoC there are four available 32-bit GPP, two of which are masters and the other two are slaves from the side of the PS. They are optimized for access from the PL to the PS peripherals and from the PS to the PL registers/memories [24]. Since the latter feature is what we need, a master GPP was chosen for transferring the control signals shown in Fig. 2. AXI ACP allows the cache memory of the application processing unit (APU) in the PS to be involved in data transfers, and it is possible to provide either cacheable or non-cacheable data from/to the memories indicated above (i.e. OCM or DDR) [23]. Mapping of memories may be done in computer-aided design software (in our case in the Xilinx Vivado block design editor, according to the addresses given in [1] and shown in Fig. 3, and in the Xilinx Software Development Kit - SDK). Experiments in [12,23] have shown that AXI ACP is very appropriate for transferring large volumes of data items. Thus, this port was chosen to receive the source set from memory (OCM or DDR) in the PL and to copy extracted subsets from the PL to memory.

Fig. 4 gives more details about the chosen software/hardware interactions, where: solid arrows indicate who is the master (the beginning) and who is the slave (the end); triple compound lines show control flow; and dashed lines indicate directions of data flow (i.e. one direction, →, or both directions, ↔). Control (and possibly a small number of additional auxiliary) signals are transferred through GPP.
An initial (source) set and extracted subsets are copied through AXI ACP. The memory used (OCM or DDR) is indicated by the respective mapping both in hardware (see Fig. 3) and in software, which in our case was written in the C language. The mapping is done as follows:

    #define OCM_ADDRESS          0x00000000   // OCM address (see [1] for details)
    #define DDR_ADDRESS          0x16D84000   // DDR address (see [1] for details)
    #define GPIO_BASE_IO_Control 0x40000000   // GPP address (see [1] for details)
    #define HP_ADDRESS           OCM_ADDRESS  // for this example OCM address is chosen

Note that additional details about mapping, with many examples, can be found in [12]. The snoop controller [1] in Fig. 4 provides cacheable and non-cacheable access to memories (OCM or DDR) [1]. The cache area can be either disabled or enabled in software with the aid of the function Xil_SetTlbAttributes [25]. In particular, data received from or copied to memories may be pre-cached, i.e. they can first be saved into the faster cache and then transferred, with the main goal of increasing the performance of communications. Note that for standalone programs the cache memory is entirely available. For programs running under an operating system (such as Linux), some area in the cache memory may be used by programs of the operating system, so the size of the available cache memory is reduced. Many additional details can be found in [12].

Figure 4: Hardware/software interactions

The initial (source) data set and the extracted subsets are accommodated in memory as shown in Fig. 5. All necessary details about particular locations and sizes are supplied from the PS to the PL through GPP (see Fig. 2).

Figure 5: Accommodation of the initial data set and the extracted subsets in memory

To extract the maximum and/or minimum sorted subsets the following sequence of operations is executed:
1. The PS prepares source data in memory, calculates the number of blocks Q = ⌈N/K⌉ (K is predefined) and the number of items in the last block (which can be less than K), and indicates the source and destination addresses. Here, N is the total number of data items that have to be processed.
2. The PS sets the start signal that is permanently tested in the PL.
3. As soon as the signal start is set, the PL transfers blocks of data in burst mode and saves them in a dedicated dual-port embedded block RAM (one port is assigned for transferring data from the PS to PL and the other port for copying data from the block RAM to the PL registers considered in the next section).
4. As soon as the first block is completely transferred to the block RAM through the first port, it is copied through the second port to the PL registers that are used as inputs of the sorting networks for extracting subsets in the ASP.
5. The maximum and minimum subsets are incrementally constructed using methods from the next section, and subsequent blocks of source data are transferred from memory to the block RAM in parallel.
6. The block RAM is organized as a circular buffer, as shown in Fig. 6. If it becomes full, data transfer is suspended until space for a subsequent block is freed.
7. As soon as all Q blocks are processed, the maximum and minimum subsets are ready (the details will be given in the next section).
8. The maximum and minimum subsets are copied from the PL to memory (see Fig. 5).
9.
As soon as the previous point is completed, the PL generates a hardware interrupt to the PS indicating that the job has been finished (details about such interrupts, with examples, can be found in [12]).
10. Optionally, the PL may count the number of clock cycles spent on solving the problem in hardware and supply this number to the PS through GPP.
11. The PS may solve other problems in parallel with the PL. However, as soon as the interrupt is generated it is handled by the PS. Hence, the extracted subsets may immediately be used, for example, as data needed for projects of higher hierarchical levels.

Figure 6: Block RAM organized as a circular buffer

The circular buffer in Fig. 6 is managed by the PL control unit (see Fig. 4), which is a finite state machine. The buffer is built in the PL block RAM, which is written through the first port (used for transferring data from the PS) and read through the second port (used to copy data from the block RAM to PL registers). As soon as the buffer is full, data transfer from the PS to the PL is suspended. As soon as some area of the buffer is released (because data have already been read), data transfer is resumed.

3 Methods for Extracting Sorted Subsets

Let a set S containing N M-bit data items be given. The maximum subset contains the Lmax largest items in S and the minimum subset contains the Lmin smallest items in S (Lmax < N and Lmin < N). We mainly consider tasks for which Lmax << N and Lmin << N, which are more common in practical applications. Large and very large subsets may also be extracted, and section 4 explains how to compute them. Experiments with such subsets are also reported in section 5. Sorting will be done in highly parallel networks, such as [26] or [27]. Since N may be very large (millions of items), the set cannot be processed completely in hardware because the available resources are insufficient.

We suggest solving the problem iteratively using the hardware architecture of ASP shown in Fig. 7. Data are incrementally received in blocks containing exactly K items and then processed by the parallel networks described below. We mentioned above that the last block may contain fewer than K items. If so, it will be extended up to K items (such an extension is discussed a bit later). Part of the sorted items with maximum values will be used to form the maximum subset, and part of the sorted items with minimum values will be used to form the minimum subset. As soon as all Q blocks have been handled, the maximum and/or minimum subsets will be ready to be transferred to the PS. We suggest two methods enabling the maximum and minimum sorted subsets to be incrementally constructed. The first method is illustrated in Fig. 8.

Figure 7: Basic hardware architecture for ASP (individual blocks with K M-bit items each are processed to form the maximum subset and the minimum subset)

Figure 8: The first method of extracting the maximum and minimum sorted subsets (the SNmin input register is loaded with the maximum possible value, and the SNmax input register with the minimum possible value, only at the initialization step)

Sorting networks SNmin and SNmax have input registers. The minimum and maximum sorted subsets will be built incrementally in the halves of the registers indicated at the bottom of Fig. 8. At the initialization step, these parts are pre-loaded with the possible maximum and minimum values which data from the source set may have. Such values can be indicated by the PS in additional fields through GPP or calculated in the PL. Then the following steps are executed:
1. The first block containing K M-bit data items is copied from block RAM and becomes available at the inputs of the main SN.
2.
The block is sorted in parallel in the main SN, which can be done in combinational networks from [26] (such as the even-odd merge network) or in sequential iterative networks from [27] (such as the iterative even-odd transition network). In the latter case additional control is provided.
3. Lmax sorted items with maximum values are loaded into a half of the SNmax input register as shown in Fig. 8. Lmin sorted items with minimum values are loaded into a half of the SNmin input register as shown in Fig. 8. All the items are resorted by the relevant sorting networks SNmax and SNmin.
4. A new block is copied from block RAM and becomes available at the inputs of the main SN. Such operations are repeated until all Q-1 blocks are handled.
5. The last block may contain fewer than K items and it is processed slightly differently. As soon as all Q blocks have been transferred from the PS to the PL block RAM and Q-1 blocks have been handled in ASP, the last block (if it is incomplete) is extended to K items by copying the largest item from the created minimum sorted subset. Thus, the last block becomes complete. Clearly, the largest item from the created minimum sorted subset cannot be moved again to the minimum subset, and the last block is handled similarly to the previous blocks.

Let us look at an example in Fig. 9.

Figure 9: Example of extracting sorted subsets using the first method (steps (a) - (g); symbol U indicates an undefined value)

It is assumed that the minimum possible value of data items is 0 and the maximum possible value is 99 (clearly, other values may also be chosen). At the first step (a), shown in the left-hand part of Fig. 9, the input registers for SNmax and SNmin are initialized, and the first block of data becomes available for the main SN. U indicates undefined values. At the next step (b) the input registers are updated as shown by the dashed fragments in Fig. 9. At step (c) a new block of data becomes available. Note that loading the register for the main SN can be done in parallel with copying the Lmax/Lmin items to SNmax/SNmin. Items in SNmax and SNmin are sorted as soon as the relevant input registers are updated. After executing steps (a) - (g) the maximum and minimum sorted subsets are ready (see the right-hand part of Fig. 9) for the items shown in grey in the main SN. Clearly, this method enables the maximum and minimum sorted subsets to be incrementally constructed for very large sets.

The idea of the second method is illustrated in Fig. 10 on the same example from Fig. 9.
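Before moving on to the second method, the first method can be summarised as a small software model. The following C sketch is only an illustration under stated assumptions: qsort stands in for the hardware sorting networks (main SN, SNmax, SNmin), the names extract_method1, K, LMAX and LMIN are hypothetical, and the block size K = 10 is chosen only for the illustration. It also assumes that at least one complete block is available, so that padding of an incomplete last block can use the largest item of the already created minimum subset, as described in step 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes (assumptions, not the sizes used in the hardware). */
enum { K = 10, LMAX = 4, LMIN = 4 };

/* Descending comparison for qsort; qsort stands in for a sorting network. */
static int cmp_desc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x < y) - (x > y);
}

/*
 * Software model of the first method (hypothetical helper).
 * data, n         : source set (n >= K is assumed)
 * min_val, max_val: smallest and largest values the data may take
 * max_out         : receives the LMAX largest items, descending
 * min_out         : receives the LMIN smallest items, descending
 */
static void extract_method1(const int *data, size_t n, int min_val, int max_val,
                            int max_out[LMAX], int min_out[LMIN])
{
    int block[K];
    int sn_max[2 * LMAX]; /* after sorting: [0, LMAX) is the maximum subset */
    int sn_min[2 * LMIN]; /* after sorting: [LMIN, 2*LMIN) is the minimum subset */
    size_t i, pos = 0;

    assert(data != NULL && n >= (size_t)K);

    /* Initialization: subset halves hold values that any real item replaces. */
    for (i = 0; i < LMAX; i++) sn_max[i] = min_val;
    for (i = 0; i < LMIN; i++) sn_min[LMIN + i] = max_val;

    while (pos < n) {
        size_t cnt = (n - pos < (size_t)K) ? n - pos : (size_t)K;
        memcpy(block, data + pos, cnt * sizeof block[0]);
        /* Incomplete last block: pad with the largest item of the minimum subset. */
        for (i = cnt; i < (size_t)K; i++) block[i] = sn_min[LMIN];
        pos += cnt;

        qsort(block, K, sizeof block[0], cmp_desc);                 /* main SN */

        /* Load the candidate halves and resort (SNmax and SNmin). */
        memcpy(sn_max + LMAX, block, LMAX * sizeof block[0]);
        memcpy(sn_min, block + (K - LMIN), LMIN * sizeof block[0]);
        qsort(sn_max, 2 * LMAX, sizeof sn_max[0], cmp_desc);
        qsort(sn_min, 2 * LMIN, sizeof sn_min[0], cmp_desc);
    }

    memcpy(max_out, sn_max, LMAX * sizeof sn_max[0]);
    memcpy(min_out, sn_min + LMIN, LMIN * sizeof sn_min[0]);
}
```

Called with three illustrative blocks whose four largest items are {99,92,71,70}, {98,80,71,69} and {20,19,18,17}, and with possible values from 0 to 99, the model returns the maximum subset {99,98,92,80}, which agrees with the result discussed for Fig. 9.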
Figure 10: Example of extracting sorted subsets using the second method

Now the size of the networks SNmax and SNmin is halved (there are now just 4 M-bit inputs instead of 8 in Fig. 9). Much like in Fig. 8, both these networks have input registers (4 M-bit registers in our example). At the initialization step SNmax and SNmin are filled with the minimum and maximum values, which are assumed as before to be 0 and 99. There are two additional fragments in Fig. 10 which contain circuits from [28]. They are composed of comparators shown in Knuth notation [29]. Any comparator converts a two-item input to a two-item output in such a way that the upper value is greater than or equal to the lower value. Let us call the circuits from [28] a swapping network. If it is applied to two sorted subsets of equal size, then it is guaranteed that the upper half of the network outputs contains the largest values from the two sorted subsets and the lower half of the outputs contains the smallest values from the two sorted subsets. If we then resort the upper and the lower parts separately, the two sorted subsets will form a single sorted set.

Let us analyse the upper swapping network in Fig. 10. At step (a) the inputs of the network are the sorted subsets {0,0,0,0} and {99,92,71,70}. Thus, two new subsets {70,71,92,99} and {0,0,0,0} are created. Sorting them enables the maximum sorted subset {99,92,71,70} with four items to be found on the outputs of SNmax. At step (c) the inputs of the swapping network are the sorted subsets {99,92,71,70} and {98,80,71,69}, and two new subsets {99,92,80,98} and {70,71,71,69} are created. Sorting them enables the maximum sorted subset {99,98,92,80} to be built. At step (e) the inputs of the swapping network are the sorted subsets {99,98,92,80} and {20,19,18,17}, and no swapping is done. Hence, the maximum sorted subset is {99,98,92,80}, the same as in Fig. 9. The lower swapping network in Fig. 10 functions similarly.
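The behaviour of the swapping network in the steps above can be reproduced with a few lines of C. This is a minimal sketch, assuming that the i-th item of the first sorted subset is compared with the item at the mirrored position of the second subset; this pairing reproduces the intermediate subsets quoted above, but the function name swap_network and the array-based interface are hypothetical and are not taken from [28].

```c
#include <assert.h>
#include <stddef.h>

/*
 * One layer of n comparators (hypothetical software model).
 * upper and lower must both be sorted in descending order on entry.
 * On return, upper holds the n largest and lower the n smallest of the
 * 2n items; neither half is sorted yet and has to be resorted afterwards.
 */
static void swap_network(int *upper, int *lower, size_t n)
{
    size_t i;

    assert(upper != NULL && lower != NULL);
    for (i = 0; i < n; i++) {
        size_t j = n - 1 - i;          /* mirrored position in the other subset */
        if (upper[i] < lower[j]) {     /* comparator: larger value goes up */
            int t = upper[i];
            upper[i] = lower[j];
            lower[j] = t;
        }
    }
}
```

For the inputs {99,92,71,70} and {98,80,71,69} of step (c), the function leaves {99,92,80,98} in the upper array and {70,71,71,69} in the lower array; resorting the upper array then gives {99,98,92,80}.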
The second method involves an additional delay on the comparators of the swapping networks but eliminates copying (through the feedbacks in Fig. 8) from the main SN to SNmax and SNmin. Besides, the sizes of SNmax and SNmin are halved.

Let us now discuss the attainable complexity of sorting networks in the PL. It is shown in [5,27] that even in relatively complex field-programmable gate arrays (FPGAs) the size K is limited. For example, for even-odd merge and bitonic merge networks [26] K cannot exceed a few hundred 32-bit items even for very advanced FPGAs (such as the largest devices from the Xilinx Virtex-7 family [30]). In Zynq devices and for circuits from [31] the maximum value of K cannot exceed 100 32-bit items. Iterative even-odd transition networks from [27] permit a significantly larger number of items (exceeding thousands of 32-bit items) to be processed, and they may efficiently be used for computing sorted subsets in hardware. Fig. 11 gives an example of the network from [27] which permits up to K = 16 data items to be sorted. K M-bit data items that have to be sorted are loaded (from block RAM) to the feedback register (FR). Sorting is executed in a segment of an even-odd transition network composed of two linked lines with even and odd comparators. Sorting is completed in K/2 iterations (clock cycles) at most. Note that the number of iterations is almost always less than K/2 because of the technique from [27], according to which sorting is completed if there are no swaps of data on the right-most line of the comparators. Note that the network [27] possesses significantly smaller combinational delays than networks from [26]. Besides, in the proposed architecture (see Fig. 4) iterations are done at the same time as subsequent data are being received from the PS. Such parallelism enables delays to be optimally adjusted, allowing the total performance to be improved.

Figure 11: An example of iterative sorting network from [27] for K=16 data items

4 Computing Large Subsets and Additional Capabilities

For some practical applications the maximum and minimum subsets may be large, and the available hardware resources become insufficient to implement the sorting networks. Indeed, in accordance with [12] the largest sorting network that can be implemented in the Zynq microchip xc7z020-1clg484c (which will further be used for experiments) handles 512 32-bit items. The arising problem can be solved using the following technique.

Let lmax and lmin be constraints for the upper (SNmax) and bottom (SNmin) parts in Fig. 7, i.e. the circuits SNmax and SNmin with larger sizes (than lmax and lmin) cannot be implemented due to the lack of hardware resources or for some other reasons. Let the parameters for the maximum and minimum subsets be greater than lmax and lmin, i.e. Lmax > lmax and Lmin > lmin. In such a case the maximum and minimum subsets can be computed iteratively as follows:
1. At the first iteration, the maximum subset containing lmax items and the minimum subset containing lmin items are computed. The subsets are transferred to the PS (to memories). The PS removes the minimum value from the maximum subset and the maximum value from the minimum subset.
Such a correction avoids loss of repeated items at subsequent steps. Indeed, the minimum value from the maximum subset (the maximum value from the minimum subset) can appear in the subsets to be constructed subsequently in point 3 below, and such repeated items would be lost because of filtering (see point 3).
2. The minimum value from the maximum subset corrected in the PS is assigned to Bu. The maximum value from the minimum subset corrected in the PS is assigned to Bl. The values Bu and Bl are supplied to the PL through GPP.
3. The same data items (from memory) as in point 1 above are preliminarily filtered in the PL in such a way that only items that are less than or equal to Bu and greater than or equal to Bl are allowed to be transferred to block RAM, i.e. computing sorted subsets is done only for the filtered data items. Thus, the second part of the maximum and the minimum subsets will be computed and appended (in the PS) to the previously computed subsets (such as the subsets from point 1).
4. Points 2 and 3 above are repeated until the maximum subset with Lmax items and the minimum subset with Lmin items are computed.

Note that if the number of repeated items is greater than or equal to lmax/lmin, then the method above may generate infinite loops. This situation can easily be recognized. Indeed, if any new subset (that is sent from the PL to the PS) contains the same value repeated K times then an infinite loop will be created. In such a case we can use another method based on the software/hardware sorters from [12]. In the next section we will present the results of experiments for such sorters.

For some practical applications only the maximum or only the minimum subset needs to be extracted. This task can be solved by removing the network SNmin (for finding only the maximum subset) or SNmax (for finding only the minimum subset).

5 Implementations, Experiments and Comparisons

Fig. 12 shows the organization of experiments. We have used a multi-level computing system [12].
Initial (source) data are either generated randomly in software of the PS with the aid of the C language rand function (see number 1 in Fig. 12) or prepared in the host PC (see number 2 in Fig. 12). In the latter case data may be generated by some functions or copied from available benchmarks. Computing subsets in software/hardware systems is done completely in the Zynq APSoC xc7z020-1clg484c housed on ZedBoard [32] with the aid of the software/hardware architecture described above (see Fig. 4). Computing subsets in software only sorters is done completely in the PS by calling the C language qsort function, which sorts the data, after which the maximum and minimum subsets are extracted from the sorted data. The results are verified in software running either in the PS (see number 3 in Fig. 12) or in the host PC (see number 4 in Fig. 12). Functions for verification of the results are given in [12]. Verification time is not taken into account in the measurements below. Methods that are used for copying files between the PC and APSoCs are explained in [12] with examples.

Figure 12: Experimental setup

Synthesis and implementation of hardware modules were done in the Xilinx Vivado 2014.2 design environment from specifications in VHDL. Standalone software applications have been created in C language and uploaded to the PS memory from Xilinx SDK (version 2014.2) using methods described in [12]. Interactions with the APSoC are done through the SDK console window.

For all the experiments a 64-bit AXI ACP port was used for transferring blocks between the PL and memories. More details about this port can be found in [12,23,33]. The size of each block for burst mode is chosen to be 128 64-bit items (two 32-bit items are sent/received in one 64-bit word).

Two memories were tested: the OCM and the external (on-board) DDR. The OCM is faster because it provides 64-bit data transfers [1], but the size of this memory is limited to 256 KB. The 4 Gb DDR available on ZedBoard provides 32-bit data transfers. The measurements were based on time units (returned by the function XTime_GetTime [34]) for Lmax = Lmin = 64, M = 32, and K = 200. Each unit returned by this function corresponds to 2 clock cycles of the PS [35]. The PS clock frequency is 666 MHz. Thus, any unit corresponds to approximately 3 ns. The PL clock frequency was set to 100 MHz.

Fig. 13 shows the time consumed for computing the maximum and minimum subsets for data sets with different sizes in KB (from 2 to 128). Since M = 32, the number of processed words (N) is equal to the indicated size in bytes divided by 4. Fig. 14 shows the acceleration of software/hardware systems compared to software only systems. Note that Figs. 13 and 14 present diagrams for OCM. If DDR memory is used then communication overheads are slightly increased, but the acceleration of the software/hardware systems compared to the software only system is again significant. For M = 64 the speed-up increases almost 2 times.

Figure 13: Computing time in software only and software/hardware systems (time in µs, logarithmic scale, versus size of data in KB; curves: software only, hardware (method 1), hardware (method 2); the results for methods 1 and 2 are almost identical, which is why the respective lines overlap)

Figure 14: Acceleration of software/hardware systems compared to the software only system (versus size of data in KB; the marked example point indicates acceleration by a factor of 70.7 of the proposed software/hardware solutions compared to the software only solution)

If only the maximum or only the minimum subsets have to be computed the acceleration is almost the same, but the occupied hardware resources are reduced.
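For reference, the software only baseline and the conversion of timer units used in these measurements can be sketched in C as follows. This is only an illustration under assumptions: the function names extract_by_qsort and units_to_us are hypothetical, the subsets are returned in ascending order, and the platform timer call itself is not shown. The conversion relies only on the statement above that one unit corresponds to 2 clock cycles of the PS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Ascending comparison of 32-bit items for qsort. */
static int cmp_asc_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/*
 * Software only baseline (hypothetical helper): sorts data in place and
 * copies the l_min smallest and the l_max largest items, both ascending.
 */
static void extract_by_qsort(uint32_t *data, size_t n,
                             uint32_t *min_out, size_t l_min,
                             uint32_t *max_out, size_t l_max)
{
    assert(data != NULL && min_out != NULL && max_out != NULL);
    assert(l_min <= n && l_max <= n);
    qsort(data, n, sizeof data[0], cmp_asc_u32);
    memcpy(min_out, data, l_min * sizeof data[0]);
    memcpy(max_out, data + (n - l_max), l_max * sizeof data[0]);
}

/* Converts timer units to microseconds, assuming 2 PS clock cycles per unit. */
static double units_to_us(uint64_t units, double ps_clock_hz)
{
    return (double)units * 2.0 / ps_clock_hz * 1e6;
}
```

With a PS clock of 666 MHz, units_to_us(1000, 666e6) gives about 3 µs, which is consistent with roughly 3 ns per unit.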
If the size of the requested subsets is increased in such a way that all data need to be read from memory several times (see section 4) then the acceleration decreases. Table 1 presents the results for extracting larger subsets (containing from 127 to 505 32-bit data items) from a 128 KB set.

Table 1: The results for extracting larger subsets from 128 KB set

Subset size (items)   127     190       253       316       379       442       505
Time in µs            926.4   1,393.7   1,856.7   2,320.5   2,780.4   3,245.5   3,708.9

For very large subsets the acceleration may even be less than 1, i.e. the software only system becomes faster. In such cases the software/hardware sorters from [12] can be used directly, and they provide acceleration for all potential cases, even for Lmax = N or Lmin = N. Such acceleration is not as high as in Fig. 14: it is equal to 6 for N = 512, K = 256 (now K is the size of the blocks sorted in hardware and further merged in software) and to 1.4 for N = 33,554,432, K = 256. These results were taken from experiments with the data sorters from [12] (in all experiments M = 32). We found that for small and moderate subsets the methods proposed here provide significantly better acceleration.

6 Conclusion

The paper suggests a hardware/software architecture for fast extraction of minimum and maximum sorted subsets from large data sets and two methods of such extraction based on highly parallel and easily scalable sorting networks. The basic idea of the methods is incremental construction of the subsets, which is done concurrently with the transfer of initial data (source sets) through advanced high-performance interfaces in burst mode. Thorough experiments were done with designs implemented entirely on-chip in the Zynq xc7z020-1clg484c device housed on ZedBoard. The size of the initial sets varies from 512 to more than 33 million 32-bit words.
The results demonstrate significant speed-up compared to pure software implementations in the same Zynq device, namely performance was increased by 1-2 orders of magnitude for small subsets and by a factor ranging from 1.4 to 6 for very large subsets.

7 Acknowledgments

This research was supported by EU through European Regional Development Funds, the institutional research funding IUT 19-1 of the Estonian Ministry of Education and Research, ESF grant 9251, and Portuguese National Funds through FCT - Foundation for Science and Technology, in the context of the project PEst-OE/EEI/UI0127/2014.

8 References

1. Xilinx, Inc. (2014). Zynq-7000 All Programmable SoC Technical Reference Manual. http://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf.
2. Crockett, L.H., Elliot, R.A., Enderwitz, M.A., and Stewart, R.W. (2014). The Zynq Book. University of Strathclyde.
3. Hao, L. and Stitt, G. (2012). Bandwidth-Sensitivity-Aware Arbitration for FPGAs. IEEE Embedded Systems Letters, 4(3), 73-76.
4. Bailey, D.G. (2011). Design for Embedded Image Processing on FPGAs. John Wiley and Sons.
5. Sklyarov, V., Skliarova, I., Barkalov, A., and Titarenko, L. (2014). Synthesis and Optimization of FPGA-based Systems. Springer.
6. Cristo, A., Fisher, K., Gualtieri, A.J., Pérez, R.M., and Martinez, P. (2013). Optimization of Processor-to-Hardware Module Communications on Spaceborne Hybrid FPGA-based Architectures. IEEE Embedded Systems Letters, 5(4), 77-80.
7. Canedo, A., Ludwig, H., and Al Faruque, M.A. (2014). High Communication Throughput and Low Scan Cycle Time with Multi/Many-Core Programmable Logic Controllers. IEEE Embedded Systems Letters, 6(2), 21-24.
8. Santarini, M. (2013). All Eyes on Zynq SoC for Smart Vision. XCell Journal, 83(2), 8-15.
9. Dick, C. (2013). Xilinx All Programmable Devices Enable Smarter Wireless Networks. XCell Journal, 83(2), 16-23.
10. Xilinx, Inc. (2014). Vivado Design Suite Guides.
http://www.xilinx.com/support/index.html/content/xilinx/en/supportNav/design_tools.html.
11. Xilinx, Inc. (2014). Zynq-7000 All Programmable SoC Software Developers Guide. UG821 (v9.0). http://www.xilinx.com/support/documentation/user_guides/ug821-zynq-7000-swdev.pdf.
12. Sklyarov, V., Skliarova, I., Silva, J., Rjabov, A., Sudnitson, A., and Cardoso, C. (2014). Hardware/Software Co-design for Programmable Systems-on-Chip. TUT Press.
13. Xilinx, Inc. (2013). Simple AMP Running Linux and Bare-Metal System on Both Zynq SoC Processors. http://www.xilinx.com/support/documentation/application_notes/xapp1078-amp-linux-bare-metal.pdf.
14. Sklyarov, V. and Skliarova, I. (2013). Digital Hamming Weight and Distance Analyzers for Binary Vectors and Matrices. International Journal of Innovative Computing, Information and Control, 9(12), 4825-4849.
15. Zmaranda, D., Silaghi, H., Gabor, G., and Vancea, C. (2013). Issues on Applying Knowledge-Based Techniques in Real-Time Control Systems. International Journal of Computers, Communications and Control, 8(1), 166-175.
16. Field, L., Barnie, T., Blundy, J., Brooker, R.A., Keir, D., Lewi, E., and Saunders, K. (2012). Integrated field, satellite and petrological observations of the November 2010 eruption of Erta Ale. Bulletin of Volcanology, 74(10), 2251-2271.
17. Zhang, W., Thurow, K., and Stoll, R. (2014). A Knowledge-based Telemonitoring Platform for Application in Remote Healthcare. International Journal of Computers, Communications and Control, 9(5), 644-654.
18. Verber, D. (2011). Hardware implementation of an earliest deadline first task scheduling algorithm. Informacije MIDEM, 41(4), 257-263.
19. Baker, Z.K. and Prasanna, V.K. (2006). An Architecture for Efficient Hardware Data Mining using Reconfigurable Computing Systems. Proc. 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, USA, 67-75.
20. Sun, S. (2011).
Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms. Ph.D. thesis, Iowa State University. http://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1421&context=etd.
21. Wu, X., Kumar, V., Quinlan, J.R., et al. (2014). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
22. Firdhous, M.F.M. (2010). Automating Legal Research through Data Mining. International Journal of Advanced Computer Science and Applications, 1(6), 9-16.
23. Silva, J., Sklyarov, V., and Skliarova, I. (2015). Comparison of On-chip Communications in Zynq-7000 All Programmable Systems-on-Chip. IEEE Embedded Systems Letters, 7(1), 31-34.
24. Neuendorffer, S., and Martinez-Vallina, F. (2013). Building Zynq Accelerators with Vivado High Level Synthesis. Proc. ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays, Monterey, CA, USA, 1-2.
25. Xilinx, Inc. (2014). OS and Libraries Document Collection UG647. http://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_2/oslib_rm.pdf.
26. Baddar, S.W.A.-H., and Batcher, K.E. (2011). Designing Sorting Networks. A New Paradigm. Springer.
27. Sklyarov, V., and Skliarova, I. (2014). High-performance implementation of regular and easily scalable sorting networks on an FPGA. Microprocessors and Microsystems, 38(5), 470-484.
28. Alekseev, V.E. (1969). Sorting Algorithms with Minimum Memory. Kibernetica, 5, 99-103.
29. Knuth, D.E. (2011). The Art of Computer Programming. Sorting and Searching, vol. III. Addison-Wesley.
30. Xilinx, Inc. (2014). 7 Series FPGAs Overview. http://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf.
31. Mueller, R., Teubner, J., and Alonso, G. (2012). Sorting networks on FPGAs. Int. J. Very Large Data Bases, 21(1), 1-23.
32. Avnet, Inc. (2014). ZedBoard (Zynq Evaluation and Development) Hardware User's Guide, Version 2.2. http://www.zedboard.org/sites/default/files/documentations/ZedBoard_HW_UG_v2_2.pdf.
33.
Sadri, M., Weis, C., When, N., and Benini, L. (2013). Energy and Performance Exploration of Accelerator Coherency Port Using Xilinx ZYNQ. Proceedings of the 10th FPGAWorld Conference, Copenhagen/Stockholm.
34. Xilinx, Inc. (2013). LogiCORE IP AXI Master Burst v2.0. Product Guide for Vivado Design Suite. http://japan.xilinx.com/support/documentation/ip_documentation/axi_master_burst/v2_0/pg162-axi-master-burst.pdf.
35. Xilinx, Inc. (2014). Standalone (v.4.1). UG647. http://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_2/oslib_rm.pdf.

Arrived: 09. 11. 2014
Accepted: 14. 04. 2015

Original scientific paper
Informacije MIDEM, Journal of Microelectronics, Electronic Components and Materials, Vol. 45, No. 2 (2015), 153 - 159

Radiation Behavior and Test Specifics of A-D and D-A Converters

Alexander A. Demidov, Oleg A. Kalashnikov, Alexander Y. Nikiforov, Alexander S. Tararaksin, Vitaly A. Telets

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Specialized Electronic Systems (SPELS), Moscow, Russia

Abstract: ADC/DAC radiation failures are mainly due to radiation-induced degradation of precision parameters of the transfer characteristic such as gain, zero offset, full-scale voltage, integral and differential non-linearity, and conversion error. A specific feature of ADC/DAC radiation failures is that even a slight deviation of an electrical parameter of internal elements (comparator threshold, internal reference voltage, switch leakage, operational amplifier gain, etc.) often leads to significant degradation of ADC/DAC accuracy. An ADC/DAC radiation test procedure and facilities are developed and test results are introduced.
Keywords: analog-to-digital converter (ADC); digital-to-analog converter (DAC); radiation; test technique

Sevalno obnašanje in testne posebnosti A-D in D-A pretvornikov

Izvleček: ADC/DAC sevalne napake so običajno posledica radiacijsko pogojenega staranja natančnosti parametrov prenosnih karakteristik, kot je ojačenje, ničelni odmik, polna napetost, linearna in diferencialna linearnost in napaka pretvarjanja. Posebnost ADC/DAC sevalnih napak je, da že majhna sprememba električnih lastnosti elementov (prag primerjalnika, interna referenčna napetost, uhajanje preklopnika, ojačenje...) v veliki meri vpliva na degradacijo natančnosti pretvornika. Predstavljeni so razviti postopki testiranja in rezultati meritev.

Ključne besede: analogni digitalen pretvornik (ADC); digitalno analogen pretvornik (DAC); sevanje; tehnike testiranja

* Corresponding Author's e-mail: oakal@spels.ru

1 Introduction

Analog-to-digital and digital-to-analog converters (ADC and DAC) are widely used in space, collider physics, avionics, nuclear power plant and other applications, being essential parts of data pre-processing and control units. Therefore it is important to analyze the particularities of their radiation behavior and to develop informative radiation test procedures and techniques for estimating the degradation of radiation-sensitive parameters and characteristics of ADCs and DACs [1], [2].

The most radiation-sensitive feature of ADCs and DACs is accuracy, which is determined by transfer characteristic parameters such as gain, zero offset, full-scale voltage, integral and differential non-linearity, and conversion error. A specific feature of ADC and DAC radiation failures, as compared with digital integrated circuits (ICs), is that even a slight deviation of a parameter (comparator threshold, internal reference voltage, switch leakage, operational amplifier gain, etc.) often leads to significant degradation of ADC/DAC accuracy [3].
Total ionizing dose (TID) accumulation in ADCs and DACs results in continuous degradation of static and dynamic conversion parameters, while transient irradiation (a gamma flash or single charged particles) may result in ADC output code failures or in DAC output voltage transients. The radiation behavior of various ADCs and DACs is rather complicated and depends significantly on the particular IC architecture and on the bias and operating conditions during irradiation and testing. This should be considered when developing ADC/DAC radiation test techniques and facilities.

Many radiation tests of various ADCs and DACs have been carried out in the MEPhI-SPELS radiation test laboratory (Moscow, Russia) [4]. The analysis of the test data demonstrates the critical importance of functional tests for ADCs and DACs as compared to other IC groups. Statistics of the dominant IC failure types (parametric or functional) from our test experience are presented in Fig. 1. Parametric TID failures clearly prevail for simple logic, while other (complex) ICs show a substantial or even dominant share of functional failures [5], and ADC/DAC ICs lead in this respect.

Figure 1: Relative share (%) of parametric and functional radiation failures for different IC classes

A variety of hardware and software solutions has been developed to provide reliable and informative testing of different ADC/DAC ICs directly under irradiation within the -60 to +125 °C temperature range. The system both sets the operating modes under irradiation and monitors the entire set of static and dynamic parameters that characterize ADC/DAC radiation hardness. We present the structure, the operating principles and the basic technical specifications of the system in this paper.

The set of original compact radiation test facilities that was used is introduced in [2], [4], [6]. It includes Co-60 and Cs-137 isotopic gamma sources, an electron linear accelerator and a flash X-ray machine, all with the minimum possible signal cable length (only about 1 m). The ion cyclotron (in Dubna) and the high-energy proton synchrocyclotron (in Gatchina) that were used are rather traditional. We also made wide use of laser and X-ray simulators, which give us the unique possibility to measure all informative ADC/DAC parameters and characteristics.

The paper also contains numerous test results for ADCs and DACs designed by various manufacturers using various architectures and technologies. We concentrate on TID effects, single event effects (SEE), and transient radiation effects (TRE). The data presented are mostly experimental; the theory of ADC/DAC radiation effects is well known and has been widely presented [7]-[9]. The purpose of this paper is to demonstrate the variety of these effects. We present here the most typical results, which motivated the development of the specific ADC/DAC radiation test technique.

© MIDEM Society. A. A. Demidov et al; Informacije Midem, Vol. 45, No. 2 (2015), 153 - 159

2 Total ionizing dose effects

The typical TID effect in ADCs and DACs is degradation of the transfer function (TF) and the associated degradation of the converter precision parameters (DAC TF: dependence of output voltage/current on input code; ADC TF: dependence of output code on input voltage). For example, Fig. 2 presents the TFs of the ADC (Fig. 2a) and DAC (Fig. 2b) within the ADuC812BS data acquisition system (DAS, Analog Devices) at different TID values. Fig. 3 shows the TF degradation of the ADC AD1671SQ/883B (Analog Devices) with TID accumulation. One can see that TF degradation can be gradual and smooth or sharp and abrupt [10].

Figure 2: ADC (a) and DAC (b) within the DAS ADuC812BS: TFs at different TID values: 1 - initial, 2 - 12 krad(Si), 3 - 16 krad(Si) (X-axis: input voltage, V (a); input code (b))

Figure 3: ADC AD1671SQ/883B TF degradation with TID accumulation

ADC/DAC TF degradation results in degradation of the accuracy parameters: integral nonlinearity (INL), differential nonlinearity (DNL), offset error and gain error. INL is a measure of the deviation of the actual TF from a straight line. DNL is the difference between an actual step width (for an ADC) or step height (for a DAC) and the ideal value of 1 least significant bit (LSB). Offset error is defined as the difference between the nominal and actual offset points when the digital output (for an ADC) or digital input (for a DAC) is zero. Gain error is defined as the difference between the nominal and actual gain points on the TF when the digital output (for an ADC) or digital input (for a DAC) is full scale [11].

As an example, a number of INL and DNL TID dependencies of the DAC within ADuC812BS are shown in Fig. 4. The curves are plotted for several irradiated samples and correspond to the average TF changes presented in Fig. 2b [12].

It is important to mention that not only the maximum values of the ADC/DAC accuracy parameters degrade under irradiation; the dependencies of these parameters on the input or output signals (codes) change too. For example, the INL and DNL dependencies of ADC PV2 are presented in Fig. 5 and 6, respectively. The two parameters behave differently with TID. In the INL graphs the "teeth" grow and the overall distortion (bending) increases. The degradation of DNL, in contrast, appears as an increase of the spike amplitude at certain ADC output codes, while the DNL values for the remaining codes show practically no increase [3].

Thus, to determine the radiation behavior of ADCs and DACs with TID accumulation, a set of TFs should be recorded during irradiation and then used to calculate the TID dependencies of the converter accuracy parameters.
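The accuracy parameters defined above can be computed from one recorded DAC transfer function roughly as follows. This is only a minimal sketch, not the authors' processing code: the function name `dac_static_errors` is hypothetical, the straight line for INL is assumed to be the end-point line (the text does not say which line is used), and gain error is assumed to be taken after removing the offset error, as in the application report cited as [11].

```python
def dac_static_errors(v_meas, v_zero_nom, v_fs_nom):
    """Return (offset_lsb, gain_lsb, dnl, inl) for one DAC transfer function.

    v_meas[k] is the measured output voltage for input code k.
    v_zero_nom / v_fs_nom are the nominal outputs at the first / last code.
    All results are expressed in LSB.
    """
    n = len(v_meas)
    if n < 2:
        raise ValueError("need at least two codes")
    lsb_nom = (v_fs_nom - v_zero_nom) / (n - 1)   # ideal step height
    lsb_act = (v_meas[-1] - v_meas[0]) / (n - 1)  # end-point step height

    offset_lsb = (v_meas[0] - v_zero_nom) / lsb_nom
    # Assumption: gain error is the full-scale deviation with offset removed.
    gain_lsb = (v_meas[-1] - v_fs_nom) / lsb_nom - offset_lsb

    # DNL: actual step height minus one LSB.
    dnl = [(v_meas[k + 1] - v_meas[k]) / lsb_act - 1.0 for k in range(n - 1)]
    # INL: deviation from the straight line through the two end points.
    inl = [(v_meas[k] - (v_meas[0] + k * lsb_act)) / lsb_act for k in range(n)]
    return offset_lsb, gain_lsb, dnl, inl


# Hypothetical 3-bit DAC with one misplaced level at code 3.
tf = [0.0, 1.0, 2.0, 3.5, 4.0, 5.0, 6.0, 7.0]
offset, gain, dnl, inl = dac_static_errors(tf, 0.0, 7.0)
print(max(abs(d) for d in dnl), max(abs(i) for i in inl))
```

Running the same routine on every TF recorded during irradiation gives the TID dependencies of the kind plotted in Fig. 4.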
It should be noted that "standard" analog and digital converter parameters such as supply current, output voltage and maximum operating frequency also change under irradiation. However, neither the degradation of these parameters nor the procedures for monitoring them during testing are specific to ADCs and DACs as compared with other functional classes of ICs, so they are outside the scope of this paper.

Figure 4: TID dependencies of DNL (a) and INL (b) for samples of the DAC within ADuC812BS (X-axis: total dose, krad(Si))

Figure 5: Total ionizing dose degradation of ADC PV2 integral nonlinearity: ADC output code on X-axes and INL (in units of LSB) on Y-axes (panels: initial, 84, 120, 160 and 200 krad(Si))

Figure 6: Total ionizing dose degradation of ADC PV2 differential nonlinearity: ADC output code on X-axes and DNL (in units of LSB) on Y-axes (panels: initial, 84, 120, 160 and 200 krad(Si))

3 Single event effects

Single event effects (SEE) caused by single nuclear particles (such as heavy ions and protons) may result in either failures (latch-up, burn-out and so on) or single event upsets (SEU). Failures, as well as the experimental methods for detecting them, are well known and covered in a large number of publications [13], [14]. SEE failures in ADCs and DACs have no specific features, so in this paper we focus on converter SEU.

There are two types of ADC/DAC SEU. First, DAC SEU may produce output voltage (current) spikes during irradiation. Similarly, ADC SEU may appear as an output code pulse (a reversible change). Fig. 7 shows the output voltage transients of the DAC TLV5638MFKB (Texas Instruments) during irradiation by Xe ions in the Dubna cyclotron [13].

Figure 7: DAC TLV5638MFKB output voltage transients during Xe-ion irradiation

The second type of SEU is an upset of internal ADC/DAC flip-flops and registers caused by nuclear particles. Upsets of data registers can change the DAC output voltage (current) or the ADC output code, while upsets of control registers can change the converter's operating mode. In either case it is usually necessary to restart the converter to restore normal operation.

The purpose of experimental research is to detect ADC and DAC SEU during irradiation at nuclear particle accelerators. Several ions with different linear energy transfer (LET) values are usually used. For each ion, the SEU cross-section is determined by the equation:

$$\sigma_{SEU} = \frac{N_{SEU}}{\Phi \cdot N_B} \tag{1}$$

where $N_{SEU}$ is the number of upsets detected, $\Phi$ is the particle fluence during the irradiation session, and $N_B$ is the number of bits under test. An approximating curve is plotted from these data, and the converter SEU parameters, namely the threshold LET and the saturation cross-section, are determined from it. A Weibull function is used to approximate the experimental data. Fig. 8 shows such a curve and the SEU parameters for the sigma-delta ADC AD7711ASQ (Analog Devices) [14].

Figure 8: ADC AD7711ASQ SEU experimental data, Weibull approximation, and SEU parameters (LET and cross-section)

4 Transient radiation effects

Transient radiation effects (TRE), or dose-rate effects, are caused by pulsed gamma irradiation. These effects in ADCs and DACs are similar to SEE in that both failures and upsets are possible. The difference is that in the SEE case a particle locally affects only a single circuit element at any moment, whereas in TRE all functional elements and parasitic structures are affected by the radiation simultaneously.

Upsets are characterized by the threshold gamma dose rate and by the recovery time. Moreover, as a rule, the amplitude and duration of the output signal pulse response (voltage or current for a DAC, code for an ADC) depend clearly on the dose rate. As an example, Fig. 9 shows a set of DAC PA1 output voltage waveforms at different dose rates. Both the amplitude and the duration of the pulses increase with dose rate. The performance criteria are typically set by the maximum allowable amplitude and duration of the ionization pulses and are determined by the application conditions of the particular equipment [15].

Figure 9: DAC PA1 output voltage pulses at different dose rates (traces at 9E6, 1.6E7, 3.3E7, 6E7 and 1E8 rad(Si)/s; gamma pulse width 15 ns)

Generally, the tested DAC or ADC is set to a static operating mode with a certain output level/code, and the output response at the moment of the gamma-ionization pulse is recorded. However, upsets may also occur in dynamic operating modes. For example, the waveforms in Fig. 10 illustrate the gamma pulse upset of DAC PA3 operating in the dynamic mode of sine signal generation [16].

Figure 10: Dose rate upset of DAC PA3 in dynamic mode

5 Radiation test technique

As mentioned above, the accuracy parameters of ADCs and DACs are determined by the transfer function (TF). To measure it during a TID test, a linearly increasing full-range voltage (code) is applied to the inputs of the ADC (DAC) under test, and the output ADC code (DAC voltage/current) is measured. This procedure should be repeated for all TID values of interest, so the measurements must be carried out as fast as possible to satisfy the condition [2]:

$$T_{MEAS} < 0.1 \cdot T_{RAD} \tag{2}$$

where $T_{MEAS}$ is the duration of a full TF measurement and $T_{RAD}$ is the time between measurements. The factor 0.1 is normally used in TID test practice to keep the measurement short compared to the irradiation time. This minimizes the influence of annealing during the measurement and avoids distortion of the test results.

The timing requirements lead to one more feature of the test procedure. According to our experience and data, it is very important to measure the TF directly during irradiation. Measuring after irradiation would distort the picture of the real radiation behavior and the hardness level because of annealing, which can even result in full recovery of operation. Fig. 11 shows two graphs of CMOS ADC nonlinearity: the first measured immediately after 100 krad(Si) irradiation and the second 12 hours later [3]. The 12-hour anneal restores the ADC's operation.

Figure 11: ADC nonlinearity measured immediately after 100 krad(Si) irradiation and 12 hours later (datasheet margins are shown by dashed lines at ±4 LSB)

Another problem is caused by the standard TF measuring procedure [3], which requires the error of the measuring device (the accuracy of the input voltage) to be within 1/16 of the DAC (ADC) LSB over a measurement range corresponding to the full range of the DAC (ADC) output (input) voltage. To satisfy these conditions it is necessary to use special methods of high-accuracy voltage biasing, as well as multiple measurements with averaging of the measured values. These procedures should be hardware-implemented to meet the extremely tight restrictions on measurement speed. The hardware structure based on a differential amplifier is shown in Fig. 12 [17]. One input of the amplifier is connected to the voltage output of the DAC under test, and the other input to the bias voltage. Direct measurement of the output voltage is replaced by measurement of the difference between the output voltages of adjacent codes.

Figure 12: Voltage biasing structure for precision DAC TF measuring (blocks: offset DAC, Uref, DAC under test, ADC)

A specialized ADC and DAC testing system based on National Instruments hardware, LabView software, and a set of device-under-test boards adapted to different converters has been developed [18]. The results of radiation tests of several dozen converters carried out with this equipment have confirmed its effectiveness [19].

6 Conclusions

The article presents the typical radiation effects in DACs and ADCs exposed to different types of ionizing radiation. Because converters are characterized by both digital and analog parameters, their radiation behavior is specific: the effects are caused both by failures of digital registers and control circuits and by failures and parameter degradation of analog units. As a result, the radiation test procedure has some specific features compared with the test procedure for "pure" digital or analog integrated circuits. It is necessary to use a special control and monitoring technique that combines software control and data processing with precise measurements. Implementing this technique requires specialized test equipment that is compatible with the specialized radiation facilities with short signal cables.

7 References

1. A. Nikiforov, A. Chumakov, V. Telets, et al., "IC Space Radiation Effects Experimental Simulation", // Proc. of Workshop "Space Radiation Environment Modelling New Phenomena and Approaches", Oct.
7-9, 1997, Moscow, Russia, pp. 4-11.
2. V. Belyakov, A. Chumakov, A. Nikiforov, V. Pershenkov, "IC's radiation effects modeling and estimation", // Microelectronics Reliability, 1999, v. 40, No. 12, pp. 1997-2018.
3. O. Kalashnikov, A. Artamonov, A. Demidov, et al., "ADC/DAC Radiation Test Technique", // Workshop Record 4th European Conf. "Radiations and Their Effects on Devices and Systems" (RADECS 97), 1997, Palm Beach-Cannes, France, pp. 56-60.
4. Compendium of International Irradiation Test Facilities, // RADECS 2011, p. 66.
5. O. Kalashnikov, "Statistical Variations of Integrated Circuits Radiation Hardness", // RADECS Proceedings, 2011, pp. 661-665.
6. A. Chumakov, A. Nikiforov, V. Telets, et al., "IC space radiation effects experimental simulation and estimation methods", // Radiation Measurements, v. 30 (5), pp. 547-552.
7. T. Turflinger, et al., "Understanding Single Event Phenomena in Complex Analog and Digital Integrated Circuits", // IEEE Trans. Nucl. Sci., 1990, v. 37, pp. 1832-1838.
8. J. Tausch, "Radiation Testing of Mixed-Signal Microelectronics", // IEEE NSREC 2000 short course proceedings.
9. T. Turflinger, "Transient Radiation Test Techniques for High-Speed Analog to Digital Converters", // IEEE Trans. Nucl. Sci., 1989, v. 36, pp. 2356-2361.
10. O. Kalashnikov, A. Demidov, A. Nikiforov, et al., "Integrating analog-to-digital converter radiation hardness test technique and results", // IEEE Trans. Nucl. Sci., 1998, v. 45, pp. 2611-2615.
11. Understanding Data Converters - Application Report, // Texas Instruments, 1995, http://www.ti.com/lit/an/slaa013/slaa013.pdf.
12. O. Kalashnikov, A. Nikiforov, "TID behavior of complex multifunctional VLSI devices", // Proceedings of the International Conference on Microelectronics, ICM, 2014, pp. 455-458.
13. A. Chumakov, A. Pechenkin, A. Egorov, et al., "Estimating IC susceptibility to single-event latch-up", // Russian Microelectronics, 2008, v. 37 (1), pp. 41-46.
14. A. Chumakov, A. Vasil'ev, A. Kozlov, et al., "Single-event-effect prediction for ICs in a space environment", // Russian Microelectronics, 2010, v. 39 (2), pp. 74-78.
15. T. Agakhanyan, A. Nikiforov, "Predicting the Effect of Pulsed Ionizing Radiation on Operational Amplifiers", // Russian Microelectronics, 2002, v. 31 (6), pp. 375-383.
16. A. Nikiforov, A. Sogoyan, "Modeling of high-dose-rate pulsed radiation effects in the parasitic MOS structures of CMOS LSI circuits", // Russian Microelectronics, 2004, v. 33 (2), pp. 80-91.
17. T.B. Williams, "The calibration of a DAC using differential linearity measurements", // IEEE Trans. on Instr. and Meas., 1982, v. 31, No. 4.
18. D. Bobrovsky, G. Davydov, A. Petrov, et al., "Realization of electronic component base radiation test methods based on hardware-software complex of National Instruments hardware", // Electronics, 2012, v. 5 (97), pp. 91-106.
19. A. Sogoyan, A. Artamonov, A. Nikiforov, D. Boychenko, "Method for integrated circuits total ionizing dose hardness testing based on combined gamma- and X-ray irradiation facilities", // Facta Universitatis: series Electronics and Energetics, 2014, Vol. 27, No. 3, pp. 329-338.

Arrived: 25. 11. 2014
Accepted: 17. 03. 2015

Original scientific paper
Informacije MIDEM, Journal of Microelectronics, Electronic Components and Materials
Vol. 45, No. 2 (2015), 160 - 170

Computing Worst-Case Performance and Yield of Analog Integrated Circuits by Means of Mesh Adaptive Direct Search

Arpad Burmen¹, Husni Habal²
¹University of Ljubljana, Faculty of Electrical Engineering
²Technical University of Munich

Abstract: Estimating the parametric yield of a circuit by means of a Monte Carlo analysis can be slow, particularly when the yield estimate is close to 100%, as a large number of samples are necessary to reach the desired level of confidence.
Deterministic numerical algorithms have been successfully used in commercial tools for yield estimation. Many of them are gradient-based. The gradients are estimated numerically using finite differences, because most simulators do not compute sensitivities. In this paper, an approach is proposed based on a derivative-free optimization algorithm from the family of mesh adaptive direct search methods. The basic algorithm is extended with capabilities that speed up the convergence and enable the algorithm to cope with infeasible starting points. The new approach is compared to a commercial tool that uses gradient-based algorithms for worst-case analysis. The results show that the proposed approach is capable of producing accurate results within similar computational budgets.

Keywords: analog circuit design; design centering; worst-case analysis; yield analysis; optimization; mesh adaptive direct search

Determining Worst-Case Performance and Yield of Analog Integrated Circuits with a Mesh Adaptive Direct Search Optimization Method

Abstract: Determining the yield of a circuit by means of Monte Carlo analysis is often time-consuming, especially when the yield approaches 100%, because a large number of samples is needed for reliable results. Deterministic procedures for yield estimation are available in commercial tools. Many of these procedures rely on gradient information, which is determined numerically, since most simulators do not compute the sensitivities of the results. The paper describes an approach that uses a derivative-free optimization method from the family of mesh adaptive direct search methods. The basic method is upgraded with extensions that speed up convergence towards the solution and allow the method to use a starting point that violates the constraints. The proposed approach was compared with a commercial tool that uses gradient-based optimization methods. The results show that the proposed approach is capable of finding correct solutions in comparable time.
Keywords: analog circuit design; design centering; worst-case performance analysis; yield analysis; optimization; mesh adaptive direct search methods

* Corresponding Author's e-mail: arpadb@fides.fe.uni-lj.si

© MIDEM Society. Á. Burmen et al; Informacije Midem, Vol. 45, No. 2 (2015), 160 - 170

1 Introduction

Modern integrated circuits must exhibit adequate performance across a given range of operating conditions, such as supply voltage and temperature, and in the presence of random variations resulting from the manufacturing process [1]. To quantify this, parametric yield is defined as the fraction of manufactured circuits that meet all performance specifications, such as minimum gain and maximum power, across all operating conditions and over the statistical distribution of the random variations. A prerequisite for designing such a circuit is an efficient means of evaluating the circuit's worst performance.

Manufactured circuits that fail an imposed performance specification must be discarded, which reduces the parametric yield below 100%. The simplest means of estimating the parametric yield is Monte Carlo analysis (MCA). Unfortunately, a very large number of performance evaluations is needed for accurate estimation by MCA when the yield is close to 100% [2]. This is prohibitively inefficient, since each performance evaluation requires a costly circuit simulation. A more efficient means of evaluating electrical performance at the worst-case combination of operating conditions and random variations is therefore necessary for robust circuit design; some alternatives have been presented in the literature (cf. [2, 3]).

In [2], the worst-case distance (WCD) metric was used to obtain yield estimates with less computation. The WCD method requires the numerical solution of an optimization problem.
This problem can be solved in significantly less time than an MCA needs to obtain yield estimates of similar or better accuracy. The alternative to yield estimation by WCD is the worst-case performance (WCP) method [2]. In WCP, the worst value of a performance is calculated that corresponds to a predefined parametric yield ($Y$). If this worst value satisfies the performance specification, the parametric yield is not smaller than $Y$.

In general, both WCD and WCP require the solution of a nonlinear optimization problem by numerical methods. Deterministic optimization algorithms have been successfully applied to the WCD and WCP problems typical of analog integrated circuits, in both academic and commercial tools. These algorithms have been derivative-based, so the sensitivities of the electrical performances to the operating and statistical parameters were needed (e.g. [2] and the references therein, [3]). In this paper a new deterministic and derivative-free method is proposed to solve the WCD and WCP problems. The method is based on mesh adaptive direct search (MADS) [4].

The remainder of the paper is organized as follows. Section 2 introduces the mathematical formulation of WCD and WCP. Section 3 gives an overview of MADS and the modifications introduced by the proposed approach. The implementation details are the subject of Section 4. Section 5 presents the results and compares them to the results obtained with a derivative-based algorithm implemented in a commercial tool (WiCkeD [5]). The concluding remarks are given in Section 6.

Notation. Inequalities apply to vectors component-wise. $\mathbf{0}$ denotes a vector of all zeros. The $i$-th component of vector $v$ is denoted by $v_i$. An element of a matrix $A$ is denoted by $a_{ij}$. The ramp function $\mathrm{ramp}(x)$ is zero for $x<0$ and equal to $x$ otherwise. The group of orthogonal transformations of $\mathbb{R}^n$ is denoted by $O_n$. The $i$-th orthonormal basis vector is denoted by $e_i$.
2 Mathematical formulation for worst-case analysis

Let $x_O$ denote the vector of $n_O$ parameters describing the circuit's operating condition, also referred to as the operating parameters. The prescribed range of operating conditions within which the circuit must operate is specified by lower and upper bounds on the operating parameters, given by vectors $x_O^L$ and $x_O^H$, respectively. The performance of the circuit is also affected by variations of the manufacturing process, which in turn are modeled as mutually dependent random variables. Without loss of generality, the set of dependent process parameters can be mathematically transformed into a set of independent, normally distributed random variables. Let $x_S$ denote a vector representing a realization of these random variables. The components of $x_S$ are also referred to as the statistical parameters. By assumption the joint probability density of the $n_S$ statistical parameters can be expressed as

$$p(x_S) = \frac{1}{(2\pi)^{n_S/2}} \exp\left(-\frac{\|x_S\|^2}{2}\right) \tag{1}$$

Circuit behavior is evaluated by a number of performances, for example power, amplification gain, and bandwidth. The performances are ordered in a vector $f$ of length $m$. Their values for any specific circuit depend on the values of the operating parameters $x_O$ and of the statistical parameters $x_S$. Component $f_i$ of $f$ is the value of a map computed by a simulator. With some abuse of notation one can write $f_i(x_O, x_S)$.

For a circuit to behave correctly at $(x_O, x_S)$ it must meet a set of performance specifications of the form $f_i(x_O, x_S) \ge G_i$, where $G_i$ denotes the target value corresponding to $f_i$. Performance specifications of the form $f_i(x_O, x_S) \le G_i$ can be brought into this form by changing the signs of $f_i$ and $G_i$. The parametric yield with respect to the $i$-th specification is the integral of (1) over the region where the specification is met,

$$Y_i = \int_{f_i^W(x_S) \ge G_i} p(x_S)\, dx_S \tag{2}$$

where $f_i^W(x_S) = f_i(x_O^{W,i}(x_S), x_S)$ is the worst value of $f_i$ at $x_S$ across the prescribed range of operating parameters and

$$x_O^{W,i}(x_S) = \arg\min_{x_O^L \le x_O \le x_O^H} f_i(x_O, x_S) \tag{3}$$

This integral cannot be computed analytically and is usually estimated with a Monte Carlo analysis.

Figure 1: The worst-case point $x_S^{W,i}$ and the linearization (dashed line) of $f_i^W(x_S) = G_i$ (thick line) in the space of the statistical parameters. The acceptance region of $f_i$ is shaded. Replacing the nonlinear specification with its linearization at $x_S^{W,i}$ makes it possible to compute a yield estimate using (6). The approximation introduces an error equal to the integral of (1) over the region shaded in dark grey.

A good yield approximation can be obtained by replacing the performance with its linear model computed at the worst-case point $(x_O^{W,i}(x_S^{W,i}), x_S^{W,i})$ [2]. Figure 1 illustrates the worst-case point in the space of statistical parameters when $n_S = 2$; the sphere $\|x_S\| = \beta_i$ is tangential to the boundary of the acceptance region at $x_S^{W,i}$. The statistical parameters corresponding to this point are given by

$$x_S^{W,i} = \begin{cases} \arg\min_{f_i^W(x_S) \le G_i} \|x_S\|, & \text{for } f_i^W(\mathbf{0}) > G_i \\ \arg\min_{f_i^W(x_S) \ge G_i} \|x_S\|, & \text{otherwise} \end{cases} \tag{4}$$

The worst-case distance of $f_i$ is defined as

$$\beta_i = \begin{cases} \|x_S^{W,i}\|, & \text{for } f_i^W(\mathbf{0}) > G_i \\ -\|x_S^{W,i}\|, & \text{otherwise} \end{cases} \tag{5}$$

If $f_i^W(x_S)$ satisfies the design requirements at $x_S = \mathbf{0}$ the worst-case distance is positive, otherwise it is negative. By linearizing $f_i^W(x_S) \ge G_i$ at the worst-case point, a yield approximation can be computed analytically by integrating (1) over the light grey region in Figure 1. The obtained yield approximation is

$$Y_i = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{\beta_i}{\sqrt{2}}\right)\right) \tag{6}$$

The difference between the actual and the estimated yield corresponds to the integral of (1) over the dark grey region in Figure 1. The computationally intensive component of yield estimation is finding the solution to problem (4). For small yields the computational effort is of the same order of magnitude as that required by a Monte Carlo analysis. For large yields the number of required Monte Carlo samples grows rapidly as the yield approaches 100%, while the computational effort for solving (4) remains the same.
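To make the last two statements concrete, the sketch below evaluates the yield estimate (6) for a given worst-case distance and compares it with a rough Monte Carlo sample count. The sample-count formula is not from the paper: it is the usual binomial estimate of how many samples are needed so that the estimated failure rate $1 - Y$ has a chosen relative standard error, and both function names are hypothetical.

```python
import math


def yield_from_wcd(beta):
    """Yield estimate (6) for a worst-case distance beta."""
    return 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))


def mc_samples_for_failure_rate(y, rel_err):
    """Monte Carlo samples needed so that the estimated failure rate 1 - y
    has relative standard error rel_err (binomial model, an assumption)."""
    p_fail = 1.0 - y
    return math.ceil(y / (p_fail * rel_err ** 2))


for beta in (1.0, 2.0, 3.0, 4.0):
    y = yield_from_wcd(beta)
    n = mc_samples_for_failure_rate(y, 0.1)
    print(f"beta = {beta:.0f}: yield = {y:.6f}, MC samples for 10% error = {n}")
```

The sample count grows in inverse proportion to the failure rate, whereas the cost of solving (4) does not depend on how close the yield is to 100%.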
Typically a designer tunes the design parameters until $\beta_i$ (and the yield) is maximized. Problem (4) has a general nonlinear constraint that can only be evaluated by circuit simulation. An alternative approach to yield maximization is often used. Instead of computing the WCD, the WCP corresponding to a given $\beta$ can be computed:

$$(x_O^{W,i}, x_S^{W,i}) = \arg\min_{\substack{x_O^L \le x_O \le x_O^H \\ \|x_S\|^2 \le \beta^2}} f_i(x_O, x_S) \tag{7}$$

The constraint in the WCP formulation is a convex quadratic function that can be evaluated without circuit simulation. If the $i$-th performance satisfies $f_i(x_O^{W,i}, x_S^{W,i}) \ge G_i$, then the WCD ($\beta_i$) and the corresponding yield estimate ($Y_i$) satisfy $\beta_i \ge \beta$ and $Y_i \ge \frac{1}{2}(1 + \mathrm{erf}(\beta/\sqrt{2}))$.

Problems (4) and (7) are typical problems for which the initial deterministic method of choice is a gradient-based optimization algorithm, for example sequential quadratic programming (SQP) or an interior point method [6]. An alternative is to use gradient-free optimization methods. Mesh adaptive direct search (MADS) is one of these methods. MADS is capable of handling problems with nonsmooth objective functions and constraints. Unfortunately, like most derivative-free methods, MADS converges slowly to a solution. To accelerate its convergence one can use quadratic models of the objective and of the constraints to compute promising points that speed up the algorithm's progress. The quadratic model can be built gradually by applying an update formula to the current approximation of the Hessian matrix.

3 Mesh Adaptive Direct Search

MADS is a family of algorithms in which the steps the algorithm takes to explore the search space lie on a grid. In the presented algorithm the grid is defined as

$$\mathcal{G}_k = \{\Delta_k^m a : a \in \mathbb{Z}^n\} \tag{8}$$

where $\Delta_k^m$ denotes the mesh size parameter. The algorithm solves problems of the form

$$\min_{x \in \Omega \subseteq \mathbb{R}^n} f(x) \tag{9}$$

where $\Omega = \{x : x^L \le x \le x^H \wedge c_i(x) \le 0,\ i = 1, \ldots, n_c\}$ denotes the set of feasible points.
The lower and the upper bounds on the components of $x$ are given by vectors $x^L$ and $x^H$, respectively. Nonlinear inequality constraints are defined by functions $c_i(x)$. For convenience the $n_c$ functions $c_i(x)$ are joined in a vector-valued function $c(x)$. The incumbent solution in the $k$-th iteration and the corresponding value of $f$ are denoted by $x_k$ and $f_k$, respectively. Any point that is considered sufficiently good to replace the incumbent solution is referred to as an improving point. The initial point $x_0 \in \Omega$ corresponds to the first iteration ($k = 0$).

MADS can handle constraints with the extreme barrier approach by replacing $f(x)$ with $\infty$ whenever $x \notin \Omega$. Unfortunately this also requires the initial point to be feasible. Infeasible initial points can be handled by using a filter [7][8]. The filter approach decides whether a point can replace the incumbent solution by applying a bi-objective criterion based on the values of the objective and the constraints at points evaluated in the past.

Algorithm 1: $k$-th iteration of the proposed algorithm based on the MADS framework.
1. Complete the quadratic model by computing the gradient of $f$ and the Jacobian of $c$.
2. Make the model convex by replacing the Hessian $H$ with $H + \varepsilon I$, $\varepsilon > 0$.
3. Compute $s$ by solving the convex quadratic model and rounding the result to $\mathcal{G}_k$.
4. Evaluate $f$ and $c$ at $x = x_k + s$. If $x$ is an improving point, set $x_{k+1} := x$ and go to step 8.
5. Generate the set of poll directions $D_k \subset \mathcal{G}_k$.
6. Evaluate $f$ and $c$ at $x = \theta(x_k + d)$ for $d \in D_k$. If $x$ is an improving point set $x_{k+1} := x$. If the step resulting in an improving point was cut, go to step 8, else go to step 7. When $D_k$ is exhausted go to step 8.
7. Evaluate $f$ and $c$ at the speculative point $x = \theta(x_k + 2(x_{k+1} - x_k))$. If $x$ is an improving point, set $x_{k+1} := x$.
8. If $x_{k+1} = x_k$ set $l_{k+1} := l_k + 1$; else if step 7 failed to produce an improving point, set $l_{k+1} := l_k$; else if $x_{k+1} \ne x_k$, the step resulting in $x_{k+1}$ was not cut, and $l_k > 0$, set $l_{k+1} := l_k - 1$; else set $l_{k+1} := l_k$.
Steps 1-4, 5-6, and 7 of Algorithm 1 are also referred to in the MADS literature as the search, the poll, and the speculative step, respectively. Set $D_k$ is referred to as the set of scaled poll directions. The length of a scaled step is determined by the step size parameter $\Delta_k^p$. Function $\theta$ maps points that violate the bounds ($x^L$ and $x^H$) to points that satisfy them. A step is cut if $\theta(x) \neq x$.

Although the proposed approach uses quadratic models, the convergence properties of the MADS framework enable it to find a solution of the optimization problem even when the search step is omitted. Refining subsequences are sequences of iteration indices $k \in K$ for which $\Delta_k^p \to 0$. The MADS convergence theory applies to refining subsequences. More details can be found in [4] (extreme barrier approach) and [7] (filter-based approach).

Algorithm 1 differs from the basic MADS framework published in the literature ([4][7][9]) in several ways. The normalized poll directions are uniformly distributed on the unit sphere. The algorithm constructs a quadratic model of the objective function using a minimum Frobenius norm-based approach and a linear model of the constraints by means of regression. A quadratic programming solver then uses the model to compute a search step that accelerates the convergence. The point acceptance criterion in the search and the poll step is based on a filter instead of strict descent.

The algorithm that generates the ordered poll steps and the definition of function $\theta$ are the subject of section 3.1. The construction of the quadratic model and a more detailed description of the search step are given in section 3.2. The conditions under which a point is considered to be an improving point are given in section 3.3. The relation between the mesh index ($l_k$), the mesh size, and the step size is the subject of section 3.4.

3.1 The poll step and the set of scaled poll directions

The poll step (steps 5-6 of Algorithm 1) is the one that guarantees the convergence properties of MADS [4][7]. The scaled poll directions $d \in D_k$ are ordered according to the angle they form with the last search ($s$) or poll ($d$) direction that resulted in an improving point [9]. The function and the constraints are evaluated at points $x_k + d$ corresponding to the ordered scaled poll directions. If $x_k + d$ violates any of the bounds imposed by $x^L$ and $x^H$ it is replaced by $\theta(x_k + d)$. Function $\theta$ modifies the components of $x_k + d$ that violate bounds by replacing them with the value of the corresponding violated bound. This has the effect of sliding the point along the violated boundary. The evaluation of points in the poll step is interrupted as soon as an improving point is found (greedy evaluation).

The set of scaled poll directions $D_k$ is generated by applying an orthogonal transformation $\Theta_{t_k} \in O_n$ to $n + 1$ vectors forming a regular n-simplex $V$ (cf. [10] on how to construct $V$) and scaling the resulting vectors with $\Delta_k^p$. This results in set $U_k = \{\Delta_k^p \Theta_{t_k} v : v \in V\}$ whose members are rounded to the nearest mesh points to obtain $D_k$. Index $t_k$ plays a role in ensuring the convergence properties of the algorithm and will be discussed later. The sequence of orthogonal transformations $\{\Theta_i\}_{i=0}^{\infty}$ is constructed by Algorithm 2 from a sequence of realizations of a random matrix with independent normally distributed random elements $\{N_i\}_{i=0}^{\infty}$.

Algorithm 2: Constructing a sequence of orthogonal transformations [11].
1. Apply QR decomposition to $N_i$, resulting in $Q_i$ and $R_i$.
2. Construct diagonal matrix $D_i$ with $d_{jj} = 1$ if $r_{jj} > 0$ and $d_{jj} = -1$ otherwise.
3. $\Theta_i = Q_i D_i$.

Sequence $\{\Theta_i\}_{i=0}^{\infty}$ is uniformly distributed (i.e. distributed according to the Haar measure on $O_n$ [11]). Due to this the normalized vectors from the sequence of sets $\{U_k\}_{k=0}^{\infty}$ are uniformly distributed (and consequently dense) on the unit sphere.
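The bound-enforcing function and the angle-based ordering of poll directions described in this section are simple enough to sketch directly. The code below is only an illustration under stated assumptions: the function names are hypothetical, points and directions are plain Python lists, and all directions are assumed to be nonzero.

```python
import math

def clip_to_bounds(x, lower, upper):
    """Bound-enforcing map: every component that violates a bound
    is replaced by the value of the violated bound."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def is_cut(x, lower, upper):
    """A step is cut if enforcing the bounds changes the point."""
    return clip_to_bounds(x, lower, upper) != list(x)

def angle(u, v):
    """Angle between two nonzero vectors in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def order_poll_directions(directions, last_success):
    """Sort poll directions by the angle they form with the last
    direction that produced an improving point (smallest angle first)."""
    return sorted(directions, key=lambda d: angle(d, last_success))
```

With greedy evaluation the ordering matters: directions most similar to the last successful one are tried first, so an improving point tends to be found after fewer evaluations.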
Furthermore, if the mesh size parameter satisfies $\Delta_k^m \to 0$ the union of sets $D_k$ is also dense on the unit sphere (which is required by the MADS convergence theorem [4]) and the distribution of normalized poll directions converges to the uniform distribution on the unit sphere.

3.2 The quadratic model and the search step

MADS can be significantly improved if steps 1-4 of Algorithm 1 examine points obtained by solving a model of the original optimization problem. In the presented method a quadratic model of the objective and a linear model of the constraints are constructed. The model can be formulated as

$$m_f(x) = \frac{1}{2}(x - x_k)^T B (x - x_k) + g^T (x - x_k) + f(x_k) \quad (10)$$

$$m_c(x) = J(x - x_k) + c(x_k) \quad (11)$$

where $B$, $g$, and $J$ denote the approximate Hessian and the approximate gradient of the objective $f(x)$, and the approximate Jacobian of the constraints $c(x)$, respectively. The model optimization problem can now be written as

$$\arg\min_{m_c(x) \le 0,\; x^L \le x \le x^H} m_f(x) \quad (12)$$

The approximate Hessian matrix is obtained by repeatedly applying an update formula. The initial Hessian approximation is set to an all-zero matrix. Every time the algorithm evaluates three collinear points $x$, $x + \alpha_+ p$, and $x + \alpha_- p$ (i.e. after every speculative step that does not violate the bounds) the directional second derivative can be approximated as

$$f_D = \frac{d^2 f(x + tp)}{dt^2} \approx \frac{2}{\alpha_- - \alpha_+}\left(\frac{f(x + \alpha_- p) - f(x)}{\alpha_-} - \frac{f(x + \alpha_+ p) - f(x)}{\alpha_+}\right) \quad (13)$$

Let $B$ and $B_+$ denote the approximate Hessian and its updated value. When the second directional derivative is available the Hessian update formula from [12] can be used.

$$B_+ = B + (f_D - p^T B p)\, p p^T \quad (14)$$

More commonly the points are not collinear. In that case an update technique based on least Frobenius norm updating (LFNU) is used [13]. The proposed algorithm uses a special case of LFNU for $n + 2$ points. It is applied every time a new point is evaluated to update the Hessian of the objective $f$. The linear part of the model is computed by means of linear regression [14].
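The collinear-point estimate (13) and the rank-one update (14) can be checked numerically on a small quadratic. The sketch below is an illustration, not the authors' implementation: the function names are hypothetical, matrices are nested lists, and the update is divided by $(p^T p)^2$ so that it also works when $p$ is not a unit vector (for a unit vector it reduces to (14)).

```python
def quad_form(B, p):
    """Return p^T B p for a square matrix B given as nested lists."""
    n = len(p)
    return sum(p[i] * B[i][j] * p[j] for i in range(n) for j in range(n))

def directional_second_derivative(f0, f_plus, f_minus, a_plus, a_minus):
    """Estimate d^2 f(x + t p) / dt^2 from three collinear points
    x, x + a_plus * p and x + a_minus * p, following equation (13)."""
    return 2.0 / (a_minus - a_plus) * (
        (f_minus - f0) / a_minus - (f_plus - f0) / a_plus
    )

def hessian_update(B, p, f_d):
    """Rank-one update after which p^T B_new p equals f_d.

    Assumption: the correction is scaled by 1 / (p^T p)^2 so that p
    does not have to be normalized beforehand."""
    n = len(p)
    pp = sum(pi * pi for pi in p)
    scale = (f_d - quad_form(B, p)) / (pp * pp)
    return [[B[i][j] + scale * p[i] * p[j] for j in range(n)] for i in range(n)]
```

For an exactly quadratic function the three-point estimate is exact, so a single update reproduces the true curvature along $p$; curvature in other directions still has to come from further updates.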
Up to $2n+1$ most recently evaluated points ($x$) satisfying $\|x - x_k\| < \rho\Delta_k^p$ are selected for regression. The regression computes vector $g$ for which $m_f(x)$ is the closest fit to $f(x)$ at the selected points. Similarly the approximate Jacobian $J$ of the constraints is obtained by fitting $m_c(x)$ to $c(x)$.

Whenever a quadratic model of the problem is successfully computed (i.e. the Hessian update and the linear regression are successful) it is used for ordering the scaled poll directions instead of the smallest angle criterion. The primary criterion for model-based direction ordering is the cumulative constraint violation computed as the sum of squares of positive components in vector $m_c(x)$. The secondary criterion is the value of the quadratic model $m_f(x)$.

The obtained model is used for computing a trial point for the quadratic search step (steps 1-4 of Algorithm 1). For this purpose problem (12) is solved using a quadratic program solver [15]. The solver can handle only positive definite Hessian matrices. Therefore $B$ is replaced with $B + \epsilon I$ and an additional constraint of the form $\|x - x_k\|_\infty \le \rho_{-}\Delta_k^p$ is imposed whenever $B$ is not positive definite. The value of $\epsilon > 0$ is chosen by repeatedly applying Cholesky decomposition to $B + \epsilon I$ for increasing values of $\epsilon$ until the decomposition succeeds [6].

3.3 Point acceptance criterion

When the initial point $x_0$ is feasible a point $x$ can be considered as improving if it is feasible and $f(x) < f(x_k)$. Often the initial point $x_0$ is not feasible (i.e. $x_0 \notin \Omega$). Such a situation occurs when one tries to solve (4) to obtain the worst-case distance and chooses 0 as the initial value of the statistical parameters. In this case the nonlinear constraints cannot be handled with the extreme barrier approach. A possible alternative is to use a point acceptance criterion based on a filter [8].
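The choice of $\epsilon$ described at the end of section 3.2 (increase $\epsilon$ until the Cholesky decomposition of $B + \epsilon I$ succeeds) can be sketched in a few lines. This is a pure-Python illustration; the starting value `eps0`, the growth factor, and the function names are assumptions, since the text does not specify how $\epsilon$ is increased.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T,
    or None if A is not (numerically) positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None      # non-positive pivot: decomposition fails
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def convexify(B, eps0=1e-3, growth=10.0, max_tries=60):
    """Return (B + eps * I, eps) with the first tried eps for which the
    Cholesky decomposition succeeds; eps is 0 if B is already positive definite."""
    if cholesky(B) is not None:
        return B, 0.0
    n = len(B)
    eps = eps0
    for _ in range(max_tries):
        shifted = [[B[i][j] + (eps if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        if cholesky(shifted) is not None:
            return shifted, eps
        eps *= growth                # assumed schedule: geometric increase
    raise ValueError("no suitable shift found")
```

A geometric schedule keeps the number of decompositions small, at the cost of possibly overshooting the smallest shift that would work.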
The acceptance criterion based on a filter takes into account an improvement of the objective value, as well as an improvement of the feasibility. For that purpose a function is defined that expresses the constraint violation.

$$h(x) = \sum_{i=1}^{n_c} \operatorname{ramp}(c_i(x)) \quad (15)$$

For a feasible point the corresponding value of $h(x)$ is zero. A filter entry is a tuple of the form $(f(x), h(x))$. A filter is a set of mutually non-dominated filter entries. A tuple $(f_1, h_1)$ dominates $(f_2, h_2)$ if $f_1 \le f_2$, $h_1 \le h_2$ and the two tuples are not equal. Initially the filter contains only $(f(x_0), h(x_0))$. A point $x$ is said to be
- dominating if the filter is empty or $(f(x), h(x))$ dominates at least one filter entry,
- dominated if at least one filter entry dominates $(f(x), h(x))$ or $h(x) > h_{\max}$,
- acceptable otherwise.

Figure 2 and Figure 3 illustrate a 2-dimensional problem and a filter with 5 entries.

Figure 2: The regions with acceptable (light gray) and dominating (dark gray) points for an optimization problem given by $f(x_1, x_2) = x_1^2 + x_2^2$ (dashed contours) and $c(x_1, x_2) = -x_1 - x_2 + 2$ (dotted contours). The points corresponding to the filter entries are marked by dark dots. The white dot marks the solution of the problem at (1, 1). $h_{\max} = 2$.

Figure 3: The filter entries (dark dots) and the solution (white dot) of the problem in Figure 2 in the f-h space. Dark gray and light gray regions correspond to dominating and acceptable points, respectively.

Adding a point to the filter means that the corresponding filter entry $(f(x), h(x))$ is added to the filter. Dominating points and acceptable points are always added to the filter immediately after they are evaluated. The incumbent solution is always a member of the filter. Adding a dominating point implies that at least one dominated point is removed from the filter so that the filter entries remain mutually non-dominated.
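The classification of points and the filter update described above can be sketched as follows. This is an illustrative sketch with hypothetical function names; filter entries are `(f, h)` tuples, and checking the dominated case first (including the $h > h_{\max}$ test) is an assumption about how ties between the rules are resolved.

```python
def dominates(a, b):
    """Entry a = (f, h) dominates b if it is no worse in both
    components and the two entries are not equal."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def classify(entry, filt, h_max):
    """Classify an entry as 'dominated', 'dominating' or 'acceptable'."""
    if entry[1] > h_max or any(dominates(e, entry) for e in filt):
        return "dominated"
    if not filt or any(dominates(entry, e) for e in filt):
        return "dominating"
    return "acceptable"

def add_to_filter(entry, filt, h_max):
    """Return (classification, new filter). Dominated entries are rejected;
    a dominating entry removes every filter entry it dominates."""
    kind = classify(entry, filt, h_max)
    if kind == "dominated":
        return kind, list(filt)
    kept = [e for e in filt if not dominates(entry, e)]
    return kind, kept + [entry]
```

Since the filter entries are mutually non-dominated, a new entry cannot both dominate one entry and be dominated by another, so the order of the first two tests only matters for the $h > h_{\max}$ case.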
An acceptable point does not dominate any of the filter points and thus no points are removed from the filter when the corresponding entry is added. Dominated points are never added to the filter. If parameter $h_{\max}$ is set to 0, MADS behaves as if the extreme barrier approach had been used for handling the constraints.

For every filter point its position is defined by sorting the filter entries according to $h$. The filter entry corresponding to a feasible point is assigned position 0 (rightmost dark dot in Figure 2, leftmost dark dot in Figure 3), while infeasible filter entries are assigned integer positions starting from 1. A point examined by the search step is considered as an improving point if it is not dominated. A point examined in the poll step and in the speculative step is considered as an improving point if it is not dominated and its position is not higher than the position of the incumbent solution. This effectively requires that the poll and the speculative step prioritize improving feasibility over improving the objective. When the incumbent solution is feasible these two steps behave as if the extreme barrier approach had been used.

3.4 Updating the mesh and the step size

Iterations of Algorithm 1 are assigned a mesh index $l_k$ with initial value $l_0 = 0$. The mesh and the step size parameter depend on $l_k$.

Am : min(l, A-24 )/(A 0 |"l +y\ Ap = A-4 (16) (17)

This strategy (see step 8 of Algorithm 1) refines the mesh and shortens the step when the algorithm is not making progress (i.e. fails to find an improving point). The mesh index is not changed if the speculative step fails to produce an improving point or if the improving point is obtained with a cut step. Otherwise the mesh is coarsened and the step size is increased, but never above its initial value.

Rounding can affect the set of unrounded scaled poll directions $U_k$ to such an extent that $D_k$ no longer positively spans $R^n$. The effect of rounding is more pronounced when the ratio $\Delta_k^p / \Delta_k^m$ is small. Because

Ap i -m = Ao[1 +y\2k > A0[l + Y A k (18)

the aforementioned situation can be avoided if one chooses a sufficiently large $\gamma$. It can be shown that $\gamma = n^{3/2}/2$ is an appropriate choice for all $\Delta_0 > 1$.

The normalized poll directions from a refining subsequence $\{D_k\}_{k \in K}$ must be dense on the unit sphere [4]. This is true if the refining subsequence corresponds to the complete sequence $\{N_i\}_{i=0}^{\infty}$. Therefore index $t_k$ is chosen in the following manner.

$$t_k = \begin{cases} l_k, & \text{for } l_k > \max_{i<k} l_i \\ 1 + \max_{i<k} t_i, & \text{otherwise} \end{cases} \quad (19)$$

Index $t_k$ increases from iteration to iteration with the exception of iterations that correspond to the finest observed mesh over $0..k$. As the mesh index of iterations forming a refining subsequence takes consecutive values from $\{0, 1, 2, ...\}$ the same values are also assigned to $t_k$, thereby causing $\{D_k\}_{k \in K}$ to correspond to the complete sequence $\{N_i\}_{i=0}^{\infty}$.

4 Finding the worst-case point

The outline of the proposed approach comprises the same steps as [5]. The SQP-based optimization algorithm is replaced with the proposed version of MADS. The initial point in the space of statistical parameters is computed from the linearized optimization problem. An extended stopping criterion is proposed based on the approximate gradient of the circuit's performance. In the beginning $x_S = 0$, $x_O$ is set to the nominal value of the operating parameters $x_O^{nom}$, and the set of relevant statistical parameters is empty. The procedure for solving problem (4) and problem (7) is given by Algorithm 3.

Algorithm 3: One pass of the main algorithm for solving problem (4) / problem (7).
1. Solve for the worst operating parameters $x_O$ within their bounds at the current $x_S$ by optimizing $f_i(x_O, x_S)$ over $x_O$. If the result is worse than the performance at the current $x_O$, accept it as the new $x_O$.
2. Compute the approximate sensitivity of $f_i$ to statistical parameters.
3. Update the set of relevant statistical parameters and compute the initial point for step 4.
4. Solve (4) or (7) in the space of relevant statistical parameters to obtain the new value of $x_S$.

In step 1 of Algorithm 3 the set of worst operating parameters is determined. The performance corresponding to $(x_O, x_S)$ is evaluated. Every operating parameter is perturbed to its respective upper and lower value resulting in the need to evaluate $2n_O$ points by circuit simulation. The results are used for constructing the initial vector of operating parameters. Every component of this vector is equal to the nominal value, the upper bound, or the lower bound of the corresponding operating parameter, whichever produced the worst value of $f_i$. The optimization in step 1 of Algorithm 3 is completed with the MADS algorithm as proposed in section 3 starting from this initial vector and using the extreme barrier approach. Steps taken by the optimizer are scaled in such a manner that a step of length 1 in the direction of any operating parameter corresponds to 1/16 of the difference between the upper and the lower bound.

The sensitivity to the statistical parameters (step 2 of Algorithm 3) is computed at $(x_O, x_S)$ using forward differences. The parameters are perturbed by 1/64 of the difference between the lower and the upper bound (-10 and 10, respectively). Let $\Delta x$ and $\Delta f_i$ denote the parameter perturbation and the corresponding difference in the performance, respectively. The components of the gradient with respect to the statistical parameters ($\nabla_S f_i$) can then be approximated as $\Delta f_i / \Delta x$.

The obtained sensitivity information is used for eliminating the statistical parameters that contribute little to the behavior of $f_i$ (step 3 of Algorithm 3).
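The forward-difference sensitivity computation of step 2 can be sketched as below. This is a minimal illustration: the function name is hypothetical, `f` stands for any callable that evaluates the performance for a list of statistical parameters (in practice a circuit simulation), and the bounds of -10 and 10 and the 1/64 fraction are the values quoted in the text.

```python
def forward_difference_gradient(f, x, lower=-10.0, upper=10.0, fraction=1.0 / 64.0):
    """Approximate the gradient of f at x with forward differences.

    Every parameter is perturbed by `fraction` of the bound range,
    which costs one evaluation of f per parameter plus one at x itself."""
    step = fraction * (upper - lower)
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += step                # perturb one parameter at a time
        grad.append((f(xp) - f0) / step)
    return grad
```

With the quoted values the perturbation is 20/64 = 0.3125 for every statistical parameter, so the cost of one sensitivity computation is one circuit evaluation per statistical parameter plus the evaluation at the unperturbed point.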
For this purpose the absolute performance differences $|\Delta f_i|$ are ordered and all parameters that contribute less than 1% to the total change of $f_i$ are removed in increasing order of contribution until the cumulative contribution of the removed parameters reaches 25% or there are no statistical parameters left. The remaining statistical parameters are added to the set of relevant statistical parameters.

Let $g_S$ denote the estimated gradient in the space of statistical parameters. Components of the gradient not corresponding to relevant parameters are set to 0. For problem (4) the initial point is obtained by updating $x_S$ to

S fi Uo,oj- g ||2 gS g S (20)

For problem (7) $x_S$ is replaced with

g S ß g S (21)

MADS is then used for solving the main optimization problem (step 4 of Algorithm 3) in the space of statistical parameters. The value of $h_{\max}$ is chosen as $\max(100, h(x_0))$ so that the initial point is always added to the filter. The scaling of parameters is the same as in step 1 of Algorithm 3.

The main optimization in case of problem (7) is stopped if the constraint satisfaction condition $|c(x)| < \beta\epsilon_c$ and the gradient angle condition $\angle(\nabla_S f, -\nabla_S c) < \epsilon_a$ are satisfied (note that $c(x)$ is a scalar because the problem has only one nonlinear constraint). These two stopping conditions are applied only if the step satisfies $\Delta_k^p < 0.5$. The constraint satisfaction condition for problem (4) is formulated somewhat differently as $|c(x)| < 3\|g_S\|\epsilon_c$. In the presented examples $\epsilon_c = 10^{-2}$ and $\epsilon_a = 15°$ are used. Regardless of these stopping conditions MADS is stopped when $\Delta_k^p$ drops below 0.01.

Algorithm 3 is repeated in multiple passes until the set of relevant statistical parameters remains unchanged in step 3 and the accepted solution in step 1 of Algorithm 3 does not change $f_i(x_O, x_S)$ by more than 1% compared to the difference between $f_i(x_O^{nom}, 0)$ and $f_i(x_O, 0)$ from the first pass. The following values of optimizer parameters were used: D = 4, D0 = 220, p = D2, p_ = 1.
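The screening of statistical parameters described at the beginning of this passage (drop parameters contributing less than 1%, smallest first, until 25% of the total change has been removed) can be sketched as follows. The function name is hypothetical, and two details are assumptions because the text leaves them open: a parameter is not removed if doing so would push the removed share above the 25% cap, and all parameters are kept when the total change is zero.

```python
def screen_parameters(abs_changes, min_share=0.01, max_removed=0.25):
    """Return the indices of the statistical parameters that are kept.

    abs_changes -- absolute performance differences, one per parameter."""
    total = sum(abs_changes)
    if total == 0.0:
        return list(range(len(abs_changes)))   # nothing to rank (assumption)
    order = sorted(range(len(abs_changes)), key=lambda j: abs_changes[j])
    removed = set()
    removed_share = 0.0
    for j in order:                            # smallest contribution first
        share = abs_changes[j] / total
        if share >= min_share:
            break                              # all remaining parameters matter
        if removed_share + share > max_removed:
            break                              # cap on the removed contribution
        removed.add(j)
        removed_share += share
    return [j for j in range(len(abs_changes)) if j not in removed]
```

For example, with absolute changes of 50, 0.2, 0.3, 49 and 0.5 the three small contributors together account for 1% of the total and are removed, leaving only the two dominant parameters in the optimization.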
For problem (7) the gradient of the constraint with respect to the statistical parameters can be expressed explicitly as $2x_S$ and is not computed numerically. Similarly for problem (4) the gradient and the Hessian of the objective can be expressed as $2x_S$ and $I$ (identity matrix), respectively.

5 Application and verification of the approach

The proposed approach was implemented in the PyOPUS framework [16] and its performance was compared to that of a commercial worst-case analysis tool WiCkeD [5]. Both algorithms were tested on two circuits: a Miller operational transconductance amplifier (OTA) in Figure 4 and a folded cascode operational transconductance amplifier (FCOTA) in Figure 5.

Figure 4: Miller transconductance amplifier.

Figure 5: Folded cascode transconductance amplifier.

Both circuits have 3 operating parameters (temperature, supply voltage, and bias current). A mismatch model with two statistical parameters per transistor was used. Global variations of the manufacturing process were modeled with 10 statistical parameters. The circuits in Figure 4 (Figure 5) have 26 (42) statistical parameters.

The results are listed in Table 1. The first and the second column list the names of the performances and their types (i.e. $f_i > G_i$ or $f_i < G_i$). The worst-case performances at $\beta = 3$ obtained by solving problem (7) are listed in columns titled WC. The number of circuit evaluations and the number of algorithm passes are listed in the columns to the right of the WC column. Problem (4) is solved with $G_i$ set to the WC value at $\beta = 3$ obtained by WiCkeD (third column). The worst-case point obtained by solving this problem lies at $\|x_S\| = 3$. The obtained value of $\beta$ is listed in columns titled WCD and the number of circuit evaluations and algorithm passes is listed in the columns to the right of the WCD column.
The results in Table 1 show that the proposed approach finds the solution of problem (7) within 5% accuracy in all but two cases, which are marked with an asterisk in the WC column. The settling time (rise) of the Miller OTA was different due to the noise in the performance. In case of the PSRR VSS performance of the FCOTA circuit MADS converged to a different local minimizer. A more pessimistic worst-case value was found by MADS in one case (shaded cell in the table). The number of circuit evaluations required by MADS was in 7 cases (marked with an asterisk) significantly worse than that required by WiCkeD. On the other hand in two cases MADS was significantly faster than WiCkeD (shaded cells in the table). In the remaining cases both algorithms exhibited similar performance.

Table 1: Summary of the results obtained with WiCkeD and the proposed MADS-based algorithm. A WC/WCD value (the number of evaluations) that is more than 5% (20%) worse than the corresponding result obtained by WiCkeD is denoted by an asterisk.

| Circuit / Performance | Type | WiCkeD WC | WiCkeD WC evals | WiCkeD WCD | WiCkeD WCD evals | MADS WC | MADS WC evals | MADS WC passes | MADS WCD | MADS WCD evals | MADS WCD passes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Miller OTA | | | | | | | | | | | |
| Swing [V] | > | 1.43 | 139 | 2.99 | 145 | 1.43 | 147 | 2 | 3.00 | *176 | 2 |
| Gain [dB] | > | 68.0 | 88 | 3.00 | 94 | 68.0 | 98 | 1 | 3.00 | 106 | 1 |
| UGBW [MHz] | > | 1.61 | 93 | 3.00 | 100 | 1.61 | 98 | 1 | 3.02 | 116 | 1 |
| Phase margin [°] | > | 67.3 | 129 | 3.00 | 123 | 67.3 | *299 | 2 | 3.04 | *438 | 2 |
| CMRR [dB] | > | 65.2 | 98 | 3.00 | 104 | 65.3 | *150 | 2 | 3.00 | *166 | 2 |
| PSRR VDD [dB] | > | 85.0 | 124 | 3.00 | 112 | 85.3 | *396 | 3 | *3.21 | *398 | 3 |
| PSRR VSS [dB] | > | 61.0 | 92 | 3.00 | 98 | 61.0 | 98 | 1 | 3.00 | 106 | 1 |
| Settling (fall) [μs] | < | 0.892 | 134 | 3.00 | 151 | 0.892 | 145 | 2 | 3.01 | 165 | 2 |
| Settling (rise) [μs] | < | 1.04 | 108 | 3.00 | 116 | *1.03 | 102 | 1 | 3.00 | *195 | 2 |
| Slew (fall) [V/μs] | > | 1.10 | 94 | 3.00 | 96 | 1.10 | 99 | 1 | 3.00 | 115 | 1 |
| Slew (rise) [V/μs] | > | 0.953 | 461 | 3.06 | 196 | 0.960 | 101 | 1 | 3.09 | *260 | 2 |
| FCOTA | | | | | | | | | | | |
| Offset (high) [mV] | < | 11.2 | 124 | 3.02 | 194 | 11.2 | *202 | 2 | 3.00 | 211 | 2 |
| Offset (low) [mV] | > | -11.9 | 124 | 3.03 | 194 | 11.9 | *200 | 2 | 3.00 | 231 | 2 |
| Swing [V] | > | 0.478 | 122 | 3.00 | 127 | 0.476 | 130 | 1 | 2.95 | 130 | 1 |
| Gain [dB] | > | 70.7 | 125 | 3.03 | 131 | 70.8 | *291 | 2 | 3.02 | *332 | 2 |
| UGBW [MHz] | > | 6.28 | 130 | 3.00 | 137 | 6.28 | 133 | 1 | 3.00 | 149 | 1 |
| Phase margin [°] | > | 85.6 | 222 | 3.00 | 227 | 85.6 | *368 | 2 | 3.01 | *493 | 3 |
| CMRR [dB] | > | 60.8 | 290 | 3.06 | 265 | 60.8 | *460 | 2 | 3.00 | *456 | 2 |
| PSRR VDD [dB] | > | 55.8 | 236 | 3.00 | 207 | 58.0 | 282 | 2 | 3.12 | *642 | 2 |
| PSRR VSS [dB] | > | 54.6 | 315 | 3.00 | 227 | *58.3 | 249 | 2 | *3.17 | *315 | 2 |
| IRN@100Hz [μV/√Hz] | < | 3.17 | 126 | 3.00 | 133 | 3.17 | 133 | | 3.01 | 151 | |
| IRN@10kHz [μV/√Hz] | < | 0.321 | 126 | 3.00 | 133 | 0.321 | 133 | | 3.02 | 154 | |
| IRN@1MHz [μV/√Hz] | < | 59.4 | 126 | 3.03 | 132 | 59.3 | 133 | | 3.00 | 144 | |
| 1/f corner [kHz] | < | 437 | 122 | 3.00 | 128 | 436 | 143 | | 3.01 | 147 | |
| Settling (fall) [μs] | < | 0.131 | 142 | 3.00 | 148 | 0.131 | 135 | | 3.00 | 146 | |
| Settling (rise) [μs] | < | 0.127 | 148 | 3.00 | 154 | 0.127 | 134 | | 3.01 | 180 | |
| Slew (fall) [V/μs] | > | 4.17 | 207 | 3.00 | 202 | 4.17 | 203 | 2 | 3.00 | *250 | 2 |
| Slew (rise) [V/μs] | > | 4.28 | 196 | 3.00 | 213 | 4.28 | 205 | 2 | 3.00 | 255 | 2 |

Solving problem (4) is somewhat more challenging. The proposed approach found the same solution within 5% accuracy in all but two cases marked with an asterisk in the WCD column of Table 1. Both of them (as well as the PSRR VDD performance of FCOTA) are the result of convergence to a different local minimizer. In such cases a fair comparison with WiCkeD is not possible. When the number of circuit evaluations is considered both approaches exhibit similar performance on more than half of the performances. The cases where the proposed approach is significantly slower than WiCkeD are marked with an asterisk. All optimization problems except for two are solved in one or two algorithm passes.

Both MADS and WiCkeD face the same disadvantage originating from the local nature of the underlying optimization algorithms. Because of it the obtained solution can be a local minimizer and not the actual solution of problem (4) or (7), since the outcome greatly depends on the choice of the initial point. MADS performs best on noisy nonlinear problems for which points exist where the function or the constraints cannot be evaluated (i.e.
the simulator fails to converge or the circuit's performance cannot be evaluated). For such problems the finite difference approximation of the gradient cannot be computed and classical optimization methods like SQP used in commercial tools exhibit slow progress or fail. On these problems we expect MADS to outperform commercial gradient-based tools.

6 Conclusion

Finding the worst performance and the worst-case distance of a circuit's performance are important subproblems that arise in the process of automated integrated circuit sizing. The solution to these problems enables the designer to verify the satisfaction of the minimum yield requirement. This is an accurate and less costly alternative to yield estimation by Monte-Carlo analysis. An approach for solving both problems by means of MADS was presented. Several extensions were implemented in the general MADS framework that make it possible for the algorithm to rapidly close in on the solution of the optimization problem. The proposed algorithm was tested on two real-world integrated circuit design problems. The results were compared to the results obtained with a commercial worst-case analysis tool (WiCkeD) that uses a gradient-based optimization algorithm. The results show that the proposed approach is competitive with the approach used in WiCkeD.

7 Acknowledgements

The research was co-funded by the Ministry of Education, Science, and Sport (Ministrstvo za Šolstvo, Znanost in Šport) of the Republic of Slovenia through the programme P2-0246 Algorithms and optimization methods in telecommunications.

8 References

1. K. Papathanasiou, "A designer's approach to device mismatch: Theory, modeling, simulation techniques, scripting, applications and examples", Analog Integrated Circuits and Signal Processing, vol. 48, no. 2, pp. 95-106, 2006.
2. H. E. Graeb, "Analog design centering and sizing", Springer, 2007.
3. A. Singhee, R. A. Rutenbar (eds.), "Extreme Statistics in Nanoscale Memory Design", Springer, 2010.
4. C.
Audet, J. E. Dennis, Jr., "Mesh adaptive direct search algorithms for constrained optimization", SIAM Journal on Optimization, vol. 17, no. 1, pp. 188-217, 2006.
5. MunEDA Inc., "WiCkeD, a tool suite for nominal and statistical custom IC design", available at http://www.muneda.com/Products/, 2014.
6. J. Nocedal, S. Wright, "Numerical optimization", Springer, 2006.
7. C. Audet, J. E. Dennis, Jr., "A progressive barrier for derivative-free nonlinear programming", SIAM Journal on Optimization, vol. 20, no. 1, pp. 445-472, 2009.
8. R. Fletcher, S. Leyffer, "Nonlinear programming without a penalty function", Mathematical Programming, vol. 91, no. 2, pp. 239-270, 2002.
9. C. Audet, A. Ianni, S. Le Digabel, C. Tribes, "Reducing the Number of Function Evaluations in Mesh Adaptive Direct Search Algorithms", SIAM Journal on Optimization, vol. 24, no. 2, pp. 621-642, 2014.
10. A. R. Conn, K. Scheinberg, L. N. Vicente, "Introduction to derivative-free optimization", SIAM, 2009.
11. G. W. Stewart, "The efficient generation of random orthogonal matrices with an application to condition estimators", SIAM Journal on Numerical Analysis, vol. 17, no. 3, pp. 403-409, 1980.
12. D. Leventhal, A. S. Lewis, "Randomized Hessian estimation and directional search", Optimization: A Journal of Mathematical Programming and Operations Research, vol. 60, no. 3, pp. 329-345, 2011.
13. M. J. D. Powell, "Least Frobenius norm updating of quadratic models that satisfy interpolation conditions", Mathematical Programming, vol. 100, no. 1, pp. 183-215, 2003.
14. A. L. Custodio, L. N. Vicente, "Using sampling and simplex derivatives in pattern search methods", SIAM Journal on Optimization, vol. 18, no. 2, pp. 537-555, 2007.
15. M. S. Andersen, J. Dahl, L. Vandenberghe, "CVXOPT, Release 1.1.6", available at http://cvxopt.org/user-guide/index.html, 2014.
16.
"PyOPUS - Circuit Simulation and Optimization", available at http://fides.fe.uni-lj.si/pyopus/, 2014.

Arrived: 12. 02. 2015
Accepted: 09. 05. 2015

Call for papers

Informacije MIDEM Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 2 (2015), 171 - 171

MIDEM 2015
51st INTERNATIONAL CONFERENCE ON MICROELECTRONICS, DEVICES AND MATERIALS WITH THE WORKSHOP ON TERAHERTZ AND MICROWAVE SYSTEMS
Announcement and Call for Papers
September 23rd - 25th, 2015
Hotel Golf, Bled, Slovenia

ORGANIZER: MIDEM Society - Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia

CONFERENCE SPONSORS: Slovenian Research Agency; IMAPS, Slovenian Chapter; IEEE, Slovenian Section; Zavod TC SEMTO.

GENERAL INFORMATION
The 51st International Conference on Microelectronics, Electronic Components and Devices with the Workshop on Terahertz and Microwave Systems continues a successful tradition of the annual international conferences organised by the MIDEM Society, the Society for Microelectronics, Electronic Components and Materials. The conference will be held at Hotel Golf, Bled, Slovenia, a well-known resort and conference centre, from September 23rd to 25th, 2015.

Topics of interest include but are not limited to:
- Workshop focus: Terahertz and Microwave Systems
- Novel monolithic and hybrid circuit processing techniques,
- New device and circuit design,
- Process and device modelling,
- Semiconductor physics,
- Sensors and actuators,
- Electromechanical devices, microsystems and nanosystems,
- Nanoelectronics,
- Optoelectronics,
- Photonics,
- Photovoltaic devices,
- New electronic materials and applications,
- Electronic materials science and technology,
- Materials characterization techniques,
- Reliability and failure analysis,
- Education in microelectronics, devices and materials.

ABSTRACT AND PAPER SUBMISSION:
Prospective authors are cordially invited to submit an abstract of up to 1 page before May 1st, 2015.
Please identify the contact author with complete mailing address, phone and fax numbers and e-mail address. After notification of acceptance (June 15th, 2015), the authors are asked to prepare a full paper version of six pages maximum. Papers should be in black and white. Full paper deadline in PDF and DOC electronic format is: August 31st, 2015.

IMPORTANT DATES:
- Abstract deadline: May 1st, 2015 (1 page abstract or full paper)
- Notification of acceptance: June 15th, 2015
- Deadline for final version of manuscript: August 31st, 2015

Invited and accepted papers will be published in the conference proceedings. Detailed and updated information about the MIDEM Conferences is available at http://www.midem-drustvo.si/ under Conferences.

Boards of MIDEM Society | Organi društva MIDEM

MIDEM Executive Board | Izvršilni odbor MIDEM

President of the MIDEM Society | Predsednik društva MIDEM
Prof. Dr. Marko Topič, University of Ljubljana, Faculty of Electrical Engineering, Slovenia

Vice-presidents | Podpredsednika
Prof. Dr. Barbara Malič, Jožef Stefan Institute, Ljubljana, Slovenia
Dr. Iztok Šorli, MIKROIKS, d. o. o., Ljubljana, Slovenia

Secretary | Tajnik
Olga Zakrajšek, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

MIDEM Executive Board Members | Člani izvršilnega odbora MIDEM
Prof. Dr. Slavko Amon, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Darko Belavič, In.Medica, d.o.o., Šentjernej, Slovenia
Prof. Dr. Bruno Cvikl, UM, Faculty of Civil Engineering, Maribor, Slovenia
Prof. DDr. Denis Donlagič, UM, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Prof. Dr. Leszek J. Golonka, Technical University Wroclaw, Poland
Leopold Knez, Iskra TELA d.d., Ljubljana, Slovenia
Dr. Miloš Komac, UL, Faculty of Chemistry and Chemical Technology, Ljubljana, Slovenia
Prof. Dr. Miran Mozetič, Jožef Stefan Institute, Ljubljana, Slovenia
Jožef Perne, Zavod TC SEMTO, Ljubljana, Slovenia
Prof. Dr.
Giorgio Pignatel, University of Perugia, Italy
Prof. Dr. Janez Trontelj, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

Supervisory Board | Nadzorni odbor
Prof. Dr. Franc Smole, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Mag. Andrej Pirih, Iskra-Zaščite, d. o. o., Ljubljana, Slovenia
Dr. Slavko Bernik, Jožef Stefan Institute, Ljubljana, Slovenia

Court of honour | Častno razsodišče
Emer. Prof. Dr. Jože Furlan, UL, Faculty of Electrical Engineering, Slovenia
Prof. Dr. Radko Osredkar, UL, Faculty of Computer and Information Science, Slovenia
Franc Jan, Kranj, Slovenia

Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
ISSN 0352-9045

Publisher / Založnik: MIDEM Society / Društvo MIDEM
Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia
Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale, Ljubljana, Slovenija
www.midem-drustvo.si