ISSN 0352-9045
UDK 621.3:(53+54+621+66)(05)(497.1)=00

Informacije MIDEM 2-2023
Journal of Microelectronics, Electronic Components and Materials
Revija za mikroelektroniko, elektronske sestavne dele in materiale
Volume 53, No. 2 (186), Ljubljana, June 2023

Published quarterly (March, June, September, December) by the Society for Microelectronics, Electronic Components and Materials - MIDEM. Copyright © 2023. All rights reserved.

Editor in Chief
Marko Topič, University of Ljubljana (UL), Faculty of Electrical Engineering, Slovenia

Editor of Electronic Edition
Kristijan Brecl, UL, Faculty of Electrical Engineering, Slovenia

Associate Editors
Vanja Ambrožič, UL, Faculty of Electrical Engineering, Slovenia
Arpad Bürmen, UL, Faculty of Electrical Engineering, Slovenia
Danjela Kuščer Hrovatin, Jožef Stefan Institute, Slovenia
Matija Pirc, UL, Faculty of Electrical Engineering, Slovenia
Franc Smole, UL, Faculty of Electrical Engineering, Slovenia
Matjaž Vidmar, UL, Faculty of Electrical Engineering, Slovenia

Editorial Board
Mohamed Akil, ESIEE PARIS, France
Giuseppe Buja, University of Padova, Italy
Gian-Franco Dalla Betta, University of Trento, Italy
Martyn Fice, University College London, United Kingdom
Ciprian Iliescu, Institute of Bioengineering and Nanotechnology, A*STAR, Singapore
Marc Lethiecq, University of Tours, France
Teresa Orlowska-Kowalska, Wroclaw University of Technology, Poland
Luca Palmieri, University of Padova, Italy
Goran Stojanović, University of Novi Sad, Serbia

International Advisory Board
Janez Trontelj, UL, Faculty of Electrical Engineering, Slovenia - Chairman
Cor Claeys, IMEC, Leuven, Belgium
Denis Đonlagić, University of Maribor, Faculty of Elec. Eng. and Computer Science, Slovenia
Zvonko Fazarinc, CIS, Stanford University, Stanford, USA
Leszek J. Golonka, Technical University Wroclaw, Wroclaw, Poland
Jean-Marie Haussonne, EIC-LUSAC, Octeville, France
Barbara Malič, Jožef Stefan Institute, Slovenia
Miran Mozetič, Jožef Stefan Institute, Slovenia
Stane Pejovnik, UL, Faculty of Chemistry and Chemical Technology, Slovenia
Giorgio Pignatel, University of Perugia, Italy
Giovanni Soncini, University of Trento, Trento, Italy
Iztok Šorli, MIKROIKS d.o.o., Ljubljana, Slovenia
Hong Wang, Xi'an Jiaotong University, China

Headquarters
Uredništvo Informacije MIDEM, MIDEM pri MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenia
T. +386 (0)1 513 37 68
F. +386 (0)1 513 37 71
E. info@midem-drustvo.si
www.midem-drustvo.si

Annual subscription rate is 160 EUR, separate issue is 40 EUR. MIDEM members and Society sponsors receive current issues for free. The Scientific Council for Technical Sciences of the Slovenian Research Agency has recognized Informacije MIDEM as a scientific journal for microelectronics, electronic components and materials. Publishing of the journal is cofinanced by the Slovenian Research Agency and by Society sponsors. Scientific and professional papers published in the journal are indexed and abstracted in the COBISS and INSPEC databases. The journal is indexed by ISI® for Sci Search®, Research Alert® and Materials Science Citation Index™.

Design: Snežana Madić Lešnik; Printed by: Biro M, Ljubljana; Circulation: 1000 issues; Slovenia Taxe Percue (postage paid at post office 1102 Ljubljana)

Content

Original scientific papers

Y. Li, D. Sang, M. Li, X. Li, T. Wang, B. O. Mohammed: A New Quantum-Based Building Block for Designing a Nano-Circuit with Lower Complexity, 57
R. K. Pandey, V. Bhadauria, V. K. Singh: High-Gain Super Class-AB Bulk-driven Sub-threshold Low-Power CMOS Transconductance Amplifier for Biomedical Applications, 65
R. Pilipović, P. Bulić, U. Lotrič: An Energy-efficient and Accuracy-adjustable bfloat16 Multiplier, 79
L. Khanfir, J. Mouine: A New Design Optimization Methodology of Fully Differential Dynamic Comparator, 87
Ž. Rojec: Towards Smaller Single-point Failure-resilient Analog Circuits by Use of a Genetic Algorithm, 103

Front page: Topology evolution of analogue electrical circuits using evolutionary algorithms (Ž. Rojec)

Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.201
Journal of Microelectronics, Electronic Components and Materials, Vol. 53, No. 2 (2023), 57-64

A New Quantum-Based Building Block for Designing a Nano-Circuit with Lower Complexity

Yao Li1, Dong Sang1, Min Li1, Xiaofang Li1, Tiantian Wang1, Bayan Omar Mohammed2
1 School of Computing, Weifang University of Science and Technology, Weifang, Shandong, China
2 Development Center for Research and Training, College of Science and Technology, University of Human Development, Sulaimani, Kurdistan Region, Iraq

Abstract: Next-generation nano-scale computational systems are hampered by two significant obstacles: shrinking transistor size and power dissipation. Moore's law does not hold when transistor size reaches the atomic level, so it becomes necessary to investigate alternative technologies that overcome the physical constraints of traditional Complementary Metal Oxide Semiconductor (CMOS) technology. Quantum Dot Cellular Automata (QCA), a transistor-free computational paradigm, is thought to be the best alternative to CMOS technology for designing nano-scale logic circuits. However, few designs cut energy usage and also offer straightforward access to inputs and outputs. Moreover, adders, the primary component in logic circuits and digital arithmetic, are crucial in developing efficient QCA designs. In this context, the 4-bit Ripple Carry Adder (RCA) is a straightforward type of adder that can help produce circuits with minimal area and power consumption. The synthesis of high-level logic further demonstrates the design's effectiveness. The QCADesigner results demonstrate that the proposed circuits are less complex and use less power than earlier designs.
Keywords: Nanotechnology; Quantum-dot cellular automata; XOR gate; Majority voter gate; Full adder; Ripple Carry Adder

* Corresponding Author's e-mail: liyao@wfust.edu.cn

How to cite: Y. Li et al., "A New Quantum-Based Building Block for Designing a Nano-Circuit with Lower Complexity", Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 57-64

1 Introduction

High leakage power and sub-node scaling of 22 nm technology are issues that transistor-based technologies must deal with [1]. These problems motivate designers of Very Large-Scale Integration (VLSI) to create a different technology for the next Integrated Circuits (ICs) and diode-based technologies [2-4]. To address the issues with CMOS technology [5], VLSI designers are looking into a number of other technologies, including Quantum-dot Cellular Automata (QCA), single electron transistors, and tunnel field effect transistors. Compared to competing technologies, QCA provides a number of advantages, including a smaller footprint, quick switching times, and reduced power dissipation [6].

QCA is an intriguing and popular technology for creating nano-scale logic circuits. There are no transistors in QCA technology. The QCA cell, which comprises 4 quantum dots, is the fundamental unit of QCA [7]. This method is energy-efficient since there is no actual charge movement between QCA cells. Logical values are determined by the electrons' location in the quantum dots. Due to Coulombic interaction, the electrons in a QCA cell are situated at opposing corners. Each cell holds one of two logic values, 1 or 0. These advantages have led researchers to develop a number of projects that explain how to construct QCA circuits [8]. Adders, SRAM [9], ALUs, switching, encoder-decoders [10], reversible logic, and memories are just a few of the recently invented circuits. Full adders play a very prominent part in digital circuits since they are employed in the creation of logical and mathematical processes [11]. Therefore, building a QCA-based adder with reduced space, shorter delays, straightforward access to inputs and outputs, and lower complexity will be more crucial than ever [11]. This paper uses a novel, low-complexity, low-power three-layer full adder circuit to propose a new QCA-based ripple carry adder (RCA) design that improves on previous designs. With simple access to inputs and outputs, XOR and majority gates were used to create an RCA circuit, and the results were compared to earlier designs. All protected nano-communication networks [12] are designed using adders and RCA designs. QCADesigner-E, a commonly used tool for power analysis, is used in this paper for simulation and assessment.

The structure of this paper is as follows. The background of QCA is presented in Section 2, with a focus on its distinctive cells. The 4-bit RCA's detailed architecture is shown in Section 3. Section 4 displays the simulation's findings. Finally, the paper is concluded in the last section.

2 QCA background and related works

This section discusses the important and basic parts of this technology and the best previous works related to the subject.

1.1 QCA Cells and Wires

A QCA cell is a square nanostructure with four quantum dots (micro), roughly as shown in Figure 1(a). In the context of QCA, "micro" refers to the individual components or elements of the system, namely the quantum dots. These quantum dots are the building blocks of QCA and serve as the basic units of information processing [13]. In the cell, a tunnel junction connecting two pairs of quantum dots allows for the passage of two electrons between them. The two electrons are positioned in the cell at opposite ends because of Coulombic repulsion [14, 15]. In the context of QCA, nonlinear and linear refer to different types of behavior exhibited by the system. Linear QCA refers to a system where the quantum dots' behavior can be described using linear operations, similar to classical digital logic gates, while nonlinear QCA involves more complex interactions between quantum dots, resulting in nonlinearity due to quantum effects like Coulomb interactions and electron tunneling [16]. There is no cell-to-cell tunneling; tunneling only takes place within the cell. Bistable behavior results from the interaction of the discrete electronic charge, Coulombic repulsion, and quantum confinement. Binary "0" and "1" can be represented by the two charge configurations, with polarisations of "-1" and "+1", respectively. A QCA "wire" is a chain of cells contiguous to one another, as opposed to a physical wire, as depicted in Figure 1(b). As there is no electron tunneling between cells, QCA offers a method of information transfer without current flow [17].

Figure 1: Structure of basic QCA: (a) QCA cells, and (b) QCA wire.

1.2 QCA Logic Gates

The fundamental gates of QCA are inverters and three-input majority gates. A majority gate comprises 4 cells that achieve the function M(a, b, c) = ab + bc + ac, as shown in Figure 2(a) [18]. Cells are placed diagonally from one another to achieve the inversion functionality, as shown in Figure 2(b). Inverters and majority gates make up a universal set that can be employed to implement any logic operation. By setting one of the majority gate inputs to "0", a two-input AND gate is realized: AND(a, b) = M(a, b, 0) = ab. In the same manner, an OR gate is implemented by setting one input to "1", i.e., OR(a, b) = M(a, b, 1) = ab + b·1 + a·1 = a + b [19].

Figure 2: Structure of basic QCA: (a) Three-input majority gate, and (b) Inverter gate.

1.3 QCA Clocking

To drastically reduce metastability issues and enable long pipelines, adiabatic switching is used for QCA clocking. One half of the wire is used for signal transmission during each clock cycle, and the other half is left unpolarized [20]. During the subsequent clock cycle, the cells in the clock zone that is still active cause the newly activated cells to become polarized, while half of the previously active clock zone is deactivated [21]. As a result, signals continue from one clock zone to the next. Four-phase clock signals are used to control four different circuit areas. Each zone of the clock signal has four states: high, high to low, low, and low to high. When the state changes from high to low, the cell starts to calculate, and it keeps the value while the state is low. The cell is released when the clock is in the low-to-high state and is not operating [22].

1.4 Related work

This section reviews several significant and useful proposals for QCA RCA circuit designs. Abedi, et al. [23] propose a cross-level QCA architecture for a full adder, together with an RCA that is based on this design. These designs were tested for accuracy and assessed using QCADesigner. A conventional evaluation methodology and a QCA-specific cost function were applied to show superior performance compared to earlier methods. The suggested RCA has a delay of 1.75 clock cycles and uses 262 cells in a 0.208 µm² area. Also, Balali and Rezai [31] proposed a QCA structure for the full adder to create a high-speed, efficient, and reliable four-bit RCA using QCA technology. Their modeling results demonstrated significant improvements in circuit speed and latency. To verify the accuracy of these designs, QCADesigner was employed. The suggested four-bit RCA is designed with minimum complexity and high speed. It has a delay of 1.25 clock cycles and 209 cells in a 0.3 µm² area. Also, the fundamental QCA and QCA-based digital design concepts have been put forward by Chan, et al. [24]. That article discusses the creation of straightforward digital logic using certain QCA approaches. The four-bit ripple adder is built by combining ideas from the traditional RCA and the carry look-ahead adder (CLA). These circuits were implemented using the 5-input majority gate, which theoretically can lower the latency of the traditional QCA-based RCA. The recommended adder has a latency of 3.25 clock cycles, an area of 2.5 µm², and 1246 cells. The designed structures were verified using QCADesigner. Finally, Hashemi and Navi [25] suggest a robust QCA full adder and an RCA circuit based on an efficient five-input majority gate. These circuits employ a robust crossover design in comparison to similar designs. Owing to the full adder circuit's efficient architecture, it has been employed for RCA design at a variety of scales. The coherence vector and bistable simulation engines of QCADesigner were used to simulate the suggested designs. The proposed RCA uses 442 cells with an area of 1 µm² and a delay of 2 clock cycles.

2 Proposed design

This part presents and simulates new designs and effective architectures for a one-bit QCA full adder and a four-bit QCA RCA. The one-bit QCA full adder block diagram is illustrated in Figure 3, and the full adder's QCA-based layout, with a three-input majority gate and a three-input XOR gate, is shown in Figure 4. This full adder comprises 15 cells and uses 0.5 clock cycles to generate its outputs, with a 0.01 µm² area and simple input and output connectivity. This three-layer implementation of a QCA full adder uses ordinary QCA cells. Input cells are A, B, and C, and output cells are COUT and SUM. In this design, the first layer acts as an XOR gate and is used to generate the SUM, while the second layer is used to transmit values to the third layer, where all of the circuit's inputs are applied and the COUT output is generated.

Figure 3: QCA-based full adder diagram

The proposed adder can easily implement the higher adder designs.
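The gate-level relations described in the text (the majority function, AND and OR obtained by fixing one majority input, a full adder built from a three-input XOR and a majority gate, and full adders chained into a ripple carry adder) can be checked with a short logic-level sketch. This is only a Boolean model written for illustration: it says nothing about cell layout, clock zones, or energy, and all function names are hypothetical rather than taken from the paper or from QCADesigner.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority vote: M(a, b, c) = ab + bc + ac."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, c):
    """One-bit full adder: SUM is a three-input XOR, COUT is a majority vote."""
    return a ^ b ^ c, majority(a, b, c)


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Chain full adders LSB-first; returns (sum_bits, carry_out)."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width=4):
    """Little-endian bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def verify_rca(width=4):
    """Exhaustively compare the adder model with integer addition."""
    for a, b, c in product(range(2 ** width), range(2 ** width), (0, 1)):
        s_bits, cout = ripple_carry_add(to_bits(a, width), to_bits(b, width), c)
        if from_bits(s_bits) + (cout << width) != a + b + c:
            return False
    return True


if __name__ == "__main__":
    # Fixing one majority input to 0 or 1 yields AND or OR, respectively.
    for a, b in product((0, 1), repeat=2):
        assert majority(a, b, 0) == (a & b)
        assert majority(a, b, 1) == (a | b)
    print("4-bit RCA model matches integer addition:", verify_rca())
```

For the 4-bit case the exhaustive check covers all 512 combinations of the nine inputs (A0-A3, B0-B3, C), which mirrors the "all possible states" style of verification the paper reports from simulation.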
Higher-order adders, such as the 4-bit RCA, have been designed using this full adder with fewer QCA cells, which is entirely distinct from earlier versions. The proposed four-bit RCA design is illustrated in Figure 5. A four-bit QCA-based RCA that uses four one-bit QCA-based full adder circuits as its structural unit is depicted in Figure 6. The suggested four-bit QCA-based RCA has 72 cells, an area of 0.11 µm², and a delay of 1.75 clock cycles. All of the inputs and outputs of this three-layer circuit are accessible. There are five outputs (S0-S3, COUT) and 9 inputs (A0-A3, B0-B3, C). The outputs in this design are easily accessible because they are not encircled by other cells. In other words, this structure does not need an extra wire to transfer the output signal. Thus, it is simple to feed the outputs to another QCA input.

Figure 4: QCA-based full adder layouts and layers

Figure 5: The proposed schematic for 4-bit RCA

Figure 6: Three layers of the proposed QCA-based 4-bit RCA

3 Simulation tool and results

The software QCADesigner-E is used in this paper to simulate the suggested design [26]. QCADesigner enables fast design, layout, and simulation of QCA circuits. Table 1 contains all of the simulation parameters. The tool's default parameters are used for all simulation measures and conditions [27].

Table 1: Simulation parameters
Parameter | Bistable approximation engine value | Coherence vector engine value
Cell size | 18 × 18 nm² | 18 × 18 nm²
Radius of effect | 65 nm | 80 nm
Relative permittivity | 12.9000000 | 12.9000000
Clock high | 9.8e-22 J | 9.8e-22 J
Clock low | 3.8e-23 J | 3.8e-23 J
Clock amplitude factor | 2.000000 | 2.000000
Clock shift | 0.000000e+000 | 0.000000e+000
Layer separation | 11.5000 nm | 11.5000 nm
Maximum iterations per sample | 100 | -
Number of samples | 12800 | -
Convergence tolerance | 0.001000 | -

The simulation results for the constructed full adder circuit are shown in Figure 7. All possible states have been applied to the circuit's inputs, and the outputs match the expected truth table. Both outputs are formed concurrently after two clock cycles. The third layer of this three-layer full adder receives the three inputs and processes them, so that the COUT output is produced from the third layer and the SUM output from the first layer. These simulations, which were run using the default settings, demonstrate the correctness of the suggested designs. Tables 2 and 3 compare the cell count, latency, and area of the presented full adder and RCA circuits to the best previous designs.

Figure 7: Simulation outcomes of the proposed design

Table 2: Comparisons among the designs
Designs | Area (µm²) | Cells | Delay (clock cycles)
Proposed design | 0.01 | 15 | 0.5
Ahmadpour, et al. [28] | 0.01 | 20 | 0.5
Seyedi and Navimipour [6] | 0.01 | 22 | 0.75
Sarmadi, et al. [29] | 0.04 | 30 | 1.0
Sayedsalehi, et al. [30] | 0.02 | 33 | 0.75

Figure 8 displays the simulation results for the QCA-based RCA circuit for the inputs A0, A1, A2, A3, B0, B1, B2, B3, and C. As depicted in the figure, the circuit receives every possible input state and generates the desired output. The simulation results also show strong polarization of the output cells for this circuit.

Figure 8: Simulation result of the proposed RCA

Table 3: Comparisons among the RCA designs
Designs | Area (µm²) | Cells | Delay (clock cycles)
Proposed design | 0.11 | 72 | 1.75
Balali and Rezai [31] | 0.3 | 209 | 1.25
Sonare [32] | 0.51 | 366 | 2/5
Rashidi and Rezai [33] | 0.14 | 175 | 1
Abedi, et al. [23] | 0.208 | 262 | 1.75
Mohammadi, et al. [34] | 0.24 | 237 | 1.5
Labrado and Thapliyal [35] | 0.3 | 295 | 1.5

Additionally, Table 4 compares the suggested designs to the best current designs in terms of total energy dissipation (eV) and average energy dissipation (eV). According to this comparison, the proposed design is the most energy-efficient one.

Table 4: Comparison of total and average energy dissipation
Designs: Proposed full adder design; Ahmadpour, et al. [28]; Seyedi and Navimipour [6]; Sarmadi, et al. [29]; Sayedsalehi, et al. [30]; Proposed RCA design; Balali and Rezai [31]; Sonare [32]; Rashidi and Rezai [33]; Abedi, et al. [23]; Mohammadi, et al. [34]; Labrado and Thapliyal [35]
Power and energy analysis columns: Total energy dissipation (eV), Average energy dissipation (eV)
Values as listed: 1.458 1.057 1.25 1.15 1.55 1.56 1.80 1.87 1.69 1.55 2.80 2.63 2.48 2.59 2.74 2.70 3.02 2.98 3.56 3.15 2.89 3.12 2.485 2.84

4 Conclusion and future works

QCA is a new and emerging technology that plays a significant role in nanotechnology and has been researched for years. Considering its advantages, such as fast switching time, low power requirement, and high device density, it can be a good alternative to CMOS. In this article, the technology has been used to implement adder circuits. First, an innovative architecture for a 1-bit QCA full adder is created. Then, applying this full adder layout, a high-speed adder is developed as a 4-bit RCA. Compared to recently published adder architectures, our design is shown to use fewer cells and a smaller area, with realistic simulation results. The presented multi-layer architecture is significantly more robust than the conventional full adder.

The suggested full adder consists of 15 cells and generates its outputs in 0.5 clock cycles. It occupies an area of 0.01 µm² and features straightforward input and output connectivity. Additionally, the suggested four-bit QCA-based RCA incorporates 72 cells, covers an area of 0.11 µm², and exhibits a delay of 1.75 clock cycles. In this study, QCADesigner-E was used to assess the total power dissipation of the QCA structure. These circuits have one of the best power consumption rates and offer easy access to the inputs and outputs. In the future, high-speed adders that play an essential role in multi-layer designs can be designed to further improve computational performance. High-performance QCA circuits and an n-bit ripple carry adder can be created at the nanoscale using the given architectures. The suggested concept may therefore have a fundamental impact on the development of high-speed circuits as well as other arithmetic circuits, such as full subtractors and borrow ripple subtractors.

5 Conflict of Interest

The authors declare that they have no conflicts of interest.

6 References

1. T. Tuncer, E. Avaroglu, M. Türk, and A. B. Ozer, "Implementation of non-periodic sampling true random number generator on FPGA," Informacije MIDEM, vol. 44, pp. 296-302, 2014.
2. M. A. S. Bhuiyan, "CMOS series-shunt single-pole double-throw transmit/receive switch and low noise amplifier design for internet of things based radio frequency identification devices," Informacije MIDEM, vol. 50, pp. 105-114, 2020.
3. H. Tian, J. Liu, Z. Wang, F. Xie, and Z. Cao, "Characteristic Analysis and Circuit Implementation of a Novel Fractional-Order Memristor-Based Clamping Voltage Drift," Fractal and Fractional, vol. 7, p. 2, 2022.
4. J. Xiang, W. Yang, H. Liao, P. Li, Z. Chen, and J. Huang, "Design and thermal performance of thermal diode based on the asymmetric flow resistance in vapor channel," International Journal of Thermal Sciences, vol. 191, p. 108345, 2023.
5. S. Li, J. Chen, X. He, Y. Zheng, C. Yu, and H. Lu, "Comparative study of the micro-mechanism of charge redistribution at metal-semiconductor and semimetal-semiconductor interfaces: Pt (Ni)-MoS2 and Bi-MoS2 (WSe2) as the prototype," Applied Surface Science, vol. 623, p. 157036, 2023.
6. S. Seyedi and N. J. Navimipour, "An optimized design of full adder based on nanoscale quantum-dot cellular automata," Optik, vol. 158, pp. 243-256, 2018.
7. S. Seyedi and N. J. Navimipour, "Designing a three-level full-adder based on nano-scale quantum dot cellular automata," Photonic Network Communications, vol. 42, pp. 184-193, 2021.
8. S. Seyedi and N. Jafari Navimipour, "Designing a multi-layer full-adder using a new three-input majority gate based on quantum computing," Concurrency and Computation: Practice and Experience, vol. 34, p. e6653, 2022.
9. A. Yan, J. Xiang, A. Cao, Z. He, J. Cui, T. Ni, et al., "Quadruple and Sextuple Cross-Coupled SRAM Cell Designs With Optimized Overhead for Reliable Applications," IEEE Transactions on Device and Materials Reliability, vol. 22, pp. 282-295, 2022.
10. W. Dang, S. Liao, B. Yang, Z. Yin, M. Liu, L. Yin, et al., "An encoder-decoder fusion battery life prediction method based on Gaussian process regression and improvement," Journal of Energy Storage, vol. 59, p. 106469, 2023.
11. A. Kamaraj and P. Marichamy, "Design of fault-tolerant reversible floating point division," Informacije MIDEM, vol. 48, pp. 161-172, 2018.
12. Z. Qu, X. Liu, and M. Zheng, "Temporal-Spatial Quantum Graph Convolutional Neural Network Based on Schrödinger Approach for Traffic Congestion Prediction," IEEE Transactions on Intelligent Transportation Systems, 2022.
13. X. Jianhua, D. Liangming, C. Zhou, H. Zhao, J. Huang, and T. Sulian, "Heat Transfer Performance and Structural Optimization of a Novel Microchannel Heat Sink," Chinese Journal of Mechanical Engineering = Ji xie gong cheng xue bao, vol. 35, 2022.
14. A. Kamaraj, P. Marichamy, and R. Abirami, "Multi-port RAM design in QCA using logical crossing," Informacije MIDEM, vol. 51, pp. 49-61, 2021.
15. J. Gao, H. Sun, J. Han, Q. Sun, and T. Zhong, "Research on recognition method of electrical components based on FEYOLOv4-tiny," Journal of Electrical Engineering & Technology, vol. 17, pp. 3541-3551, 2022.
16. S. Xu, H. Dai, L. Feng, H. Chen, Y. Chai, and W. X. Zheng, "Fault Estimation for Switched Interconnected Nonlinear Systems with External Disturbances via Variable Weighted Iterative Learning," IEEE Transactions on Circuits and Systems II: Express Briefs, 2023.
17. S. Seyedi, B. Pourghebleh, and N. Jafari Navimipour, "A new coplanar design of a 4-bit ripple carry adder based on quantum-dot cellular automata technology," IET Circuits, Devices & Systems, vol. 16, pp. 64-70, 2022.
18. R. M. Macrae, "Mixed-valence realizations of quantum dot cellular automata," Journal of Physics and Chemistry of Solids, vol. 177, p. 111303, 2023.
19. M. Kikelj, B. Lipovšek, and F. Smole, "Orthodox Theory Monte-Carlo Simulation of Single-Electron Logic Circuits," Informacije MIDEM, vol. 48, pp. 241-247, 2018.
20. J. Wang, J. Tian, X. Zhang, B. Yang, S. Liu, L. Yin, et al., "Control of time delay force feedback teleoperation system with finite time convergence," Frontiers in Neurorobotics, vol. 16, 2022.
21. A. Asthana, A. Kumar, and P. Sharan, "N×N Clos Digital Cross-Connect Switch Using Quantum Dot Cellular Automata (QCA)," Computer Systems Science & Engineering, vol. 45, 2023.
22. S. Riyaz and V. K. Sharma, "Design of reversible Feynman and double Feynman gates in quantum-dot cellular automata nanotechnology," Circuit World, vol. 49, pp. 28-37, 2023.
23. D. Abedi, G. Jaberipur, and M. Sangsefidi, "Coplanar full adder in quantum-dot cellular automata via clock-zone-based crossover," IEEE Transactions on Nanotechnology, vol. 14, pp. 497-504, 2015.
24. S. T. Y. Chan, C. F. Chau, and A. bin Ghazali, "Design of a 4-bit ripple adder using Quantum-dot Cellular Automata (QCA)," in Circuits and Systems (ICCAS), 2013 IEEE International Conference on, 2013, pp. 33-38.
25. S. Hashemi and K. Navi, "A novel robust QCA full-adder," Procedia Materials Science, vol. 11, pp. 376-380, 2015.
26. M. Patidar, U. Singh, S. K. Shukla, G. K. Prajapati, and N. Gupta, "An ultra-area-efficient ALU design in QCA technology using synchronized clock zone scheme," The Journal of Supercomputing, vol. 79, pp. 8265-8294, 2023.
27. D. Manna, C. Mukherjee, A. Banerjee, M. Dhar, S. Panda, and B. Maji, "Towards Energy-Efficient Cost-Effective Toffoli Gate Design using Quantum Cellular Automata," in 2023 IEEE Devices for Integrated Circuit (DevIC), 2023, pp. 56-60.
28. S.-S. Ahmadpour, M. Mosleh, and S. R. Heikalabad, "A revolution in nanostructure designs by proposing a novel QCA full-adder based on optimized 3-input XOR," Physica B: Condensed Matter, vol. 550, pp. 383-392, 2018.
29. S. Sarmadi, S. Sayedsalehi, M. Fartash, and S. Angizi, "A structured ultra-dense QCA one-bit full-adder cell," Quantum Matter, vol. 5, pp. 118-123, 2016.
30. S. Sayedsalehi, M. H. Moaiyeri, and K. Navi, "Novel efficient adder circuits for quantum-dot cellular automata," Journal of Computational and Theoretical Nanoscience, vol. 8, pp. 1769-1775, 2011.
31. M. Balali and A. Rezai, "Design of Low-Complexity and High-Speed Coplanar Four-Bit Ripple Carry Adder in QCA Technology," International Journal of Theoretical Physics, vol. 57, pp. 1948-1960, 2018.
32. N. Sonare, "Design and Simulation Study of Coplanar Full Adder and Ripple Carry adder using Quantum Dot Cellular Automata," 2018.
33. H. Rashidi and A. Rezai, "High-performance full adder architecture in quantum-dot cellular automata," The Journal of Engineering, vol. 1, 2017.
34. M. Mohammadi, M. Mohammadi, and S. Gorgin, "An efficient design of full adder in quantum-dot cellular automata (QCA) technology," Microelectronics Journal, vol. 50, pp. 35-43, 2016.
35. C. Labrado and H. Thapliyal, "Design of adder and subtractor circuits in majority logic-based field-coupled QCA nanocomputing," Electronics Letters, vol. 52, pp. 464-466, 2016.

Copyright © 2023 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 16. 11. 2022
Accepted: 09. 07. 2023

Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.202
Journal of Microelectronics, Electronic Components and Materials, Vol. 53, No. 2 (2023), 65-77

High-Gain Super Class-AB Bulk-driven Subthreshold Low-Power CMOS Transconductance Amplifier for Biomedical Applications

Rakesh Kumar Pandey1, Vijaya Bhadauria2 and V. K. Singh3
1 Dr. A.P.J. Abdul Kalam Technical University (AKTU), Lucknow, India
2 Motilal Nehru National Institute of Technology (MNNIT), Prayagraj, India
3 Institute of Engineering and Technology (IET), Lucknow, India

Abstract: This article describes a high-gain, power-efficient, single-stage operational transconductance amplifier (OTA) that is bulk-driven (BD), operates in the sub-threshold region in super class-AB mode, and offers an enhanced unity gain frequency (UGF). The proposed amplifier has a BD adaptively biased flipped voltage follower (FVF) differential input pair functioning in class-AB mode to raise the dynamic current and, in turn, the UGF and slew rate. Additionally, the core circuit of the proposed OTA employs partial positive feedback (PPF) to increase the circuit's effective input transconductance and gain. Moreover, the circuit's overall gain is raised by using three additional low-power current mirror loads, two of which are FVF current mirrors and one of which is a self-cascode current mirror, placed at the output.
The proposed OTA circuit and its conventional counterpart are designed and simulated in the Cadence Spectre tool using the UMC 0.18 μm CMOS process technology; both circuits are biased with a minimal supply of 0.5 V. The simulation results show that the proposed circuit delivers an open-loop DC gain of 72.35 dB, a phase margin of 61.33°, and a UGF of 18.706 kHz while consuming only 62.82 nW of power. These results confirm the suitability of the proposed OTA circuit for biomedical applications.

Keywords: Adaptive biasing, Bulk-driven OTA, FVF, Partial Positive Feedback, Self-cascode

Ojačevalnik prevodnosti CMOS z nizko močjo in velikim ojačenjem Super Class-AB za biomedicinske aplikacije

Izvleček: V članku je opisan enostopenjski operacijski transkonduktančni ojačevalnik (OTA) z visokim ojačenjem, ki deluje v podpražnem območju in je voden preko substrata (BD), ki je energetsko učinkovit in ima povečano frekvenco enotnega ojačenja (UGF). Predlagani ojačevalnik ima BD adaptivno pristranski diferencialni vhodni par z obrnjenim napetostnim sledilnikom (FVF), ki deluje v načinu razreda AB za povečanje dinamičnega toka in posledično povečanje UGF in hitrosti premikanja. Poleg tega jedro vezja predlaganega ojačevalnika OTA uporablja delno pozitivno povratno zvezo (PPF) za povečanje učinkovite vhodne transkonduktivnosti in ojačitve vezja. Poleg tega se celotno ojačenje vezja poveča z uporabo treh dodatnih tokovnih zrcal z nizko porabo, od katerih sta dve tokovni zrcali FVF, eno pa je tokovno zrcalo s samokaskodo, ki je nameščeno na izhodu. Predlagano vezje OTA in njegovo tradicionalno analogno vezje sta razvita in simulirana v orodju Cadence Spectre z uporabo 0,18μm CMOS procesne tehnologije UMC, obe vezji sta obremenjeni z minimalnim napajanjem 0,5 V. Rezultati simulacije kažejo, da predlagano vezje zagotavlja 72,35 dB DC ojačitve v odprti zanki, 61,33º fazno razliko in 18,706 kHz UGF s porabo samo 62,82 nW energije.
Rezultati delovanja so zagotovili primernost predlaganega vezja OTA za biomedicinske aplikacije.

Ključne besede: prilagodljiva pred napetost, množično voden OTA, FVF, delna pozitivna povratna zanka, samo-kaskoda

* Corresponding Author’s e-mail: rakesh18.pnd@gmail.com

How to cite: R. K. Pandey et al., “High-Gain Super Class-AB Bulk-driven Sub-threshold Low-Power CMOS Transconductance Amplifier for Biomedical Applications”, Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 65–77

R. K. Pandey et al.; Informacije Midem, Vol. 53, No. 2(2023), 65 – 77

1 Introduction

Over the last few years, advances in CMOS technology have driven a continuous demand for portable electronic devices in everyday life, such as laptops, notepads, wireless sensor networks, mobile phones, and biomedical implantable devices. The medical field is also moving rapidly towards portable equipment for continuous monitoring of patient health [1-5], which motivates the study of ultra-low-voltage, low-power circuit designs for portable applications. Analog circuit designers continue to work actively in this field and have reported various low-voltage design techniques in the literature [6-8]. As the most fundamental block in an analog circuit, the OTA plays a pivotal role in the analog front-end circuits used in biomedical data acquisition systems. Electromyograms (EMG), electroencephalograms (EEG), electrocardiograms (ECG), and other bio-potential signals are low-voltage signals (amplitudes in the mV range) and low-frequency signals that extend up to only a few kHz. Rail-to-rail input/output swing, high DC gain, low noise, high linearity, and minimal power consumption are the basic requirements of an OTA used in biomedical applications [3]. Achieving these characteristics with a low supply voltage in deep-submicron technologies is a real challenge. The conventional gate-driven technique is unsuitable for operation below 1 V because of the threshold voltage constraint; the restricted linear range and high power consumption of such an OTA are two of its biggest flaws. The weak-inversion design technique is well suited to reducing power consumption, since the required drain-to-source voltage (VDS) drops from about 250 mV in strong inversion to about 78 mV [5, 9-11]. An alternative approach for rail-to-rail operation is the bulk-driven (BD) technique, which overcomes the aforementioned linearity and threshold voltage restrictions. The bulk-driven technique combined with sub-threshold operation is preferable for biomedical applications, as together they increase linearity and reduce power consumption. Although the bulk-driven technique increases the input common-mode range (ICMR), it reduces the open-loop DC gain and UGF and raises the input-referred noise, since the gate transconductance (gm) is 2.5 to 5 times higher than the bulk transconductance (gmb) [12-16].

A number of bulk-driven OTA designs that address the reduced bulk transconductance in sub-1 V environments are described in the literature [10-19], and designs for extremely low-voltage conditions with very little power loss are discussed in [20-22]. Single-stage amplifiers are the most power-efficient, but they cannot deliver enough gain; cascode techniques were used earlier to provide high gain. This approach is no longer used since it lowers the output voltage swing. A self-cascode (SC), as described in [5, 11, 14, 16], is an excellent way to obtain a high DC gain together with a large output swing. It consists of two transistors but is treated as a single composite transistor. When SC loads are used, the output impedance increases by roughly a factor of 10, which corresponds to a gain improvement of about 20 dB. The composite SC loads do not require any extra bias sources to drive the cascode transistors and hence maximize the voltage gain. Some authors use the partial positive feedback (PPF) techniques described in [5, 12, 16, 17, 23-29] to increase the bulk transconductance of the input core and hence improve the small-signal performance of bulk-driven OTAs, but these techniques do not address the large-signal performance.

This paper presents an improved bulk-driven low-power single-stage super class-AB OTA [12, 26, 30-32] operated in the sub-threshold region, referred to throughout the paper as the super class-AB bulk-driven sub-threshold (SBDST) OTA. The proposed amplifier applies an adaptive bias technique to the input differential pair, based on a BD-FVF [26, 31] functioning in class-AB mode, to improve the dynamic current and the unity gain frequency. The partial positive feedback (PPF) technique is used in the core circuit to improve the overall effective input transconductance and hence the gain of the circuit. In addition, three low-power, high-performance current mirrors at the output increase the output impedance, which raises the overall gain of the circuit further. With these techniques, the proposed SBDST OTA offers a significant open-loop DC gain, UGF, and slew rate while consuming minimal power.

This paper is structured as follows. Section 2 covers the conventional OTA and gives a circuit description of both the proposed and the conventional OTA. Section 3 presents the detailed circuit analysis of the proposed OTA. Section 4 covers the simulation outcomes of the conventional and proposed OTAs, including Monte Carlo analysis, process corner analysis, and layout. Section 5 compares the performance of the proposed OTA with that of previously reported designs, and Section 6 concludes the paper.
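The numerical claims in the introduction can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 78 mV figure corresponds to the saturation condition VDS ≥ 3VT stated in Section 2.1, and it assumes that gain scales directly with output impedance or transconductance when everything else is held fixed. The function names are hypothetical and are not part of the paper.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant in J/K (SI defined value)
Q_ELECTRON = 1.602176634e-19  # elementary charge in C (SI defined value)


def thermal_voltage(temp_celsius: float) -> float:
    """Thermal voltage VT = KT/q in volts at the given temperature."""
    return K_BOLTZMANN * (temp_celsius + 273.15) / Q_ELECTRON


def to_db(ratio: float) -> float:
    """Express a voltage-gain ratio in decibels."""
    return 20.0 * math.log10(ratio)


vt = thermal_voltage(27.0)
print(f"VT at 27 C: {vt * 1e3:.2f} mV")
print(f"3*VT (weak-inversion saturation limit): {3 * vt * 1e3:.1f} mV")

# Assumption: a 10x higher output impedance gives a 10x higher voltage gain.
print(f"Gain change for 10x output impedance: {to_db(10.0):.0f} dB")

# Assumption: gain is proportional to the input transconductance, so
# driving the bulk instead of the gate costs 20*log10(gm/gmb).
print(f"Bulk-drive gain penalty: {to_db(2.5):.1f} to {to_db(5.0):.1f} dB")
```

Under these assumptions the thermal voltage at 27 °C is about 25.9 mV, so 3VT is about 77.6 mV, in line with the 78 mV quoted above, and a gm/gmb ratio of 2.5 to 5 corresponds to a gain penalty of roughly 8 to 14 dB.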
2 Circuit Descriptions

2.1 Conventional Bulk-driven Sub-threshold (BDST) OTA

The conventional BDST OTA, in which the input core circuit is designed using bulk-input PMOS transistors PI1a-PI1b, is depicted in Fig. 1. The differential input transistor pair working in the sub-threshold region is biased by transistor PB; the drain current (IDS) of a transistor operating in sub-threshold is expressed as [11]:

$$I_{DS} = I_S \left(\frac{W}{L}\right) e^{\frac{V_{GS}-V_{Tn}}{nKT/q}} \left[1 - e^{-\frac{V_{DS}}{KT/q}}\right] \tag{1}$$

where IS is the characteristic current of the sub-threshold region, K is the Boltzmann constant, n is the sub-threshold slope factor, T is the absolute temperature, and q is the charge of an electron.

The transistors are in saturation in the sub-threshold region if VDS ≥ 3VT, where VT = KT/q is the thermal equivalent voltage, whose value at 27 °C is 26 mV. Applying the condition VDS ≥ 3VT in (1), the exponential term in VDS becomes much smaller than 1, and equation (1) simplifies to

$$I_{DS} = I_S \left(\frac{W}{L}\right) e^{\frac{V_{GS}-V_{Tn}}{nKT/q}} \tag{2}$$

The outputs of the differential input pair are loaded by the NMOS transistor groups (N1a-3a and N1b-3b), which form non-linear current mirrors with a current transfer ratio K1 = 2. These current mirrors, known as adaptive loads [26], are the loads of the input transistor pair. The output of the adaptive loads is routed to the summing stage at the circuit’s output to raise the output impedance. The summing stage of the conventional OTA uses PMOS transistors (P2a-2b and P3a-3b) as a current mirror to boost the DC gain of the circuit. The effective transconductance of the conventional BDST OTA is given by:

$$G_{m,BDST} = K_1\, g_{mb1,a/b} \tag{3}$$

where gmb1 represents the bulk transconductance of the input transistors.

Figure 1: Conventional Bulk-driven sub-threshold (BDST) OTA

The output impedance Rout of the BDST OTA is given by:

$$R_{out} = \left[\, r_{o4,bN} \,\|\, \left( r_{o3,bP} + \left( r_{o3,bN} \,\|\, r_{o2,bP} \right) \right) \right] \tag{4}$$

Since ro3,bP >> (ro3,bN||ro2,bP), equation (4) simplifies to:

$$R_{out} = \left( r_{o4,bN} \,\|\, r_{o3,bP} \right) \tag{5}$$

The effective transconductance and the circuit’s output impedance combine to give the open-loop DC gain (AV), which is expressed as:

$$A_{V,BDST} = G_{m,BDST}\, R_{out} = K_1\, g_{mb1,a/b} \left( r_{o4,bN} \,\|\, r_{o3,bP} \right) \tag{6}$$

The main problem of the conventional BDST OTA is its very low open-loop gain of only about 32 dB. A non-linear current mirror is employed here to increase the amplifier’s slew rate and unity gain frequency (UGF), which can be evaluated for a capacitive load CL from the following equations:

$$UGF_{BDST} = \frac{K_1\, g_{mb1,a/b}}{2\pi C_L} \tag{7}$$

$$SR_{BDST} = \frac{2 K_1 I_B}{C_L} \tag{8}$$

The UGF and slew rate of the conventional amplifier are 1.637 kHz and 0.92 V/ms, respectively, which are quite low. Therefore, some structural change is required to improve the overall performance of the conventional BDST OTA with respect to open-loop gain, slew rate, UGF, and so on.

2.2 Proposed SBDST OTA

We propose the super class-AB bulk-driven sub-threshold (SBDST) OTA, depicted in Fig. 2, to enhance the performance of the conventional BDST OTA. Its input core makes use of two identical adaptively biased BD-FVF pairs, eliminating the bias current source of the conventional circuit, which is supplied by PB in Fig. 1.

The transistor dimensions are selected so that the proposed SBDST OTA operates in the sub-threshold region and achieves low-power operation, i.e., below 100 nW. To extend the input common-mode range of the circuit (0 to VDD), a bulk-driven differential pair is used in the input stage; however, the gate transconductance (gm) is 2.5 to 5 times larger than the bulk transconductance (gmb) [12]. Consequently, the circuit’s effective transconductance decreases, and hence the gain and UGF of the circuit are also considerably low.
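The low UGF of the conventional BDST OTA can be related to its transconductance by inverting Eq. (7), and Eq. (8) can be inverted in the same way. The sketch below is a rough numerical illustration, not the authors' procedure: it takes K1 = 2, the reported UGF of 1.637 kHz and slew rate of 0.92 V/ms, and assumes the 15 pF load capacitor quoted in Section 4. The helper names are hypothetical, and the resulting bias current is only what Eq. (8) implies, not a value reported in the paper.

```python
import math


def ugf_hz(gm_eff_siemens: float, c_load_farad: float) -> float:
    """Eq. (7): UGF = Gm / (2*pi*CL), with Gm = K1 * gmb1."""
    return gm_eff_siemens / (2.0 * math.pi * c_load_farad)


def gm_from_ugf(ugf: float, c_load_farad: float) -> float:
    """Eq. (7) solved for the effective transconductance Gm."""
    return 2.0 * math.pi * c_load_farad * ugf


def slew_rate(k1: float, i_bias_amp: float, c_load_farad: float) -> float:
    """Eq. (8): SR = 2 * K1 * IB / CL, in V/s."""
    return 2.0 * k1 * i_bias_amp / c_load_farad


def bias_from_slew(sr_v_per_s: float, k1: float, c_load_farad: float) -> float:
    """Eq. (8) solved for the bias current IB."""
    return sr_v_per_s * c_load_farad / (2.0 * k1)


C_LOAD = 15e-12     # 15 pF load, taken from the simulation setup in Section 4
K1 = 2.0            # current transfer ratio of the non-linear mirror
UGF_BDST = 1.637e3  # reported UGF of the conventional OTA, in Hz
SR_BDST = 0.92e3    # reported slew rate, 0.92 V/ms expressed in V/s

gm_eff = gm_from_ugf(UGF_BDST, C_LOAD)
i_b = bias_from_slew(SR_BDST, K1, C_LOAD)

print(f"Gm implied by Eq. (7): {gm_eff * 1e9:.1f} nS")
print(f"IB implied by Eq. (8): {i_b * 1e9:.2f} nA (not reported in the paper)")
```

Under these assumptions, Eq. (7) implies an effective transconductance of about 154 nS, which is close to the 158.5 nS effective input core transconductance reported for the conventional OTA in Section 4.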
To overcome these limitations, an adaptively biased super class-AB structure [26, 31, 32] is incorporated into the input core. The BD-FVF pair at the input consists of the bulk-driven transistors PI1a-PI1b, the diode-connected transistors PI3a-PI3b connected in negative feedback [31, 32], and the current source formed by transistors NI5a-NI5b. The input core of the SBDST OTA is made up of the adaptively biased input differential pair PI2a-PI2b and the adaptive loads. The non-linear current mirrors (N1a-3a and N1b-3b) with a current ratio of 1:K1 are called adaptive loads. The source terminal of the FVF pair is the output node, which has a very low impedance given by 1/(gm3a gm1a ro1a). Because of this low impedance at the output node, the FVF can source a significant amount of current, even greater than the bias current Ib, when the differential input voltage varies. Hence, the FVF pair in combination with the adaptively biased differential pair makes the proposed OTA operate in class-AB mode. This combination eliminates the limitations of the traditional BDST OTA.

Differential input signals Vin- and Vin+ are applied to the bulk terminals of transistors PI1a-PI1b as well as to the bulk terminals of the adaptively biased differential pair PI2a-PI2b. Due to the voltage-follower action of the FVF, the source terminal of transistor PI1a is also at Vin-. This terminal, named C in Fig. 2, is also connected to the source terminal of transistor PI2a; hence, the total signal voltage that appears across transistor PI2a is VBS = [Vin+ - Vin-] = [Vin+ - (-Vin+)] = 2Vin+. Similarly, the VBS of transistor PI2b is 2Vin-. Therefore, the effective transconductance of the proposed SBDST OTA is twice that of the traditional OTA and is equal to 2gmb1,a/b.

Since the transconductance of the SBDST OTA increases, the gain and UGF also increase. To further enhance the transconductance and gain, the PPF technique is introduced using transistors N4a and N4b at the adaptive-load ends of the input core, shown inside the green box in Fig. 2. The PPF loop increases the overall input transconductance at the cost of a small loss of phase margin (PM), as it generates a non-dominant pole at node E of the SBDST OTA. Therefore, a small compensation capacitor CC is used between the drain of NI5b and the output node.

In addition to the improvement in transconductance, the output impedance is also enhanced using three current mirrors at the output of the circuit. Among these mirrors, two are highly effective FVF current mirrors [5] with a current gain factor of 1.25, and one is a self-cascode current mirror. In the composite SC structure, the aspect ratio of the cascode transistor relative to that of the transistor connected to the supply is set to 20, so that the structure operates in saturation in the sub-threshold region [11, 16]. Hence, by raising the input stage’s transconductance and the output stage’s output impedance, the overall performance of the proposed SBDST OTA is enhanced.

Figure 2: Proposed SBDST OTA

3 Explanation of the proposed SBDST OTA

This section describes the SBDST OTA’s overall transconductance, voltage gain, UGF, and stability.

3.1 Effective transconductance and UGF

The half sub-circuit of the input core of the SBDST OTA and its small-signal AC equivalent circuit are depicted in Fig. 3a and b.

Figure 3: (a) Input core’s half circuit of SBDST OTA, (b) half circuit small signal equivalent model

The input signals Vin- and Vin+ are applied to the bulk terminals of transistors PI1a and PI2a, respectively; the voltage at their source terminals is assumed to be VC, and the gate voltage of transistor PI3a is VD. The drain terminal of PI1a is also connected to point D, so its drain voltage is also VD. A constant DC source Ib made by transistors NI5a-NI5b biases the transistor PI1a.
As a result, zero AC small-signal current passes through the input transistor PI1a [31], and all the AC signal current of PI3a is the output AC small-signal current Io, which flows through transistor PI2a [12, 26, 31, 32]. The effective overall transconductance of the proposed OTA is calculated from the following equations.

The small-signal current through transistor PI1a at node D is given by:

$$g_{m1,a}\left(-V_C\right) + g_{mb1,a}\left(V_{in}^{-} - V_C\right) - \frac{V_C - V_D}{r_{o1,a}} = 0 \tag{9}$$

and the output current Io contributed by PI2a is expressed as:

$$I_o = g_{m2,a}\left(-V_C\right) + g_{mb2,a}\left(V_{in}^{+} - V_C\right) - \frac{V_C - V_E}{r_{o2,a}} \tag{10}$$

The output resistance terms are neglected in both equations, since the output resistances are very high. So, Eq. (9) can be approximated as:

$$g_{m1,a}\left(-V_C\right) + g_{mb1,a}\left(V_{in}^{-} - V_C\right) = 0 \tag{11}$$

Taking Vin+ = -Vin- = Vin, as in Section 2.2, this gives

$$\left(g_{m1,a} + g_{mb1,a}\right) V_C = -\,g_{mb1,a} V_{in} \tag{12}$$

$$\Rightarrow\; V_C = -\frac{g_{mb1,a} V_{in}}{g_{m1,a} + g_{mb1,a}} \tag{13}$$

Eq. (10) can be written as:

$$I_o = g_{m2,a}\left(-V_C\right) + g_{mb2,a}\left(V_{in} - V_C\right)$$

$$I_o = g_{mb2,a} V_{in} - V_C \left(g_{m2,a} + g_{mb2,a}\right) \tag{14}$$

Putting the value of VC from Eq. (13) into Eq. (14) and solving in terms of Vin, the equation becomes:

$$I_o = g_{mb2,a} V_{in} + \frac{g_{mb1,a} V_{in}}{g_{m1,a} + g_{mb1,a}} \left(g_{m2,a} + g_{mb2,a}\right) \tag{15}$$

Since the transistors PI1a and PI2a are identical,

$$g_{m1,a} = g_{m2,a} \quad \text{and} \quad g_{mb1,a} = g_{mb2,a} \tag{16}$$

Putting the above relations into Eq. (15), the value of Io simplifies to:

$$I_o = 2\, g_{mb1,a} V_{in} \tag{17}$$

As the circuit is symmetric, the input core transconductance is given by:

$$G_{m,\,input\ core} = \frac{I_o}{V_{in}} = 2\, g_{mb1} \tag{18}$$

Considering the current gain K1 = 2 of the non-linear current mirror together with the partial positive feedback technique employing transistors N4a and N4b in the input core, the effective overall transconductance of the proposed SBDST OTA is given by:

$$G_{m,\,effective}\big|_{SBDST} = K_1 \frac{2\, g_{mb1}}{1-\alpha} \tag{19}$$

where K1 and α are set by the aspect ratios of the transistors and are given by K1 = gm3,N / gm2,N and α = gm4,N / gm2,N.

The UGF of the SBDST OTA is given by:

$$UGF = \frac{G_{m,\,effective}\big|_{SBDST}}{2\pi C_L} = K_1 \frac{2\, g_{mb1}}{\left(1-\alpha\right) 2\pi C_L} \tag{20}$$

Due to its enhanced effective transconductance, the proposed OTA provides a significantly higher UGF than the traditional BDST OTA, as shown by expression (20).

3.2 Voltage gain

The open-loop voltage gain of the SBDST OTA is obtained by multiplying the circuit’s output impedance and effective transconductance. The output impedance of the entire circuit at the output node is given by:

$$R_{out} = \left(g_{m7,bP}\, r_{o7,bP}\, r_{o5,bP}\right) \,\|\, \left(g_{m6,bN}\, r_{o6,bN}\, r_{o7,bN}\right) \tag{21}$$

Hence, the overall voltage gain of the proposed SBDST OTA is given by:

$$A_V\big|_{SBDST} = G_{m,\,effective}\big|_{SBDST}\, R_{out} = K_1 \frac{2\, g_{mb1}}{1-\alpha} \left[\left(g_{m7,bP}\, r_{o7,bP}\, r_{o5,bP}\right) \,\|\, \left(g_{m6,bN}\, r_{o6,bN}\, r_{o7,bN}\right)\right] \tag{22}$$

3.3 Stability analysis

The proposed SBDST OTA introduces a dominant pole (p1) at the output node owing to its capacitive load and output impedance; hence, its frequency is not influenced by the PPF loop and is given by:

$$p_1 = \frac{1}{R_{out}\left(C_C + C_L\right)} \tag{23}$$

where CC is the small compensation capacitor and CL is the load capacitance.

The PPF technique employed in the input core causes the non-dominant pole (p2) at the drain terminal of N2,a/b to shift towards a lower value; its value, given in [12], is expressed as:

$$p_2 = \frac{g_{m2,N} - g_{m4,N}}{C_P} \tag{24}$$

where CP is the parasitic capacitance at the above-mentioned drain terminal node. This node has a higher impedance due to the PPF action. The lower secondary pole value in (24) limits the maximum possible UGF. To ensure a stable phase margin, a small compensation capacitor CC is placed between the drain of NI5b and the high-impedance output node.

4 Simulation results

The traditional BDST and the proposed SBDST OTAs are simulated in the Cadence Virtuoso simulator using 180 nm CMOS process technology, with a supply of only 0.5 V and a load capacitor of 15 pF. The bias current Ib of the BD-FVF pair in the SBDST OTA is fixed at 10 nA, and the total stand-by current in the sub-threshold region of operation is 124 nA, while that of the conventional OTA is 100 nA; the reference temperature for both is 27 °C. The bias current of the high-performance FVF current mirror is 1.2 nA. In the design, the aspect ratios of all the MOSFETs are chosen to reduce the influence of channel length modulation and the input-referred noise of the circuit. Additionally, the bias voltage Vb1 is chosen to bias the transistors N2,a/b in the triode region, so that the transistor groups (N1a-3a and N1b-3b) work as non-linear mirrors.

Figure 4: Simulated AC plot of BDST OTA and SBDST OTA

Figure 4 shows the AC responses of the conventional BDST OTA and the proposed SBDST OTA. The simulation outcomes demonstrate that the open-loop DC gain, UGF, and phase margin of the proposed SBDST OTA are 72.356 dB, 18.7057 kHz, and 61.3255°, respectively. Compared with the conventional circuit, the DC gain expressed in dB is 2.26 times higher and the UGF is 11.42 times higher, with a small loss of phase margin. A compensation capacitor CC = 0.4 pF is used in the SBDST OTA to raise its phase margin above 60°.

The overall effective input core transconductance is shown in Fig. 5. The proposed SBDST OTA achieves a significantly greater effective input core transconductance of 1.76 μS, compared with 158.5 nS for the conventional bulk-driven OTA.

Figure 5: Effective input core transconductance of BDST OTA and SBDST OTA

Noise is one of the most crucial factors of an OTA. It is an undesired signal that combines with the desired signal as a result of fluctuations in the power supply or component mismatches, producing unwanted output. In addition, the thermal and flicker noise of the MOS transistors themselves adds to the overall noise density. Since biosignals lie in the range 10 mHz ≤ fbio ≤ 1 kHz, flicker noise predominates in biomedical applications, so the OTA circuit must be designed for minimum input-referred noise. Figure 6 shows the input-referred noise at the input pair terminals of the proposed and conventional OTAs; the SBDST OTA and the BDST OTA have input-referred noise (IRN) values of 0.959 μV/√Hz and 1.347 μV/√Hz, respectively, at 1 kHz. The value is lower in the proposed SBDST OTA because of the enhanced overall effective transconductance of the input pair.

Figure 6: Plot of input referred noise voltage of BDST OTA and SBDST OTA

The PSRR± and CMRR values must be very large to reject unwanted signals that are generated by variations in the power supply or are common to both inputs. Figure 7 shows the CMRR and PSRR± of the SBDST OTA; at 1 mHz the proposed SBDST OTA provides a high CMRR, PSRR+, and PSRR− of 161.48 dB, 86.17 dB, and 69.22 dB, respectively.

Figure 7: CMRR, PSRR (+/−) of proposed SBDST OTA

Figure 8 shows the unity-gain closed-loop configuration of the proposed SBDST OTA, obtained by shorting its inverting input to the output, which is used to evaluate the large-signal transient response. The output response for a 15 pF capacitive load is shown in Fig. 9, when a step input signal of 0.5 V peak-to-peak (Vpp) at 250 Hz is applied to the non-inverting input of the OTA. The average slew rate of the SBDST OTA is found to be 2.07 V/ms.

Figure 8: Unity gain configuration of the SBDST OTA

The sinusoidal transient response is evaluated by applying two sinusoidal input signals of 0.5 Vpp and 0.4 Vpp, with common-mode voltages (Vcm) of 0.25 V and 0.2 V, respectively, to the non-inverting input terminal in Fig. 8 at a frequency of 250 Hz. The simulation outputs are shown in Fig. 10(a) and (b); the proposed OTA provides output signal swings of (12.55 mV−491.3 mV) and (12.56 mV−399.37 mV), respectively. The output voltage swing in response to a sinusoidal transient is nearly rail-to-rail.

Figure 9: Large-signal pulse response to 0.5Vpp at 250 Hz square wave for proposed SBDST OTA

Figure 10: Sinusoidal transient response of the proposed SBDST OTA for (a) Vin,pp = 0.5V with Vcm= 0.25V, (b) Vin,pp = 0.4V with Vcm= 0.2V

The input common-mode range (ICMR) of the proposed OTA is evaluated by a DC sweep analysis in a non-inverting voltage buffer configuration with a 15 pF capacitive load. The simulation output is shown in Fig. 11(a), and the variation of the error voltage (Vout-Vin) over the whole input voltage range (0 to VDD) is displayed in Fig. 11(b). The error voltage at 0 V input is 12.63 mV, while at 0.5 V input it is only 8.8 mV. The DC sweep results in Fig. 11(a) and (b) therefore show that the proposed SBDST OTA is linear over a wide ICMR.

Figure 11: (a) DC sweep for ICMR of the SBDST OTA, (b) Error voltage (Vout - Vin) in DC sweep

Figure 12: Plot of THD against amplitude for SBDST OTA

Figure 13: Simulation results of Monte Carlo iteration of (a) DC gain, (b) PM, (c) UGF, (d) total power consumption for 300 samples

A 250 Hz sine wave input signal with peak-to-peak amplitudes varying from 50 mV to 500 mV has been used to assess the nonlinearity of the SBDST OTA in a unity-gain configuration. The simulation result is shown in Fig. 12.
At 200 mV (pp), the total harmonic distortion is -60.91 dB, and the THD of the SBDST OTA stays below -40 dB for input sine wave amplitudes up to 466 mV (pp).

The robustness of the OTA is determined by the deviation of its performance parameters under process variation and mismatch. Monte Carlo simulations with 300 samples are used to assess the robustness of the proposed SBDST OTA. The statistical data of this analysis are shown in Fig. 13 in the form of histograms. In addition, the Monte Carlo results for all the parameters of the SBDST OTA are tabulated in Table 1. Table 1 shows that the proposed OTA has a low standard deviation (SD) for all the performance parameters and hence is insensitive to process variations.

Table 1: Performance result of proposed SBDST OTA under Monte Carlo simulation using 300 samples

Parameters | Mean (μ) | SD (σ)
Open loop DC gain (dB) | 72.41 | 726m
Phase margin (degree) | 61.94 | 531.8m
UGF (kHz) | 18.84 | 1.462k
CMRR (dB) @ 1mHz | 151.7 | 10.66
PSRR+ (dB) @ 1mHz | 86.31 | 1.399
PSRR− (dB) @ 1mHz | 69.28 | 883.6m
SR(av) (V/ms) | 2.078 | 140.1m
IRN (μV/Hz0.5) at 1kHz | 0.96 | 6.999n
Total current (nA) | 124.5 | 13.55n
Total Power (nW) | 62.27 | 6.773n

Integrated circuits (ICs) must be designed so that PVT (process, voltage, and temperature) fluctuations do not affect their operation after fabrication. Deviations in manufacturing conditions, such as dopant concentrations, temperature, pressure, and variations in the semiconductor fabrication process, cause “process variation”. The other key factors for process variation are variations in metal thickness, oxide thickness, UV light wavelength, faults in the manufacturing process, and variations in transistor characteristics [12]. The supply voltage may also fluctuate in some circumstances, so the simulation results of the proposed OTA should also be verified by varying the supply voltage.
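One way to read the Monte Carlo figures in Table 1 is as a relative spread σ/μ per parameter. The sketch below is an illustration under stated assumptions, not part of the authors' analysis: it assumes the SD column uses SPICE-style engineering suffixes (n, m, k) relative to base units (Hz, A, W, V/√Hz), while the dB, degree, and V/ms rows keep the unit of the mean. The helper names and the scale factors are hypothetical choices made for this example.

```python
# SPICE-style engineering suffixes assumed for the SD column of Table 1.
SUFFIX = {"n": 1e-9, "u": 1e-6, "m": 1e-3, "k": 1e3}


def parse_eng(text: str) -> float:
    """Convert strings such as '726m' or '1.462k' to a plain float."""
    text = text.strip()
    if text[-1] in SUFFIX:
        return float(text[:-1]) * SUFFIX[text[-1]]
    return float(text)


def relative_spread(mean: float, scale: float, sd_text: str) -> float:
    """Return sigma/mu, with the mean converted to the SD's assumed unit."""
    return parse_eng(sd_text) / (mean * scale)


# (parameter, mean as printed, assumed scale to the SD's base unit, SD as printed)
TABLE_1 = [
    ("Open loop DC gain (dB)", 72.41, 1.0, "726m"),
    ("Phase margin (degree)", 61.94, 1.0, "531.8m"),
    ("UGF (kHz)", 18.84, 1e3, "1.462k"),
    ("CMRR (dB)", 151.7, 1.0, "10.66"),
    ("PSRR+ (dB)", 86.31, 1.0, "1.399"),
    ("PSRR- (dB)", 69.28, 1.0, "883.6m"),
    ("SR(av) (V/ms)", 2.078, 1.0, "140.1m"),
    ("IRN (uV/sqrt(Hz))", 0.96, 1e-6, "6.999n"),
    ("Total current (nA)", 124.5, 1e-9, "13.55n"),
    ("Total power (nW)", 62.27, 1e-9, "6.773n"),
]

for name, mean, scale, sd_text in TABLE_1:
    spread = 100.0 * relative_spread(mean, scale, sd_text)
    print(f"{name:24s} sigma/mu = {spread:5.2f} %")
```

Under these assumptions, the largest relative spreads are in total current and total power (about 11 %) and in UGF (about 8 %), while gain and phase margin vary by about 1 % or less. The ratio σ/μ for quantities expressed in dB is only a rough indicator, since dB is a logarithmic unit.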
Figure 15 depicts the layout of the single-stage SBDST OTA. The proposed OTA takes up (76 x 81) μm2 area, including the area of the compensation capacitor, and the post-layout outcome of the AC response of the SBDST OTA is exposed in Fig. 16. The simulation outcomes of post-layout express that the open loop DC gain, UGF, and phase margin are 72.281 dB, 18.329 kHz, and 61.635º respectively. The results of pre-layout and post-layout AC responses expose that there is a high degree of proximity. This proximity supports the usability and design of this SBDST OTA. Figure 14 shows the five process corners (TT, FF, SS, FNSP, and SNFP) effects at 27 ºC on the gain and phase margin of the proposed SBDST circuit. The SS corner has the largest DC gain, measuring 76 dB, and the FNSP corner has the lowest DC gain, measuring 65.59 dB. To check the sensitivity of the proposed OTA against the variations of PVT, corner analysis for five different corners at temperatures (−14 ºC, 27 ºC, and 60 ºC) has been done, and the performance of OTA has been also verified by varying ±10% supply voltage. The simulation results of all the performance parameters against fluctuations of PVT are tabulated in Table 2 and Table 3. (a) (b) Figure 15: Proposed SBDST OTA’s layout Figure 14: Process corners effect on DC gain and phase margin at room temperature Table 2: Simulation results on the variation of supply voltage Parameters Open loop DC gain (dB) Phase margin (degree) UGF (kHz) CMRR (dB) @ 1mHz PSRR+ (dB) @ 1mHz PSRR− (dB) @ 1mHz IRN (μV/Hz0.5) at 1kHz Total current (nA) Total Power (nW) VDD − 10% 63.33 68.43 7.4 126.88 71.13 61.66 0.96 102.9 46.3 VDD + 10% 77.64 56.83 27.46 115.1 94.4 73.28 0.94 144 79.2 Figure 16: Post-layout AC plot of BDST OTA 73 R. K. Pandey et al.; Informacije Midem, Vol. 53, No. 
2(2023), 65 – 77

Table 3: Simulation results of the performance of the SBDST OTA under process and temperature variations

Different corners at temperature −14 ºC

| Parameters | TT | FF | SS | FNSP | SNFP |
|---|---|---|---|---|---|
| Open loop DC gain (dB) | 79.41 | 76.17 | 80.68 | 73.63 | 79.08 |
| Phase margin (degree) | 53.37 | 54.04 | 53.29 | 54.1 | 58.79 |
| UGF (kHz) | 9.08 | 23.57 | 2.68 | 10.16 | 5.21 |
| CMRR (dB) @ 1 mHz | 135.18 | 131.26 | 149.96 | 133.98 | 109.27 |
| PSRR+ (dB) @ 1 mHz | 98.44 | 91.51 | 102.45 | 85.1 | 91.49 |
| PSRR− (dB) @ 1 mHz | 73.09 | 70.34 | 74.68 | 68.85 | 72.24 |
| SR(av) (V/ms) | 1.22 | 2.5 | 0.58 | 0.84 | 1.58 |
| IRN (μV/√Hz) at 1 kHz | 0.93 | 0.81 | 1.31 | 0.88 | 1.03 |
| Total power (nW) | 18.14 | 58.99 | 4.83 | 29.58 | 10.85 |

Different corners at temperature 27 ºC

| Parameters | TT | FF | SS | FNSP | SNFP |
|---|---|---|---|---|---|
| Open loop DC gain (dB) | 72.36 | 68.06 | 76 | 65.59 | 75.99 |
| Phase margin (degree) | 61.96 | 65.42 | 59.69 | 68.07 | 61.28 |
| UGF (kHz) | 19.11 | 34.77 | 8.47 | 14.39 | 15.41 |
| CMRR (dB) @ 1 mHz | 161.49 | 142.19 | 131.91 | 136.82 | 107.02 |
| PSRR+ (dB) @ 1 mHz | 86.17 | 79.86 | 93.27 | 74.44 | 89.12 |
| PSRR− (dB) @ 1 mHz | 69.21 | 65.56 | 72.41 | 63.69 | 71.72 |
| SR(av) (V/ms) | 2.06 | 3.66 | 1.12 | 1.41 | 2.91 |
| IRN (μV/√Hz) at 1 kHz | 0.95 | 0.92 | 1.08 | 0.96 | 0.94 |
| Total power (nW) | 62.81 | 162.98 | 21.79 | 88.66 | 42.62 |

Different corners at temperature 60 ºC

| Parameters | TT | FF | SS | FNSP | SNFP |
|---|---|---|---|---|---|
| Open loop DC gain (dB) | 65.35 | 60.48 | 69.72 | 14.08 | 70.18 |
| Phase margin (degree) | 69.83 | 73.96 | 65.86 | 102.17 | 65.74 |
| UGF (kHz) | 23.15 | 33.87 | 13.87 | 0.093 | 25.77 |
| CMRR (dB) @ 1 mHz | 133.58 | 139.19 | 124.83 | 137.38 | 102.66 |
| PSRR+ (dB) @ 1 mHz | 76.11 | 70.17 | 82.49 | 19.35 | 81.76 |
| PSRR− (dB) @ 1 mHz | 63.69 | 59.27 | 67.59 | 14.08 | 67.47 |
| SR(av) (V/ms) | 2.74 | 4.32 | 1.66 | 1.8 | 3.9 |
| IRN (μV/√Hz) at 1 kHz | 1.03 | 1 | 1.09 | 1.21 | 0.96 |
| Total power (nW) | 137.94 | 316.13 | 55.62 | 175.65 | 102.48 |

5 Performance comparison and discussions

Table 4 lists the performance parameters of the proposed OTA. To assess the overall performance of the OTA in terms of its small-signal and large-signal responses, two popular figures of merit (FOMSm, FOMLa), specified in [22, 27, 28], are used; they are given in equations (25) and (26), respectively.

$$\mathrm{FOM_{Sm}} = \frac{\mathrm{UGF}\,(\mathrm{MHz}) \cdot C_L\,(\mathrm{pF})}{I_T\,(\mu\mathrm{A})} \tag{25}$$

$$\mathrm{FOM_{La}} = \frac{SR_{av}\,(\mathrm{V}/\mu\mathrm{s}) \cdot C_L\,(\mathrm{pF})}{I_T\,(\mu\mathrm{A})} \tag{26}$$

The performance parameters of the proposed SBDST OTA are compared with those of other recent BD OTAs in Table 5. According to Table 5, the proposed SBDST OTA offers the largest DC gain and PSRR+/− of the compared designs, and its CMRR is exceeded only by that of [19]. Its large-signal figure of merit (FOMLa) is comparable only to that of [14] among the OTAs in Table 5, but it provides the highest small-signal figure of merit (FOMSm) of the reported OTAs, with the exception of [20].

Table 4: Proposed SBDST OTA performance parameters

| Parameters | SBDST OTA |
|---|---|
| Power supply (V) | 0.5 |
| Load capacitance (pF) | 15 |
| Technology | 0.18 μm |
| DC gain | 72.35 |
| Phase margin (º) | 61.33 |
| UGF (kHz) | 18.7 |
| CMRR (dB) @ 1 mHz | 161.48 |
| PSRR+ (dB) @ 1 mHz | 86.17 |
| PSRR− (dB) @ 1 mHz | 69.22 |
| IRN @ 1 kHz (μV/√Hz) | 0.95 |
| SR average (V/ms) | 2.07 |
| Total current (nA) | 125.64 |
| Total power (nW) | 62.82 |
| FOMSm | 2.23 |
| FOMLa | 0.247 |

6 Conclusions

This article presents an enhanced bulk-driven single-stage OTA architecture that operates in the weak-inversion region from a 0.5 V supply. The proposed amplifier employs a BD-FVF based on an adaptively biased differential input pair operating in class-AB mode to improve the dynamic current and the unity gain frequency. Additionally, it employs a partial positive feedback technique in the core of the differential pair to increase the gain of the circuit. The gain is increased further by a low-power, high-performance FVF-based current mirror load at the amplifier’s output. The simulation results show that the amplifier consumes just 62.82 nW of power and has a DC gain of 72.35 dB, a phase margin of 61.32º, and a UGF of 18.7 kHz. For a 250 Hz input sine wave of 200 mV (pp), the SBDST OTA in its unity gain configuration offers a total harmonic distortion of −60.91 dB. These results indicate that the proposed SBDST OTA is appropriate for biomedical signal processing, audio signal processing, and low-frequency sensors.

7 Conflict of Interest

The authors declare that they have no conflicts of interest.

Table 5: Proposed SBDST OTA and previously reported BD OTAs performance differences at 0.18μm technology

Parameters [14] [11] [15] [16] [16] [17] [19] [29] [20] 2015 2017 2017 OTA1 OTA2 2018 2019 2021 2022 2018 15 0.6 93 61.9 2018 15 0.6 78.41 60.76 50 0.8 87 44.3 50 0.7 89.07 71.35 30 0.4 60 60 0.0024 122 @ 1Hz 62.8@ 1Hz 0.00773 119@ 1Hz 61.8 15 0.6 74 71@ 0.1Hz 0.0182 201.8@ 10Hz 77.4@ 10Hz 60.23 0.779 0.00157 0.079 0.00207 200 140 0.27 0.39 − 125.64 62.82 2.23 0.247 6,156 CL(pF) Power supply (V) Phase margin (º) (DC gain)a (dB) 15 0.5 68.9 67.8 15 0.5 54 70.4 12 0.6 62.45 61.5 UGF (MHz) CMRRa (dB) 0.003 − 0.009 106 @ 1Hz 70@ 1Hz 0.03015 − PSRR − (dB) − IRNb @ (μV/HZ0.5) 0.56 − 2.53 − 2.454 SR average (V/μs) 0.84/ 0.967 − 6.25 @ 0.1Hz 0.0553 @ 1Hz − 2.97 1.15 1.404 3.5 − 0.25@ 0.1Hz 0.0066 125.5 64 1.11 116 − 275 165 1.31 3.01 − 50.63 30.38 0.711 340.7 6620 115 69 1.008 183.13 7406 62,000 49,600 1.17 2.82 − 240 144 1.13 0.412 16,000 PSRRa+ (dB) − a 0.59 52 26 0.94 0.24 52000 67.9 1.45 − − − − ThisWork 2023 15 0.5 61.33 72.35 0.00107 0.007 138.5 85 0.0187 161.48 77.08 86.17 76 − − 69.22 0.95 c Total current (nA) Total power(nW) FOMSm FOMLa Area (μm2) a: at 1mHz, b: at 1kHz, c: V/ms 75 60 24 3.5 39.5 7900

8 References

1. R. R. Harrison, C. Charles, “A low-power low-noise CMOS amplifier for neural recording applications”, IEEE J. Solid State Circuit, vol. 38, no. 6, pp. 958–965, 2003. https://doi.org/10.1109/JSSC.2003.811979
2. S. Chatterjee, Y. Tsividis, P. Kinget, “0.5-V analog circuit techniques and their applications in OTA and filter design”, IEEE J.
Solid State Circuit, vol. 40, no. 12, pp. 2372–2387, 2005. https://doi.org/10.1109/JSSC.2005.856280
3. A.P. Chandrakasan, N. Verma and D. C. Daly, “Ultralow-power electronics for biomedical applications”, Annu. Rev. Biomed. Eng, vol. 10, pp. 247–274, 2008. https://www.annualreviews.org/doi/abs/10.1146/annurev.bioeng.10.061807.160547
4. N. Suda, P.V. Nishanth, D. Basak, D. Sharma, R.P. Paily, “A 0.5-V low power analog front-end for heart-rate detector”, Analog Integr. Circuits Signal Process, vol. 81, no. 2, pp. 417–430, 2014. https://doi.org/10.1007/s10470-014-0402-1
5. Rakesh Kumar Pandey, Vijaya Bhadauria and V.K. Singh, “Rail-to-Rail, Reconfigurable Subthreshold Bulk-Driven OTA Based on Flipped Voltage Follower for Biomedical Applications”, in: 2021 IEEE International Conference on Technology, Research, and Innovation for Betterment of Society (TRIBES), 2021. https://doi.org/10.1109/TRIBES52498.2021.9751665
6. S. Yan and E. Sanchez-Sinencio, “Low voltage analog circuit design techniques: A tutorial”, IEICE Trans. Analog Integr. Circuits Syst., E83-A, pp. 179–196, 2000. https://people.engr.tamu.edu/ssanchez/607-lvtutorial-2000.pdf
7. S. S. Rajput and S. S. Jamuar, “Low voltage analog circuit design techniques”, IEEE Circuits and Systems Magazine, vol. 2, no. 1, pp. 24–42, 2002. https://doi.org/10.1109/MCAS.2002.999703
8. R. G. Carvajal, J. Ramírez-Angulo, A. J. López-Martín, A. Torralba, J. A. G. Galán, A. Carlosena, and F. M. Chavero, “The flipped voltage follower: A useful cell for low voltage low-power circuit design”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 7, pp. 1276–1291, 2005. https://doi.org/10.1109/TCSI.2005.851387
9. H. C. Ferreira, T. C. Pimenta and R. L. Moreno, “An ultra-low-voltage ultra-low-power weak inversion composite MOS transistor concept and applications”, IEICE Trans. Electron, E91-C, pp. 662–665, 2008. https://doi.org/10.1093/ietele/e91-c.4.662
10. M. O. Trakimas and S. Sonkusale, “A 0.5 V bulk-input OTA with improved common-mode feedback for low-frequency filtering applications”, Analog Integrated Circuits and Signal Processing, vol. 59, no. 1, pp. 83–89, 2009. https://doi.org/10.1007/s10470-008-9236-z
11. T. Sharan and V. Bhadauria, “Fully differential, bulk-driven, class AB, subthreshold OTA with enhanced slew rates and gain”, Journal of Circuits System and Computers, vol. 26, no. 1, 1750001, 2017. https://doi.org/10.1142/S0218126617500013
12. S. Ghosh and V. Bhadauria, “An ultra-low-power bulk-driven subthreshold super class-AB rail-to-rail CMOS OTA with enhanced small and large signal performance suitable for large capacitive loads”, Microelectronics Journal, 115, 105208, 2021. https://doi.org/10.1016/j.mejo.2021.105208
13. L.H. Ferreira, T.C. Pimenta, R.L. Moreno, “An ultra-low-voltage ultra-low-power CMOS miller OTA with rail-to-rail input/output swing”, IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, no. 10, pp. 843–847, 2007. https://doi.org/10.1093/ietele/e91-c.4.662
14. X. Zhao, H. Fang, T. Ling, J. Xu, “Transconductance improvement method for low-voltage bulk-driven input stage”, Integration, vol. 49, pp. 98–103, 2015. https://doi.org/10.1016/j.vlsi.2014.11.005
15. M. Akbari, O. Hashemipour, M. H. Moaiyeri, and A. Aghajani, “An efficient approach to enhance bulk-driven amplifiers”, Analog Integrated Circuits and Signal Processing, vol. 92, no. 3, pp. 489–499, 2017. https://doi.org/10.1007/s10470-017-1010-7
16. T. Sharan, P. Chetri, and V. Bhadauria, “Ultra-low-power bulk-driven fully differential subthreshold OTAs with partial positive feedback for Gm-C filters”, Analog Integr. Circuits Signal Process, vol. 94, no. 3, pp. 427–447, 2018. https://doi.org/10.1007/s10470-017-1065-5
17. X. Zhao, Q. Zhang, Y. Wang, and L. Dong, “An approach to essentially improve current efficiency for bulk-driven OTA”, AEU-International Journal of Electronics and Communications, vol. 86, pp. 103–107, 2018. https://doi.org/10.1016/j.aeue.2018.01.028
18. T. Kulej and F. Khateb, “0.4-V bulk-driven differential-difference amplifier”, Microelectron. J., vol. 46, no. 5, pp. 362–369, 2015. https://doi.org/10.1016/j.mejo.2015.02.009
19. A. Ghaemnia and O. Hashemipour, “An ultra-low power high gain CMOS OTA for biomedical applications”, Analog Integrated Circuits and Signal Processing, vol. 99, no. 3, pp. 529–537, 2019. https://doi.org/10.1007/s10470-019-01438-6
20. M. Akbari, S.M. Hussein, Y. Hashim, K.-T. Tang, “0.4-V Tail-Less Quasi-Two-Stage OTA Using a Novel Self-Biasing Trans-conductance Cell”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, pp. 2805–2818, 2022. https://doi.org/10.1109/TCSI.2022.3161964
21. T. Kulej, F. Khateb, “Design and implementation of sub 0.5-V OTAs in 0.18 um CMOS”, International Journal of Circuit Theory and Applications, vol. 46, no. 6, pp. 1129–1143, 2018. https://doi.org/10.1002/cta.2465
22. T. Kulej, F. Khateb, “A Compact 0.3-V Class AB Bulk-Driven OTA”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, pp. 224–232, 2020. https://doi.org/10.1109/TVLSI.2019.2937206
23. R. Wang and R. Harjani, “Partial positive feedback for gain enhancement of CMOS OTAs”, Analog Integr. Circuits Signal Process, vol. 8, pp. 21–35, 1995. https://doi.org/10.1007/978-1-4615-2283-6_3
24. J.M. Carrillo, G. Torelli, R. Perez-Aloe, J.F. Duque-Carrillo, “1-V rail-to-rail CMOS OpAmp with improved bulk-driven input”, IEEE J. Solid State Circ., vol. 42, no. 3, pp. 508–517, 2007. https://doi.org/10.1109/JSSC.2006.891717
25. L.H. Ferreira, S.R. Sonkusale, “A 60-dB gain OTA operating at 0.25-V power supply in 130-nm digital CMOS process”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 6, pp. 1609–1617, 2014. https://doi.org/10.1109/TCSI.2013.2289413
26. H. Veldandi and R.A. Shaik, “An Ultra-low-voltage bulk-driven analog voltage buffer with rail-to-rail input/output range”, Circuits, Systems, and Signal Processing, vol. 36, no. 12, pp. 4886–4907, 2017. https://doi.org/10.1007/s00034-017-0663-x
27. A. Ballo, A.D. Grasso, S. Pennisi, “0.4-V, 81.3-nA Bulk-Driven Single-Stage CMOS OTA with Enhanced Transconductance”, Electronics, vol. 11, 2704, 2022. https://doi.org/10.3390/electronics11172704
28. A. Ballo, A.D. Grasso and S. Pennisi, “A 0.6 V Bulk-Driven Class-AB Two-Stage OTA with Non-Tailed Differential Pair”, J. Low Power Electron. Appl., vol. 13, 24, 2023. https://doi.org/10.3390/jlpea13020024
29. S. Ghosh, S. Tripathi, V. Bhadauria, “A low harmonic high gain subthreshold flipped voltage follower-based bulk-driven OTA suitable for low-frequency applications”, in: D. Harvey, H. Kar, S. Verma, V. Bhadauria (Eds.), Advances in VLSI, Communication, and Signal Processing. Lecture Notes in Electrical Engineering, vol. 683, Springer, Singapore, 2021. https://doi.org/10.1007/978-981-15-6840-4_38
30. S. Ghosh and V. Bhadauria, “High current efficiency single-stage bulk driven subthreshold-biased class-AB OTAs with enhanced transconductance and slew rate for large capacitive loads”, Analog Integrated Circuits and Signal Processing, 2021. https://doi.org/10.1007/s10470-021-01929-5
31. J.A. Galan, A.J. Lopez-Martin, R.G. Carvajal, et al., “Super class-AB OTAs with adaptive biasing and dynamic output current scaling”, IEEE Trans. Circuits Syst., I Reg. Pap., vol. 54, pp. 449–457, 2007. https://doi.org/10.1109/TCSI.2006.887639
32. X. Zhao, Q. Zhang, and M. Deng, “Super class-AB bulk driven OTAs with improved slew rate”, Electronics Letters, vol. 51, no. 19, pp. 1488–1489, 2015. https://doi.org/10.1049/el.2015.1776

Copyright © 2023 by the Authors.
This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 23. 04. 2023
Accepted: 09. 07. 2023

Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.203
Journal of Microelectronics, Electronic Components and Materials Vol. 53, No. 2(2023), 79 – 86

An Energy-efficient and Accuracy-adjustable bfloat16 Multiplier

Ratko Pilipović¹, Patricio Bulić¹, Uroš Lotrič¹
¹ Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia

Abstract: Approximate multipliers have been extensively used in neural network inference, but due to their relatively large error, they have yet to be successfully deployed in neural network learning. Recently, the bfloat16 format has emerged as a viable number representation for neural networks. This paper proposes a novel approximate bfloat16 multiplier with on-the-fly adjustable accuracy for energy-efficient learning in deep neural networks. The size of the proposed multiplier is only 62% of the size of the exact bfloat16 multiplier. Furthermore, its energy footprint is up to five times smaller than the footprint of the exact bfloat16 multiplier. We demonstrate the advantages of the proposed multiplier in deep neural network learning, where we successfully train the ResNet-20 network on the CIFAR-10 dataset from scratch.

Keywords: approximate computing; deep neural networks; energy-efficient processing; bfloat16 multiplier

An energy-efficient approximate multiplier in the bfloat16 format with adjustable accuracy

Abstract (translated from Slovenian): Approximate multipliers have proven very suitable for inference with neural networks, but because of their relatively large error they have not yet been used successfully for learning in deep neural networks. Recently, the bfloat16 format has begun to establish itself for representing real numbers in neural networks. In this article, we propose a new approximate multiplier in the bfloat16 format with on-the-fly adjustable accuracy for energy-efficient learning in deep neural networks. The size of the proposed multiplier is only 62 % of the size of the exact multiplier in the bfloat16 format. In addition, its energy footprint is up to five times smaller than that of the exact bfloat16 multiplier. We demonstrate the usefulness of the proposed multiplier on the example of learning deep neural networks, where we successfully train the ResNet-20 network on the CIFAR-10 dataset.

Keywords (translated from Slovenian): approximate computing; deep neural networks; energy-efficient computing; multiplier in the bfloat16 format

* Corresponding Author’s e-mail: patricio.bulic@fri.uni-lj.si

How to cite: R. Pilipović et al., “An Energy-efficient and Accuracy-adjustable bfloat16 Multiplier", Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 79–86

R. Pilipović et al.; Informacije Midem, Vol. 53, No. 2(2023), 79 – 86

1 Introduction

The capability of neural networks to learn from data and generalise the gained knowledge makes them a very popular modelling tool in various application fields. Their growing popularity in recent years can be attributed to deep models, which place considerable demands on the processing hardware. Thus, new hardware solutions are continuously being developed to keep the processing hardware on par with the computing demands.

Approximate computing has emerged as a popular strategy for area- and energy-efficient circuit design, where the challenge is to achieve the best trade-off between design efficiency and accuracy. Efficient designs come at the cost of reduced accuracy and vice versa. Nevertheless, approximate computing perfectly fits neural networks, which, to a certain extent, tolerate or even adapt to errors caused by noisy input data or erroneous computation. Widely used approaches in approximate computing are precision scaling and approximate arithmetic.

In precision scaling [1], we use fewer bits to represent numeric values rather than executing all the required mathematical operations with the full representation. Several standards for floating-point representation have recently appeared: IEEE 754-2019 for half-precision [2], the posit format with dynamic range and mantissa [3], and Google’s bfloat16, targeting machine-learning workloads [4]. Storing numeric values with fewer bits reduces the size and complexity of arithmetic circuits. Besides, it saves on-chip memory and reduces the amount of data that must be transferred, improving speed.

Multiplication is a ubiquitous arithmetic operation in neural network processing. Moreover, multipliers are complex circuits that significantly affect the area and energy footprint of the processing hardware. Hence, applications can benefit in terms of power and area by replacing the exact multiplier with an approximate one. The approximate multiplier design can originate in the logarithmic approximation of numerical values [5-8] or in non-logarithmic approaches, such as discarding some stages in Booth multipliers [9-11]. Although most approximate multipliers are designed for fixed-point arithmetic, many floating-point designs, capable of representing numerical values in a wider range, have appeared lately.

There have been several attempts to use approximate integer multipliers in neural network learning [12-14]. The authors of these studies report that the learning was successful, but they mainly worked with tiny neural networks. To the best of our knowledge, there has yet to be a successful attempt to train large-scale neural networks using approximate multipliers. Neural network learning needs higher-precision arithmetic, so until now, neural networks have mainly been trained using exact floating-point multipliers [3], [15].

Common to most of the existing designs is that their accuracy can be adjusted only at design time. As such, they can perfectly fit the target application but fail for many others. However, many applications need adjustable accuracy during run time. In neural network processing, for example, we can use lower accuracy during the inference phase but need much higher accuracy during the learning phase. Moreover, some parts of an application may still require exact multiplication. For such an application, it would be beneficial to design a multiplier capable of handling all accuracy requirements, thus avoiding putting a plethora of multipliers on a chip and not exploiting them simultaneously.

Several precision-tuning 32-bit floating-point multipliers for deep neural network processing have recently been proposed. The work [16] proposes the 32-bit floating-point approximate PAM multiplier with runtime customisation, which can successfully replace a single-precision floating-point multiplier in some deep neural networks and image-processing applications. In [17], the authors proposed a 32-bit iterative approximate floating-point multiplier based on two-dimensional pseudo-Booth encoding. The accuracy of that multiplier is tuned by three parameters: iteration, encoder’s radix, and word length after truncation. To our knowledge, the only state-of-the-art approximate 16-bit bfloat multiplier is proposed in [15]. This variable-precision approximate multiplier uses the bfloat16 format for operand representation and the intermediate conversion of the product exponent to the posit encoding to control the mantissa multiplication accuracy. All these multipliers were used only in the inference phase in deep learning models and in image-processing applications, where negligible degradation in accuracy was observed.

A design that would suit most applications should be able to multiply with the required accuracy, not excluding exact computation, and accept a wide range of numeric values. In this paper, we propose an efficient and accuracy-adjustable approximate 16-bit multiplier for operands represented in the bfloat16 format, which does not require any hardware reconfiguration to adapt accuracy, and demonstrate its applicability in the neural network inference and learning phases.

In the remainder of the paper, we first detail the proposed BFILM multiplier design. Section 3 shows the hardware characteristics of the design and demonstrates the usability of the BFILM multiplier in neural network inference and learning. Lastly, we conclude the paper with the main findings.

2 The design of BFILM multiplier

The proposed brain float iterative logarithmic multiplier (BFILM) operates on numerical values in the bfloat16 format. The advantage of representing a numerical value $O$ in the bfloat16 format is that it keeps one sign bit $s(O)$ and the 8-bit exponent $e(O)$ of the IEEE 754 single-precision floating-point format but shortens the mantissa $m(O)$ to 7 bits. Thus, it enables the use of tiny numerical values, which are important, for example, in the neural network learning phase [18]. While the multiplier determines the sign and the exponent exactly, it follows the idea of the approximate iterative logarithmic multiplier to compute the mantissa. The number of steps, which determines the accuracy of the multiplier, can be changed on the fly.

Fig. 1 shows the structure of the BFILM multiplier, which takes operands $O_1$ and $O_2$ to compute the approximate product $P_{approx}$. The multiplier consists of a straightforward circuit for determining the sign of the product and two loosely connected circuits for determining the product’s exponent and mantissa.
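The bfloat16 layout described above (one sign bit, an 8-bit exponent with offset 127, and a 7-bit mantissa) can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen for illustration only, and the conversion from single precision simply truncates the lower 16 bits, which is an assumption of this sketch rather than a statement about the hardware.

```python
import struct


def float_to_bfloat16_bits(value: float) -> int:
    """Return a 16-bit bfloat16 pattern by truncating the float32 encoding."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16  # keep sign, 8 exponent bits and the top 7 mantissa bits


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 pattern back to a Python float (lower 16 bits zero)."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def split_fields(bits16: int) -> tuple:
    """Split a bfloat16 pattern into (sign, exponent, mantissa)."""
    sign = bits16 >> 15
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa


def fields_to_value(sign: int, exponent: int, mantissa: int) -> float:
    """Value of a normal number: (-1)^s * (1 + m / 2^7) * 2^(e - 127)."""
    return (-1.0) ** sign * (1.0 + mantissa / 128.0) * 2.0 ** (exponent - 127)
```

For instance, `split_fields(float_to_bfloat16_bits(-2.5))` gives `(1, 128, 32)`, that is, a negative sign, an unbiased exponent of 1 and a fraction of 32/128. Zero, subnormal numbers, infinities and NaN are deliberately left out of `fields_to_value`.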
Figure 1: The circuitry of the 16-bit bfloat multiplier.

2.1 The exponent circuitry

The exponent circuitry in Fig. 1 incorporates two adders. We must add both operands’ exponents to get the product’s exponent. However, the bfloat16 format uses the offset-binary representation of the exponent, with the zero offset being 127. To correctly code the product’s exponent, we need an additional adder to subtract the offset. The logic connected to the carry input $c_{in}$ of the first adder covers the situations when the product’s exponent must be normalised due to the large approximate product $P_a$ obtained from the mantissa multiplier.

2.2 The mantissa circuitry

The mantissa circuitry in Fig. 1 comprises the mantissa multiplier and the mantissa normalizer. The mantissa stores only the fractional bits, to which we must prepend the leading one to get an 8-bit unsigned fixed-point number at the input to the mantissa multiplier. The multiplication result is a product in the 16-bit unsigned fixed-point format with two integer bits and 14 fractional bits, of which we take only the nine most significant bits to the output $P_a$ of the mantissa multiplier. We form the product’s mantissa $m(P_{approx})$ according to the integer part of the output $P_a$. When the integer part is greater than one, i.e. $P_a[8]$ is set, we normalise the result by shifting the radix point one place to the left. To do so, we increment the product’s exponent and take the middle seven bits $P_a[7:1]$. In all other cases, normalisation is unnecessary, and the product’s mantissa equals the seven least significant bits $P_a[6:0]$.

An important component of the BFILM multiplier is the approximate mantissa multiplier that relies on the iterative logarithmic multiplier (ILM) [7]. Suppose we have two non-negative 8-bit operands $x$ and $y$, expressed as the sum of the leading bit and the residuum, $x = 2^{k_x} + r_x$ and $y = 2^{k_y} + r_y$, which multiply to the product

$$p = xy = x\left(2^{k_y} + r_y\right) = x 2^{k_y} + x r_y = x 2^{k_y} + 2^{k_x} r_y + r_x r_y. \tag{1}$$

By summing up the first-order Taylor expansions of

$$\log_2 x = k_x + \log_2\left(1 + r_x 2^{-k_x}\right) = k_x + \ln\left(1 + r_x 2^{-k_x}\right)\log_2 e \approx k_x + r_x 2^{-k_x}\log_2 e \tag{2}$$

and $\log_2 y \approx k_y + r_y 2^{-k_y}\log_2 e$, we get the approximation

$$\log_2 p \approx (k_x + k_y) + 2^{-(k_x + k_y)}\left(r_x 2^{k_y} + r_y 2^{k_x}\right)\log_2 e \approx (k_x + k_y) + \log_2\left(1 + 2^{-(k_x + k_y)}\left(r_x 2^{k_y} + r_y 2^{k_x}\right)\right). \tag{3}$$

By taking the antilogarithm of the $\log_2 p$ approximation, we obtain an approximate product

$$p_a = 2^{k_x + k_y}\left(1 + 2^{-(k_x + k_y)}\left(r_x 2^{k_y} + r_y 2^{k_x}\right)\right) = 2^{k_x + k_y} + r_x 2^{k_y} + r_y 2^{k_x} = x 2^{k_y} + r_y 2^{k_x}, \tag{4}$$

which equals equation (1) with the last term omitted. Thus, computing the product approximation $p_a$ requires only two shifts and an addition, completely avoiding the multiplication in the term $r_x r_y$.

The ILM core circuitry in Fig. 2 computes the approximate product and both residua. The leading one detectors extract the leading one bits $2^{k_x}$ and $2^{k_y}$ and their characteristic numbers $k_x$ and $k_y$ from operands $x$ and $y$. We need both leading one bits to compute the residua and the characteristic numbers to do the required shifts of the operand $x$ and the residuum $r_y$. The truncated barrel shifters output only the nine most significant bits required in further processing, thus significantly reducing their size and the size of the adder.

Figure 2: The circuitry of the ILM core.

The relative error of the product, $(p - p_a)/p = r_x r_y / p$, can be as high as 25 %. To reduce it, we can iteratively repeat the above procedure by multiplying residua $r_x$ and $r_y$ and adding the result to the current approximation. The procedure can be repeated until at least one residuum becomes zero, thus achieving an error as small as necessary.

The mantissa multiplier shown in Fig. 3 comprises the ILM core, two multiplexers, and an accumulator to iteratively refine the approximate mantissa product $P_a$. In the initial ILM step ($I = 1$), the multiplexers pass the operands $X$ and $Y$ to the ILM core, while in the next ILM steps ($I > 1$), the multiplexers feed the ILM core with residua $r_x$ and $r_y$ from the previous ILM step. The accumulator keeps the approximation of the mantissa product, which is increased in each ILM step by the value $p_a$. To comply with the circuitry presented in Fig. 1, the accumulator needs to keep only the nine most significant bits.

At this point, we would like to emphasize that the proposed multiplier does not require any hardware reconfiguration if we want to perform more than one ILM step. For example, when more ILM steps are required, we only need to feed the residua $r_x$ and $r_y$ (Fig. 2) back to the input of the ILM core as presented in Fig. 3. In this case, the multiplexers choose what goes to the ILM core: the new operands, $X$ and $Y$, or the residua from the previous iteration, $r_x$ and $r_y$. In the actual implementation, of course, we must add registers at the input of the multiplexers, but these are not shown for simplicity.

3 Results

3.1 Hardware performance

We implement the multipliers in Verilog and synthesise them to the SkyWater PDK 130 cell library using OpenLane [19-21]. The library targets a 130 nm technology with an operating voltage of 1.8 V and five metal layers [22-23]. The timing constraints, used for all evaluated designs, specify clock-related parameters, which affect synthesis and timing analysis. We set a clock signal with a period of 10 ns, which does not violate the critical path. To evaluate the power, we use a 100 MHz virtual clock (a virtual clock has no real source in the design and is commonly used to specify delay constraints during static timing analysis), a load capacitance of 33.442 fF (the PDK default) and a supply voltage of 1.8 V.

We analysed the hardware performance of the BFILM multiplier in terms of power, area, delay, and power-delay product (PDP) and compared it with the exact bfloat16 multiplier.
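The iterative logarithmic multiplication of Section 2.2 can be sketched in a few lines of Python. This is a minimal integer model under stated assumptions, not the authors' Verilog: the names `ilm_multiply` and `error_stats` are hypothetical, and the sketch keeps full precision, so it omits the nine-bit truncation performed by the barrel shifters and the accumulator.

```python
def ilm_multiply(x: int, y: int, steps: int = 1) -> int:
    """Approximate x * y with `steps` ILM iterations (no bit truncation)."""
    product = 0
    for _ in range(steps):
        if x == 0 or y == 0:       # a zero residuum: the sum is already exact
            break
        kx = x.bit_length() - 1    # characteristic number of x
        ky = y.bit_length() - 1    # characteristic number of y
        rx = x - (1 << kx)         # residuum of x
        ry = y - (1 << ky)         # residuum of y
        # Eq. (4): pa = x * 2^ky + ry * 2^kx, i.e. two shifts and one addition
        product += (x << ky) + (ry << kx)
        x, y = rx, ry              # the next step multiplies the residua
    return product


def error_stats(steps: int) -> tuple:
    """Maximum and mean relative error over all 8-bit mantissas 128..255."""
    errors = []
    for x in range(128, 256):
        for y in range(128, 256):
            exact = x * y
            errors.append((exact - ilm_multiply(x, y, steps)) / exact)
    return max(errors), sum(errors) / len(errors)
```

As a worked example, 13 × 11 = 143 is approximated as 128 after one step (error $r_x r_y$ = 5 × 3 = 15), as 142 after two steps, and exactly after three. Calling `error_stats(1)` on the mantissa range confirms that the one-step relative error stays below the 25 % bound stated above.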
Table 1 shows that the BFILM multiplier outperforms the exact multiplier in all hardware metrics; its energy consumption, estimated through the PDP, is more than five times smaller.

Table 1: The synthesis results for the examined multipliers.

| Multiplier | Delay [ns] | Power [uW] | Area [um2] | PDP [fJ] |
|---|---|---|---|---|
| exact bfloat16 | 2.89 | 869 | 6120 | 2590 |
| BFILM | 1.67 | 298 | 3796 | 498 |

Figure 3: The circuitry of the approximate mantissa multiplier.

Table 2 compares the hardware characteristics of the state-of-the-art variable-accuracy bfloat16 multipliers. The results are given relative to the standard reference implementation of the exact bfloat16 multiplier. The BFILM multiplier, with its very slim design, outperforms the recently proposed BFLP16-prop multiplier [15] in all aspects.

Table 2: Comparison of the bfloat16 multipliers regarding hardware gains relative to the exact bfloat16 multiplier.

| Multiplier | Delay [%] | Power [%] | Area [%] | PDP [%] |
|---|---|---|---|---|
| exact bfloat16 | 100 | 100 | 100 | 100 |
| BFLP16-prop [15] | 104 | 58 | 81 | 59 |
| BFILM | 58 | 33 | 62 | 19 |

Since the BFILM multiplier does not require reconfiguration or additional hardware for more accurate processing, the multiplier’s size (area) and power are preserved for an arbitrary number of ILM steps. Of course, with additional ILM steps, the residua $r_x$ and $r_y$ must be multiplied once or twice more and the results added to the final product. Therefore, the processing time required to calculate the product increases linearly with the number of ILM steps, and so does the energy consumption. We assess different configurations of the BFILM multiplier in terms of delay, energy consumption (PDP) and the mean relative error distance (MRED), and present them in Table 3. For easier comparison, the delay and energy consumption are given relative to the values of the exact bfloat16 multiplier.

Table 3: Comparison of delay, PDP, and the MRED error for the different number of ILM steps in the BFILM multiplier.

| Multiplier | Delay [%] | PDP [%] | MRED [10^-3] |
|---|---|---|---|
| exact bfloat16 | 100 | 100 | 0 |
| BFLP16-prop [15] | 104 | 59 | 3.50 |
| BFILM, 1 ILM step | 58 | 19 | 91.21 |
| BFILM, 2 ILM steps | 115 | 38 | 9.08 |
| BFILM, 3 ILM steps | 173 | 58 | 0.86 |

The proposed multiplier with two or three ILM steps has a lower energy consumption than the exact bfloat16 multiplier and the BFLP16-prop multiplier [15]. Moreover, the BFILM multiplier with two ILM steps is not much slower than the state-of-the-art BFLP16-prop multiplier [15]. However, the BFILM multiplier with only one ILM step has a rather large error; with two ILM steps the error comes close to the MRED of the BFLP16-prop multiplier, and it then drops by an order of magnitude with each additional ILM step.

These results suggest that the BFILM multiplier should fit well with error-resilient applications where low energy consumption is an important goal and where the BFILM multiplier could be used with a small number of ILM steps most of the time. An important feature of the BFILM multiplier is that we can control the product accuracy by adjusting the number of ILM steps without hardware modification, ultimately even making it possible to remove the exact multiplier from the circuitry.

3.2 Impact on neural network learning

Convolutional neural networks achieve remarkable performance in visual recognition tasks [24]. However, the learning and inference of convolutional neural networks are computationally demanding tasks that involve many multiplications. Nevertheless, convolutional neural networks are error-tolerant models, making them perfect candidates for employing approximate multipliers. Therefore, we assess the influence of the proposed multiplier on the performance of the inference and learning phases.

To evaluate the BFILM multiplier, we select the ResNet-20 convolutional neural network [25-26] and the CIFAR-10 dataset [27]. We change the number representation in the ResNet-20 convolutional neural network from the single-precision floating-point format to the bfloat16 format. In the experiments, we use the Caffe framework [28], where we replace the calls to the cuBLAS multiplication routines with calls to our own GPU kernels, which emulate the proposed BFILM multiplier.

The neural network learns using the predetermined split of the dataset into train and test sets [27]. Before learning, we preprocess the images by subtracting their mean value. Besides, we quantise the ResNet-20 single-precision floating-point weights to the bfloat16 format by simply discarding the last 16 bits of the floating-point mantissa. In the learning phase, we optimize the multinomial logistic loss function [29] with the Nesterov momentum algorithm [30]. The learning starts with randomly initialised weights. In all experiments, we train the network for 64000 epochs.

In the first experiment, we evaluate the influence of the proposed multiplier on the ResNet-20 classification accuracy. As the BFILM multiplier is configurable in terms of the number of steps affecting the multiplication error, we test several BFILM configurations. In the tested configurations, BFILM-1-1, BFILM-1-2, BFILM-2-2 and BFILM-2-3, the first number denotes the number of ILM steps in the inference phase, while the second number denotes the number of ILM steps used in the learning phase.

Table 4 shows the classification accuracy on the CIFAR-10 dataset. For each configuration, we list the average value and standard deviation over five runs. The significant multiplication error of BFILM-1-1 leads to low classification accuracy. Increasing the number of ILM steps in the inference and learning phases improves classification accuracy. For example, with BFILM-2-2 and BFILM-2-3, the classification accuracy is almost the same as with the exact bfloat16 multiplier.
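The experiments rely on kernels that emulate the BFILM multiplier in software. A minimal CPU-side Python sketch of such an emulation on raw bfloat16 bit patterns is given below. It is an illustration under explicit assumptions, not the authors' GPU kernel: the names `ilm_multiply` and `bfilm_multiply` are hypothetical, the nine-bit truncation is applied once after all ILM steps instead of inside every step, and zero, subnormal, infinite and NaN operands as well as exponent overflow are ignored.

```python
BIAS = 127  # exponent offset of the bfloat16 format


def ilm_multiply(x: int, y: int, steps: int) -> int:
    """Iterative logarithmic approximation of x * y (full precision)."""
    product = 0
    for _ in range(steps):
        if x == 0 or y == 0:
            break
        kx, ky = x.bit_length() - 1, y.bit_length() - 1
        rx, ry = x - (1 << kx), y - (1 << ky)
        product += (x << ky) + (ry << kx)   # x * 2^ky + ry * 2^kx
        x, y = rx, ry
    return product


def bfilm_multiply(a_bits: int, b_bits: int, steps: int = 1) -> int:
    """Approximate product of two normal bfloat16 numbers given as bit patterns."""
    sign = (a_bits ^ b_bits) >> 15                  # sign of the product
    exp_a = (a_bits >> 7) & 0xFF
    exp_b = (b_bits >> 7) & 0xFF
    man_a = 0x80 | (a_bits & 0x7F)                  # prepend the leading one
    man_b = 0x80 | (b_bits & 0x7F)
    pa = ilm_multiply(man_a, man_b, steps) >> 7     # nine most significant bits
    exponent = exp_a + exp_b - BIAS                 # add exponents, remove one offset
    if pa & 0x100:                                  # Pa[8] set: normalise
        exponent += 1
        mantissa = (pa >> 1) & 0x7F                 # Pa[7:1]
    else:
        mantissa = pa & 0x7F                        # Pa[6:0]
    return (sign << 15) | ((exponent & 0xFF) << 7) | mantissa
```

With the pattern `0x3FC0` (1.5), one ILM step yields `bfilm_multiply(0x3FC0, 0x3FC0, 1) == 0x4000` (2.0), while two steps yield `0x4010` (2.25, the exact product), which mirrors how an additional step refines the result without any change to the datapath.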
Table 4: Performance of the ResNet-20 convolutional neural network on the CIFAR-10 dataset using bfloat16 multipliers.

| Multiplier | Test set classification accuracy [%] |
|---|---|
| exact bfloat16 | 91.50 ± 0.10 |
| BFILM-1-1 | 86.32 ± 1.26 |
| BFILM-1-2 | 90.98 ± 0.15 |
| BFILM-2-2 | 91.30 ± 0.30 |
| BFILM-2-3 | 91.40 ± 0.20 |

Also, we can see from the results for BFILM-1-1 and BFILM-1-2 that increasing the number of ILM steps in the learning phase positively affects classification performance. On the other hand, a further increase in the number of steps in the inference phase from BFILM-1-2 to BFILM-2-2 has much less impact. Moreover, according to Table 3, BFILM-1-2 has a very small energy footprint and thus could be sufficient for neural network inference and learning.

The second experiment highlights the advantage of the on-the-fly accuracy adaptation of the BFILM multiplier, which can help in faster and more energy-efficient neural network learning. The idea is to start with one ILM step in the inference and learning phase to save energy and later, when model performance improves, increase the number of ILM steps to further refine the result. Fig. 4 shows the outcome of the learning process on the training and testing set for five separate runs, each with randomly initialised neural network weights. For the loss (red) and the accuracy (green), we show the span of obtained values and the curve averaged over all runs. We see that with the BFILM-1-1 configuration, the model improves rapidly and reaches a classification accuracy of more than 60 % in only 10000 epochs. At this point, we use an additional ILM step in the learning phase (BFILM-1-2) to improve the model’s convergence and achieve more than 99.4 % of the accuracy of the exact bfloat16 multiplier. However, if the accuracy still needs to be increased for some applications, we can enhance the model by training it with additional ILM steps.

Figure 4: Varying configuration of BFILM during the learning phase.

4 Conclusion

In this paper, we proposed a novel approximate bfloat16 multiplier with adjustable accuracy, which can be achieved without any hardware reconfiguration. Instead, the proposed BFILM multiplier iteratively uses an approximate logarithmic multiplier core to reduce the error. This way, we avoid using additional error refinement circuits, keeping the design small and energy efficient.

The primary purpose of the proposed design is to use it in deep neural network processing in the inference and learning phases. We apply the BFILM multiplier in the ResNet-20 convolutional neural network to classify the CIFAR-10 dataset. We demonstrate the impact of various BFILM configurations on the neural network learning process and classification accuracy. The results show that we can easily adjust the multiplier’s accuracy according to the application’s requirements.

The main advantage of the on-the-fly adaptation of the BFILM multiplier becomes apparent during the learning phase. The results show that we can start with one ILM step in the inference and learning phase to save energy and later, when model performance improves, increase the number of ILM steps to refine the result further. In future work, we aim to develop an algorithm that could optimize the learning process in terms of speed and efficiency by automatically adapting the number of ILM steps in the BFILM multiplier when needed.

5 Acknowledgments

This research was supported by Slovenian Research Agency under Grants P2-0359 (National research program Pervasive computing), P2-0241 (Synergy of the technological systems and processes) and by Slovenian Research Agency and Ministry of Civil Affairs, Bosnia and Herzegovina, under Grant BI-BA/21-23-033 (Bilateral Collaboration Project).
6 Conflict of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

7 References

1. G. Armeniakos, G. Zervakis, D. Soudris, and J. Henkel, “Hardware approximate techniques for deep neural network accelerators: A survey,” ACM Comput. Surv., Mar. 2022. https://doi.org/10.1145/3527156.
2. “IEEE standard for floating-point arithmetic,” 2019, IEEE Std 754-2019 (Revision of IEEE 754-2008).
3. R. Murillo, A. A. Del Barrio Garcia, G. Botella, M. S. Kim, H. Kim, and N. Bagherzadeh, “Plam: a posit logarithm-approximate multiplier,” IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2021.
4. H. Kim, “A low-cost compensated approximate multiplier for bfloat16 data processing on convolutional neural network inference,” ETRI Journal, vol. 43, no. 4, pp. 684–693, 2021. https://onlinelibrary.wiley.com/doi/abs/10.4218/etrij.2020-0370.
5. J. N. Mitchell, “Computer multiplication and division using binary logarithms,” IRE Transactions on Electronic Computers, vol. EC-11, no. 4, pp. 512–517, Aug. 1962.
6. V. Mahalingam and N. Ranganathan, “Improving accuracy in Mitchell’s logarithmic multiplication using operand decomposition,” IEEE Transactions on Computers, vol. 55, no. 12, pp. 1523–1535, Dec. 2006. https://doi.org/10.1109/TC.2006.198.
7. Z. Babić, A. Avramović, and P. Bulić, “An iterative logarithmic multiplier,” Microprocessors and Microsystems, vol. 35, no. 1, pp. 23–33, 2011. https://doi.org/10.1016/j.micpro.2010.07.001.
8. M. S. Kim, A. A. D. Barrio, L. T. Oliveira, R. Hermida, and N. Bagherzadeh, “Efficient Mitchell’s approximate log multipliers for convolutional neural networks,” IEEE Transactions on Computers, vol. 68, no. 5, pp. 660–675, Dec. 2019. https://doi.org/10.1109/TC.2018.2880742.
9. V. Leon, G. Zervakis, D. Soudris, and K. Pekmestzi, “Approximate hybrid high radix encoding for energy-efficient inexact multipliers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 3, pp. 421–430, Nov. 2018. https://doi.org/10.1109/TVLSI.2017.2767858.
10. H. Waris, C. Wang, and W. Liu, “Hybrid low radix encoding-based approximate Booth multipliers,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 12, pp. 3367–3371, Feb. 2020. https://doi.org/10.1109/TCSII.2020.2975094.
11. H. Waris, C. Wang, W. Liu, J. Han, and F. Lombardi, “Hybrid partial product-based high-performance approximate recursive multipliers,” IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 1, pp. 507–513, 2022. https://doi.org/10.1109/TETC.2020.3013977.
12. U. Lotrič and P. Bulić, “Applicability of approximate multipliers in hardware neural networks,” Neurocomputing, vol. 96, pp. 57–65, 2012. Available: https://www.sciencedirect.com/science/article/pii/S0925231212003311
13. T. Y. Cheng, Y. Masuda, J. Chen, J. Yu, and M. Hashimoto, “Logarithm-approximate floating-point multiplier is applicable to power-efficient neural network training,” Integration, vol. 74, pp. 19–31, 2020. https://doi.org/10.1016/j.vlsi.2020.05.002.
14. R. Pilipović, V. Risojević, J. Božič, P. Bulić, and U. Lotrič, “An approximate GEMM unit for energy-efficient object detection,” Sensors, vol. 21, no. 12, 2021. https://doi.org/10.3390/s21124195
15. H. Zhang and S. B. Ko, “Variable-precision approximate floating-point multiplier for efficient deep learning computation,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, pp. 2503–2507, May 2022. https://doi.org/10.1109/TCSII.2022.3161005.
16. C. Chen, W. Qian, M. Imani, X. Yin, and C. Zhuo, “PAM: A piecewise-linearly-approximated floating-point multiplier with unbiasedness and configurability,” IEEE Transactions on Computers, vol. 71, pp. 2473–2486, Oct. 2022. https://doi.org/10.1109/TC.2021.3131850.
17. A. Towhidy, R. Omidi, and K. Mohammadi, “On the design of iterative approximate floating-point multipliers,” IEEE Transactions on Computers, 2022. https://doi.org/10.1109/TC.2022.3216465.
18. A. Y. Romanov, A. L. Stempkovsky, I. V. Lariushkin, G. E. Novoselov, R. A. Solovyev, V. A. Starykh, I. I. Romanova, D. V. Telpukhov, and I. A. Mkrtchan, “Analysis of posit and bfloat arithmetic of real numbers for machine learning,” IEEE Access, vol. 9, pp. 82318–82324, 2021. https://doi.org/10.1109/ACCESS.2021.3086669.
19. A. A. Ghazy and M. Shalan, “OpenLANE: The Open-Source Digital ASIC Implementation Flow,” in 2020 Workshop on Open-Source EDA Technology (WOSET), 2020, last accessed 27 September 2022. Available: https://woset-workshop.github.io/PDFs/2020/a21.pdf
20. OpenLane, “Openlane EDA Toolset.” 2022, last accessed 27 September 2022. Available: https://github.com/The-OpenROAD-Project/OpenLane
21. M. Chupilko, A. Kamkin, and S. Smolov, “Survey of open-source flows for digital hardware design,” in 2021 Ivannikov Memorial Workshop (IVMEM), 2021, pp. 11–16.
22. T. Edwards, “Google/SkyWater and the Promise of the Open PDK,” in 2020 Workshop on Open-Source EDA Technology (WOSET), 2020, last accessed 27 September 2022. Available: https://woset-workshop.github.io/PDFs/2020/a03.pdf
23. “Google SkyWater Open Source PDK.” 2022, last accessed 27 September 2022. Available: https://github.com/google/skywater-pdk
24. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., vol. 25. Lake Tahoe, NV, USA: Curran Associates, Inc., Dec. 2012, pp. 1097–1105.
25. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
26. Y. He, X. Zhang, and J. Sun, “Channel pruning for accelerating very deep neural networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 1398–1406.
27. A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Toronto, Tech. Rep., Apr. 2009.
28. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia, ser. MM ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 675–678. Available: https://doi.org/10.1145/2647868.2654889
29. J. S. Long and J. Freese, Regression Models for Categorical Dependent Variables using Stata, 3rd Edition. StataCorp LP, 2014. Available: https://www.stata.com/bookstore/regression-models-categorical-dependent-variables
30. I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, S. Dasgupta and D. McAllester, Eds., vol. 28, no. 3. Atlanta, Georgia, USA: PMLR, 17–19 Jun 2013, pp. 1139–1147. Available: https://proceedings.mlr.press/v28/sutskever13.html

Copyright © 2023 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 22. 06. 2023
Accepted: 21. 07. 2023

Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.204
Journal of Microelectronics, Electronic Components and Materials Vol. 53, No.
2(2023), 87 – 102

A New Design Optimization Methodology of Fully Differential Dynamic Comparator

Leila Khanfir1, Jaouhar Mouine*2

1 Laboratory of Analysis, Design and Control of Systems, University of Tunis El Manar, National Engineering School of Tunis, Tunis, Tunisia
2 Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia

Abstract: The need to reduce the time to market for high-performance integrated circuits has become a primary concern in modern electronics design. Many efforts are currently being made to streamline the design process for increasingly complex circuits while providing optimal performance, especially for nanoscale technologies. This paper presents a new and effective methodology for the design of fully differential comparators to achieve a high-performance operation using dynamic topology and nanoscale technology. The proposed methodology is not process dependent and can be applied to similar conventional comparator structures to optimize the operation speed while ensuring good offset cancellation, efficient noise immunity, and reduced design time and complexity. The design steps include theoretical analysis and simulation-based optimization of the comparator speed, as well as offset and noise reduction within a minimal design time. All the analog and digital building blocks are designed using dynamic topologies, including the clock generator, to ensure high speed and synchronized operation. The resulting circuit is a new two-stage dual clock fully differential comparator. Compared with its equivalent counterparts, it provides improved operation speed, and reduced offset voltage and kickback noise. This comparator is designed in the TSMC 65 nm CMOS process. Its performance shows that it achieves a 1.25 GHz operation speed, presents less than 9 mV offset error, and generates a kickback noise of less than 40 mV with a 10 kΩ input resistance during the reset phase only.
It consumes 213 µW from a 1.2 V power supply at 1.25 GHz. Keywords: fully differential dynamic comparator; kickback noise; offset self-calibration; clock generator; finite state machine. Nova metodologija optimizacije zasnove polnega diferencialnega dinamičnega komparatorja Izvleček: Potreba po skrajšanju časa za trženje visoko zmogljivih integriranih vezij je postala glavna skrb pri sodobnem načrtovanju elektronike. Trenutno potekajo številna prizadevanja za racionalizacijo postopka načrtovanja vedno bolj zapletenih vezij ob zagotavljanju optimalnih zmogljivosti, zlasti za tehnologije v nanometrski razsežnosti. V tem članku je predstavljena nova in učinkovita metodologija za načrtovanje polnih diferencialnih komparatorjev za doseganje visoko zmogljivega delovanja z uporabo dinamične topologije. Predlagana metodologija ni odvisna od procesa in jo je mogoče uporabiti za podobne konvencionalne strukture komparatorjev, hkrati pa zagotovi dobro izničevanje odmikov, učinkovito odpornost proti šumom ter skrajša čas in zapletenost načrtovanja. Koraki načrtovanja vključujejo teoretično analizo in na simulaciji temelječo optimizacijo hitrosti delovanja komparatorja ter odpravo kompenzacije in šuma v minimalnem času načrtovanja. Vsi analogni in digitalni gradniki so zasnovani z uporabo dinamičnih topologij, vključno z generatorjem ure, da se zagotovi visoka hitrost in sinhronizirano delovanje. Tako nastalo vezje je nov dvostopenjski dvotaktni polni diferencialni komparator. V primerjavi z enakovrednimi primerki zagotavlja večjo hitrost delovanja ter manjšo kompenzacijsko napetost in šum povratnega udarca. Ta komparator je zasnovan v 65 nm postopku CMOS podjetja TSMC. Njegovo delovanje kaže, da dosega hitrost delovanja 1,25 GHz, ima manj kot 9 mV napako odmika in ustvarja šum odboja manj kot 40 mV z vhodno upornostjo 10 kΩ. Pri 1,25 GHz porabi 213 µW iz 1,2-voltnega napajanja. 
Ključne besede: fully differential dynamic comparator; kickback noise; offset self-calibration; clock generator; finite state machine.

* Corresponding Author’s e-mail: j.mouine@psau.edu.sa

How to cite: L. Khanfir et al., “A New Design Optimization Methodology of Fully Differential Dynamic Comparator”, Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 87–102

L. Khanfir et al.; Informacije Midem, Vol. 53, No. 2(2023), 87 – 102

1 Introduction

The scaling of silicon technologies has been one of the primary factors that have allowed for outpacing the exponential increase of performance demand over the past few decades. Transistor scaling increases the integration density and operation speed. At the same time, the resulting circuits are more sensitive to random and systematic errors, such as offset and noise. Additional circuitry for error compensation and noise suppression is then needed, leading to a drastic increase in design time and effort. Therefore, in modern circuit design, optimization methodologies to improve performance have become mandatory not only to meet the increasing design constraints, but also to compensate for increased errors and noise while optimizing the time to market. Recently, optimization methodologies have become a major research field in MOS circuit design [1]–[3].

Dynamic comparators are largely used in advanced mixed signal systems, such as analog to digital converters (ADCs). The design constraints of these systems are usually stringent, depending closely on those of the comparator. To improve the immunity of ADCs to sensed common noise, a specific variant of the dynamic comparator is usually used [4]–[7]; it is a six-terminal circuit that compares an input voltage difference to a reference voltage difference [8] and is commonly called a differential pair comparator or fully differential dynamic comparator (FDDC). However, it is slower than the common four-terminal-like circuit and is more sensitive to kickback noise, as well as process and mismatch variations [9]. Achieving high performance and good noise immunity with a six-terminal dynamic comparator requires more design effort and time than with the common four-terminal one. Therefore, design methodologies could help designers significantly reduce design time and effort.

An FDDC was employed in [4] for its low kickback noise, good power efficiency, and simple dynamic structure. To reduce mismatch effects on loop stability, the authors kept the comparator gain at low values, leading to a considerable decrease in the operation speed. As for immunity to comparator noise, the authors applied a noise-shaping successive approximation register quantizer to all stages in a pipelined ADC. Although the proposed technique has advantages other than the noise immunity of the comparator, it remains complex and specific to the designed ADC. In [5], a charge distribution FDDC was used to implement a level-crossing ADC. It was constructed with two separate comparators to compare the differential input voltage to a differential reference voltage. The two separate comparators were more sensitive than an all-in-one FDDC when it came to the process and mismatch variations and noise unbalance. The suppression of the sensed common noise was less efficient. In addition, the comparison was performed over two clock cycles, which affected the operation speed. Moreover, a static second stage was added to the comparator to increase the gain, making the comparators even slower. Another FDDC was used in [7] to implement a SAR-assisted noise-shaping pipeline ADC. The proposed structure included self-calibrated current sources to compensate for mismatches and operated with two synchronized clocks. The circuit design achieved good performance. However, the proposed comparator was specific to the designed ADC and the operation speed was very low.

As for offset compensation, mismatches are usually calibrated off-chip to reduce the design complexity [4], [5]. In [7], a background calibration for interstage offset was proposed to compensate for comparator mismatches. Even if the operation speed was not altered, there were “dead zones” in the calibration scheme that reduced its efficiency. Moreover, the proposed scheme mainly relies on the overall system architecture and can hardly be reproduced with a different circuit design.

The comparator gain is also an important feature to implement high-resolution ADCs. It is usually increased by using preamplification stages or multistage comparators. In [5], a three-stage comparator was used, but only the first one was dynamic. Thus, the comparator gain was high, whereas the operation speed was low. Likewise, a two-stage dynamic comparator, in which only the first stage is dynamic, was also presented in [10]. In [11], a three-stage, fully dynamic comparator was proposed. However, the presented structure was not fully differential, and the three stages operated over the same clock period.

The current paper presents a new two-stage fully dynamic fully differential comparator. The decision is made over the entire clock period. Also, additional circuitry is added to generate synchronized clocks, to reduce kickback noise, and to compensate for mismatches. The whole system is fully dynamic without a considerable increase in design complexity. It achieves a fully differential comparison, optimal operation speed, good immunity to kickback noise, and self-calibrated offset voltage. The proposed design is process independent and can be used in different applications.

Section 2 presents the proposed system architecture of the FDDC, including clock generation, kickback noise immunity, and offset calibration. The new two-stage FDDC is presented and discussed in section 3. Its operation is also detailed and compared with the one-stage-like circuit. Section 4 describes the proposed circuit and how it ensures immunity to kickback noise while also detailing the clock generator design. Section 5 presents the proposed design technique for a digital offset self-calibration scheme using full custom dynamic circuits. Section 6 presents the simulation results and circuit characterization. A comparison to state-of-the-art performances is also addressed.

2 Proposed system architecture

The comparator is typically a one-bit ADC. When the difference between the compared voltages is about a few hundred millivolts or more, the decision process is usually accurate and fast. However, as the input voltage decreases to a few millivolts and less, the decision process becomes much slower and more sensitive to the input signal quality, as well as to circuit nonidealities such as offset and switching noises. Indeed, analog signals usually present noise. Noise is random and common to comparator inputs. On the other hand, a dynamic comparator is usually designed with small MOS devices, which makes the circuit more sensitive to process and mismatch variations, especially when designed in nanometer-scale technologies. Moreover, because of the dynamic operation of the comparator, there are large voltage variations in the internal nodes between devices. These variations are coupled through parasitic capacitors to the comparator inputs as a voltage signal creating a disturbance that is usually called kickback noise. This switching noise is added to the analog input signal and affects the comparison results. Kickback noise cannot be removed, but there are a few techniques to reduce its effects on the decision process [12], [13].

Fig. 1(a) shows a four-terminal comparator, which is known as the strong-arm latch comparator and has been largely used in ADC design [14]. It presents two inputs and two outputs.
One input is generated from an external voltage source, while the other comes from a resistive ladder. This affects the two inputs with different noise levels, making the comparison process only effective when the sensed voltage is greater than the difference between the two input noise signals. In contrast, a six-terminal comparator is shown in Fig. 1(b); this is called the differential pair comparator [8], [15] or FDDC [16], [17], and has been largely used in pipeline and SAR ADCs [18]. This comparator presents four inputs and two outputs. The inputs are a differential analog input signal and differential reference voltage. The two outputs are complemented: a positive output OP and negative output OM. The positive output OP goes high when the differential analog input voltage VIN+ - VIN- is greater than the reference voltage difference VREF+ - VREF-:

$$\text{if } \left(V_{IN+} - V_{IN-}\right) - \left(V_{REF+} - V_{REF-}\right) > 0 \text{ then } OP = \text{‘1’ and } OM = \text{‘0’, else } OP = \text{‘0’ and } OM = \text{‘1’} \tag{1}$$

Thus, the common noise in each differential input is cancelled separately, which considerably improves the comparator precision.

Figure 1: Strong-arm latch comparator (a) dynamic comparator (b) fully differential dynamic comparator.

This section describes the top-level architecture of the proposed FDDC, including immunity to kickback noise and self-calibration of the offset voltage. Fig. 2 describes the proposed system. The symbol shown in Fig. 2(a) presents the input and output terminals of the system. Fig. 2(b) illustrates the clock diagram of the external and internal clock signals, while Fig. 2(c) depicts the top-level architecture. The proposed comparator is a new two-stage FDDC. The clock generator produces two synchronized clock signals clk and clks to ensure the operation of the first and second stages, respectively.
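The ideal decision rule in (1), and the way noise common to each differential pair drops out of it, can be illustrated with a short behavioural sketch. This is a purely numerical model under stated assumptions: the function name `fddc_decision` and the example voltages are hypothetical, and offset, kickback noise and timing are ignored.

```python
def fddc_decision(vin_p: float, vin_m: float, vref_p: float, vref_m: float) -> tuple[int, int]:
    """Ideal FDDC decision following (1): returns the complementary pair (OP, OM)."""
    difference = (vin_p - vin_m) - (vref_p - vref_m)
    return (1, 0) if difference > 0 else (0, 1)


if __name__ == "__main__":
    # Differential input of 100 mV against a differential reference of 40 mV.
    clean = fddc_decision(0.65, 0.55, 0.62, 0.58)

    # The same signals with +50 mV of noise common to both analog inputs and
    # -30 mV of noise common to both reference inputs (hypothetical values).
    noisy = fddc_decision(0.65 + 0.05, 0.55 + 0.05, 0.62 - 0.03, 0.58 - 0.03)

    print(clean, noisy)
```

Since each noise term is added to both lines of a differential pair, it cancels inside the corresponding bracket of (1), so the two calls above give the same decision.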
To reduce the input noise, an RC circuit can be added at the comparator inputs as a first-order filter, but at the price of a reduced operation speed. In the proposed solution, the resistance of a CMOS switch and parasitic capacitor Cp at the comparator inputs together form an RC filter. These two components are too small to affect the comparator speed but also too small to ensure the cancellation of the kickback noise. Therefore, in the proposed scheme, the switches, together with the input parasitic capacitors, are used as track-and-hold circuit blocks, which are controlled to reduce the effect of noise on the decision process. Two clock signals clkn and clkn’ are used to control the switches to operate only during the comparator reset time (when clk = ‘0’) before beginning a new cycle. Thus, kickback noise appears at the comparator inputs for a limited period, during which the decision process cannot be affected.

Figure 2: Proposed system (a) symbol view (b) clock diagram (c) architecture.

To compensate for the comparator offset errors, a three-phase operation system is proposed. First, the initial reset phase is controlled using the external signal reset. When this signal is high, the two N-bit outputs d+ and d- of the two counters are initialized to zero. Thus, the initial reset phase allows for initializing the capacitor banks to equal initial charges. This represents the initial state S0 of the two FSMs used in the self-calibration process. At that time, the eight input switches are configured to connect the four comparator inputs IN+, IN-, REF+, and REF- to the differential inputs VIN+, VIN-, VREF+, and VREF-, respectively. Second, a calibration phase is controlled by two complementary external signals calib and calib’. This phase occurs only once after the initial reset phase. When calib and calib’ are set to ‘1’ and ‘0’, respectively, eight switches (in blue in Fig. 2(c)) that are placed at the system inputs disconnect the comparator inputs IN+, IN-, REF+, and REF- from the differential inputs VIN+, VIN-, VREF+, and VREF-, and connect them to the common mode reference voltages VCM, which ensures equal charges at the input parasitic capacitors. VCM is the mean value of the input range. During the calibration phase, the comparator outputs Q+ and Q- are applied to the offset regulator. At each clock cycle, according to Q+ and Q- levels, the clock generator increments one of the two N-bit control signals d+ and d- by 1 to compensate for the mismatches in the comparator as well as in the switches at the comparator inputs. This process continues as long as Q+, Q-, calib and calib’ remain unchanged. The offset regulator design is detailed in section 5.

The clock generator provides four synchronized clock signals: clk, clks, clkn, and clkn’. These clock signals ensure a three-phase operation comparator: track-and-hold, decision, and reset. The circuit is designed so that the track-and-hold, as well as a part of the decision process, are performed during the reset time, which improves the comparator speed compared with the state-of-the-art method. The comparator is described in detail in section 3, while noise suppression and clock generation are presented in section 4.

3 New two-stage fully differential comparator

The operation speed is a primary constraint in the comparator design. The comparison speed can be defined as the time required to provide a valid output decision. A dynamic comparator operates under a clock signal clk alternating decision and reset phases in each clock cycle. The two phases of decision and reset usually correspond to the two clock levels ‘1’ (on) and ‘0’ (off). Thus, denoting the decision and reset times by ton and toff, respectively, the total comparison time tclk is equal to:

$$t_{clk} = t_{on} + t_{off} \tag{2}$$

A track-and-latch circuit, basically a Set Reset (SR) latch, is usually added to the comparator outputs to retrieve static output signals.
Thus, the decision time ton is typically the sum of two times: the comparison time tc needed by the dynamic comparator to produce a valid output, and the SR latch time tSR required by the SR latch to change state according to the comparator outputs. The decision time ton is then equal to:

$$t_{on} = t_c + t_{SR} \qquad (3)$$

Once the SR latch state has changed, the comparator outputs can be reset to the initial value without affecting the SR latch state until the next decision process begins. Inserting (3) into (2), the total comparison time tclk is then defined in terms of the comparison time tc, the SR latch response time tSR and the reset time toff:

$$t_{clk} = t_c + t_{SR} + t_{off} \qquad (4)$$

The comparison time tc depends on the internal capacitor sizes, the internal feedback loops, and the value of the resolved input voltage. For an input voltage of a few hundred millivolts, tc can be as small as nanoseconds or picoseconds, depending on the comparator structure. However, when resolving input values near 0 V, the comparator output evolves slowly and tc tends to infinity. Therefore, to sense microvolt and nanovolt input values in a reduced time, it is necessary to minimize the comparator internal capacitors by using small devices, and to improve the comparator structure by creating positive feedback loops, immunity to switching noise, and compensation for process and mismatch variations.

A double-tail comparator and a three-stage triple-latch comparator are designed with a 28 nm MOS process in [11]. The first is a two-stage double-tail comparator that includes only one positive feedback loop, while the second includes three positive feedback loops. The first achieves a comparison time tc of 50 ps against 27 ps for the second when resolving a 5 mV input value. Nevertheless, in both comparators, the stages operate during the same clock period, which makes tc the sum of the response times of all the stages in series. Moreover, there is no improvement for tSR and toff in the total comparison time tclk in (4).

In [3], a two-stage dual-clock latch comparator is proposed. The comparator includes one feedback loop. However, the second stage is controlled by a second clock, reducing the on-time ton in (2) to tc only. Thus, the total comparison time tclk defined in (4) becomes:

$$t_{clk} = t_c + t_{off} \qquad (5)$$

Moreover, the second stage is built with a stack of two elements only, which reduces the total capacitance seen at the outputs of the first stage, leading to a minimal comparison time tc. The comparator is designed with a 180 nm MOS process and achieves a comparison time of 900 ps when resolving a 25 µV input value. However, the second stage operates when only one of the first-stage outputs decreases to a threshold value. If both outputs reach this value, the second-stage outputs will not be complementary, and the comparison decision will not be valid. This happens when resolving small input values and when the PMOS threshold voltage |VTHP| is larger than VDD/2, which is usually the case in scaled technologies like 65 nm and below.

In the present work, a new two-stage FDDC where the comparison speed is optimized with no restriction on technology use is proposed. Indeed, as shown in Fig. 3, each stage includes a positive feedback loop, which reduces the comparison time tc compared with [3]. In addition, the positive feedback loop in the second stage provides complementary outputs, regardless of the technology parameters used. Moreover, the two stages operate under two different clock signals as in [3], which reduces the total comparison time tclk to the sum of the comparison time tc and the reset time toff, as defined in (5).

The circuit operates as follows: in the first stage, a differential analog input voltage ΔVIN = (VIN+ - VIN-) and a differential reference voltage ΔVREF = (VREF+ - VREF-) are applied to the four input pair transistors (M1-4). The voltages VIN+ and VREF- are applied to transistors (M1,4), which have a common drain. These transistors generate two currents and feed node X- with a current that is the image of the sum of the two applied voltages (VIN+ + VREF-). Likewise, considering the circuit symmetry, transistors (M2,3) feed node X+ with a current that is the image of the sum of the two applied voltages (VIN- + VREF+). When the clock signal clk is low, the tail transistors (M5,6) turn off, while the reset transistors (M11-14) turn on. This initializes the latch nodes X+, X-, O+ and O- to VDD. Conversely, when clk goes high, the tail transistors (M5,6) turn on while the reset transistors (M11-14) turn off. At this time, the four input pair transistors feed the latch nodes X+ and X- with a differential current ΔIX = IX+ - IX-, which is the image of the voltage difference between the sums of the applied voltages. This voltage difference is denoted as ΔVINPUT and is equal to:

$$\Delta V_{INPUT} = (V_{IN+} + V_{REF-}) - (V_{IN-} + V_{REF+}) = \Delta V_{IN} - \Delta V_{REF} \qquad (6)$$

The resulting ΔIX activates the latch transistors (M7-10), which operate as a strong positive feedback loop to regenerate the outputs O+ and O- to complementary logic levels. The generated outputs are then applied to the input transistors (Ms1,s2) of the second stage. Transistors (Mdi+(i=1..N)) and (Mdi-(i=1..N)) are two capacitor banks, each one including N binary-weighted charges. These capacitor banks are controlled by two N-bit inputs, d+ = (di+(i=1..N)) and d- = (di-(i=1..N)), and are used to compensate for process and mismatch variations. This specific structure of the charges also reduces the switching noise and improves the operation speed [3].

Considering the second stage, when clks is high, outputs Os+ and Os- are initialized to ‘0’, turning off transistors (Ms3,s4). As clks becomes low, the reset transistors (Ms5,s6) turn off. As shown in Fig. 3(b), this happens at the end of the reset phase of the first stage, where both outputs O+ and O- are initialized to VDD. Thus, transistors (Ms1,s2) are off, like the other four. When one of the first-stage outputs O+ or O- begins decreasing, transistor (Ms1) or (Ms2), respectively, begins operating to charge the output voltage Os+ or Os-, respectively, to VDD. When the applied input voltage difference is too small, both O+ and O- can decrease before regenerating to logic levels. Then, transistors (Ms3,s4) operate as positive feedback to maintain one of the outputs at ‘0’ while the other one charges to VDD. Without these transistors, both outputs Os+ and Os- may end up at VDD. In this case, when these signals are applied to the SR latch, they create an undefined state, resulting in a wrong decision on the outputs Q+ and Q-.

The last stage is a NOR gate SR latch. It maintains its state when the applied signals Os+ and Os- are initialized to ‘0’ and keeps or changes its state when the outputs are complementary, resulting in static outputs Q+ and Q-.

The true single-phase clock (TSPC) data flip-flop (DFF) presented in [19] is considered to design the clock generator in the proposed system in Fig. 2(c). It is a nine-transistor, three-stage DFF operating with a single clock signal and including no more than three stacked devices per stage. This circuit is shown in Fig. 4, where a reset command and an inverter are added to the output. This structure is convenient and should provide an operational speed greater than that of the comparator. A detailed description of the circuit can be found in [19].

Figure 4: True single-phase clock DFF (a) symbol (b) circuit-level design.

4 Clock generator and kickback noise suppression

Fig. 5 shows the design details of the proposed clock generator. The circuit generates four output signals: clk, clks, clkn and clkn’.
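As a behavioural illustration only, the clock generator can be viewed as an input-free Moore machine that cycles through four states and drives the vector c = (clk clks clkn). The Python sketch below encodes the state outputs given in this section's state description; the cyclic S3 to S0 transition, the one-state-per-input-clock-period timing, and all identifiers (STATE_OUTPUTS, NEXT_STATE, clock_sequence) are assumptions of this sketch and do not represent the gate-level design of Fig. 5(c).

```python
from itertools import islice

# Output vector c = (clk, clks, clkn) for each state, as listed in the text.
STATE_OUTPUTS = {
    "S0": (0, 0, 1),  # sample-and-hold, first stage in reset
    "S1": (0, 1, 0),  # both comparator stages in reset
    "S2": (1, 0, 0),  # first-stage decision begins
    "S3": (1, 0, 0),  # first-stage decision continues
}

# Assumption of this sketch: the machine has no inputs and simply cycles.
NEXT_STATE = {"S0": "S1", "S1": "S2", "S2": "S3", "S3": "S0"}


def clock_sequence(start="S0"):
    """Yield (state, clk, clks, clkn, clkn_bar) once per input-clock period."""
    state = start
    while True:
        clk, clks, clkn = STATE_OUTPUTS[state]
        # clkn' is the output of the added inverter, i.e. the complement of clkn.
        yield state, clk, clks, clkn, 1 - clkn
        state = NEXT_STATE[state]


if __name__ == "__main__":
    # Print two full cycles of the four-state sequence.
    for row in islice(clock_sequence(), 8):
        print(row)
```

In this model clk is high for two consecutive input-clock periods (S2 and S3), whereas clks and clkn are each high for a single period.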
Because clkn and clkn’ are complementary, the circuit states can be defined according to the three outputs clk, clks, and clkn only. These three outputs are denoted by vector c = (clk clks clkn) = (x x x), where x is equal to ‘1’ or ‘0’. As described in Fig. 5(a) and (b), the circuit goes through four states S0, S1, S2, and S3. First, the clock generator is initialized to state S0 with an external reset = ‘1’. This state corresponds to the sample-and-hold phase, in which external signals are connected to the comparator inputs (Fig. 1(c)). It also corresponds to the reset of the comparator first stage. Vector c is then equal to (0 0 1). Second, state S1 corresponds to the reset of the two comparator stages, for which c is equal to (0 1 0). Third, state S2 is the state where the comparator first-stage operation begins, which corresponds to c equal to (1 0 0). Fourth, state S3 is the continuation of state S2, with c still equal to (1 0 0). This last state is required because the first-stage operation is slower than the second one. Therefore, high and low levels of clk must last longer than those of clks and clkn.

Figure 3: Proposed FDDC (a) circuit (b) clock diagram.

The finite state machine (FSM) is depicted in Fig. 5(b). It has no inputs and generates the three outputs clk, clks, and clkn. The gate-level and circuit-level syntheses are given in Fig. 5(c) and (d), respectively. An inverter is added to generate the complement of clkn. The generation of synchronized clock signals is achieved by sequential circuits using data flip-flops (DFFs).

State S0 is the sample-and-hold state, while state S2 is the decision phase. Inserting states S1 and S3 between S0 and S2 reduces the kickback noise effects on the decision process. Indeed, as discussed in [13], isolating the decision process from the sample-and-hold phase can significantly reduce the effect of kickback noise on the decision process. However, the clock generation in [13] used delay circuits, and the outputs were not synchronized. Hence, the design was specific to the chosen clock timing, as well as to the technology used. In contrast, the proposed design generates synchronized outputs and can be reproduced without considering the technology used or the transistor size.

Figure 5: Proposed clock generator design (a) clock diagram (b) Moore finite state machine (c) gate-level design (d) circuit-level design.

5 Proposed offset self-calibration technique

In Fig. 2(c), the proposed offset regulator receives the comparator static outputs Q+ and Q- and generates two N-bit outputs d+ and d-. These outputs are then used to control the 2N binary-weighted transistors (Mdi+) and (Mdi-) shown in Fig. 3(a). The least significant bit (LSB) transistor is set to minimal dimensions, while, for the other weighted transistors, the channel width is doubled at each step up to the most significant bit (MSB) transistor. The main idea is to create a progressive charge imbalance to compensate for the comparator offset, as in [3], [20]. However, in [3], a high-level design methodology for the self-calibration scheme is proposed. As a result, the circuit is slow and large because of the large number of chained gates. In [20], the offset regulation is off chip and too complex for a circuit-level design. In the present work, the proposed offset regulator is minimalist and can be easily designed at the circuit level.

Figure 6: Block diagram of the proposed offset regulator.

The proposed offset regulator block diagram is presented in Fig. 6. The circuit input stage is an FSM, which receives the comparator static outputs Q+ and Q- and generates two control digits e+ and e-. Each of these two digits is combined with the calibration control signal calib using an AND gate to generate two digital signals, E+ and E-. These signals are then used as the enable inputs of two N-bit counters. The counters generate two N-bit control words to calibrate the two capacitor banks in the comparator shown in Fig. 3(a). In these capacitor banks, two cases are not used, to avoid a significant variation in the capacitive compensation load: “all transistors are on” and “all transistors are off”, which correspond to d+ and d- equal to 0 and 2^N - 1, respectively. Therefore, the two N-bit control signals d+ and d- should be initialized to 1, for which all the binary-weighted transistors are on, except for the LSB transistor. Then, according to the Q+ and Q- levels, the incrementation of d+ and d- is either stopped by setting E+ and E- to ‘0’ or pursued by setting E+ and E- to ‘1’. The incrementation should stop before reaching 2^N - 1, for which all transistors are blocked. The case d+ and d- equal to 2^N - 2 turns off all the calibrating transistors, except the LSB one. The parasitic capacitors of the blocked transistors can be neglected compared with those of the on transistors.

Each conducting transistor is then equivalent to a capacitor. As a result, when d+ and d- are equal to 1, the N-1 largest capacitors are in parallel. This sets the calibrating capacitive load at the maximum value on both sides of the comparator. When a 0 V input voltage is applied, the comparator output Q+ is either high or low. When Q+ is high, the comparator is considered as exhibiting a positive offset voltage. To compensate for this offset, d- is incremented by 1 (d- = d-(initial) + 1 = 2). This corresponds to a first step decrease of the capacitive load on the right side of the comparator with respect to the left side. Hence, in the next comparison cycle, the positive offset voltage either decreases toward 0 V or becomes negative. If the offset voltage is still positive in the next cycle, that is, Q+ is still high, d- is incremented again. This continues until the offset voltage becomes negative, that is, Q+ becomes low, or until d- reaches 2^N - 2.

The generation of e+ and e- according to the Q+ and Q- levels is described by the FSM shown in Fig. 7(a). A reset command sets the system to state S0, where both digital outputs e+ and e- are set to ‘0’. When Q+ and Q- are equal to ‘1’ and ‘0’, respectively, the system enters state S1, where the outputs e+ and e- are set to ‘0’ and ‘1’, respectively. The system remains in that state until Q+ and Q- change to opposite logic levels. When this happens, the system enters state S2, where outputs e+ and e- are set to ‘1’ and ‘0’, respectively. This state allows for rebalancing the system once the offset voltage changes sign. Simulations have shown better results when the system is rebalanced twice by adding state S3. Then, the system enters a final state S7, where both outputs e+ and e- are set back to ‘0’. Because the circuit is symmetrical, considering Q+ and Q- equal to ‘0’ and ‘1’, respectively, leads to states S4, S5 and S6, which are symmetrical to states S1, S2 and S3, respectively.

Figure 7: Proposed FSM design to control the two counters, (a) Moore FSM, (b) proposed circuit-level design.

Fig. 7(b) shows the proposed FSM circuit synthesis. It uses dynamic circuits and the DFF shown in Fig. 4. To generate static outputs e+ and e- with maximal operation speed, switched circuits with positive feedback are used. The generated outputs are used to control two N-bit counters. Fig. 8(a) shows the FSM of an N-bit counter. The module-level design of the counter is shown in Fig. 8(b), while the proposed circuit-level design is shown in Fig. 8(c). In the proposed circuit-level design, the first DFF is reset to ‘1’ instead of ‘0’ to initialize the N-bit counter to 1 instead of 0. Fig. 9 shows the modified DFF. However, to avoid the counter reaching 2^N - 1, the on time of the external signal calib is set to exactly 2^N - 2 cycles.

Figure 8: Proposed N-bit counter design (a) Moore FSM (b) module-level design (c) proposed circuit-level design to start the counter from 1.

Figure 9: First DFF of the N-bit counter (a) symbol (b) circuit-level design.

6 Simulation and comparison

To validate the proposed design methodology and evaluate the performance of the proposed circuit, the proposed two-stage FDDC shown in Fig. 3 has been designed in the TSMC 65 nm CMOS process using standard-threshold MOS devices. The offset calibrating capacitor banks are set to six bits. The basic comparator shown in Fig. 1(b), followed by a NAND-based SR latch, is also designed using the same standard-threshold devices and is used as a basis for comparison to show the advantages of the proposed structure.

Figure 10: Transient analysis of the fully differential dynamic comparator (a) basic comparator (b) proposed two-stage comparator.

In the first simulation set, both FDDCs are simulated at room temperature under nominal operating conditions. They are powered by a 1.2 V supply voltage and operate at a 1.25 GHz clock frequency. A first DC voltage source is set to -300 µV and connected to the differential input voltage, while a second DC voltage source is set to VCM = 950 mV and is connected to both reference inputs to set VREF = (VREF+ - VREF-) to 0 V. Fig. 10 shows the transient analysis results for both comparators. This figure is used to determine the decision time ton for both structures. In Fig. 10(a), the decision time ton of the basic comparator, as defined in (3), is equal to the time between the instant when clk goes high and the instant when the negative output Q- of the SR latch crosses the mid-supply voltage value (VDD/2 = 600 mV). In this first case, the output Q+ and Q- transition must happen during the clk on-time.
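The decision-time definition used here (from the rising edge of clk to the instant the observed output crosses VDD/2) can be turned into a small measurement routine. The sketch below is only an illustration: the function names, the linear-interpolation crossing detector and the synthetic ramp waveforms are assumptions of this example, not the authors' simulation setup or data.

```python
def crossing_time(times, values, threshold, rising=True):
    """Return the first time at which `values` crosses `threshold`.

    Linear interpolation is used between the two samples that bracket the
    crossing. Returns None when no crossing in the requested direction exists.
    """
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if rising and v0 < threshold <= v1:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
        if not rising and v0 > threshold >= v1:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None


def decision_time(times, clk, out, vdd=1.2, out_rising=False):
    """Delay from clk rising through VDD/2 to `out` crossing VDD/2."""
    half = vdd / 2
    t_clk = crossing_time(times, clk, half, rising=True)
    if t_clk is None:
        return None
    # Only the part of the output that follows the clock edge is examined.
    later = [(t, v) for t, v in zip(times, out) if t >= t_clk]
    if len(later) < 2:
        return None
    ts, vs = zip(*later)
    t_out = crossing_time(ts, vs, half, rising=out_rising)
    return None if t_out is None else t_out - t_clk


def ramp(t, t_start, t_end, v_start, v_end):
    """Piecewise-linear test waveform (synthetic, for the demo only)."""
    if t <= t_start:
        return v_start
    if t >= t_end:
        return v_end
    return v_start + (v_end - v_start) * (t - t_start) / (t_end - t_start)


if __name__ == "__main__":
    times = [10.0 * k for k in range(101)]               # 0 to 1000 ps
    clk = [ramp(t, 100, 130, 0.0, 1.2) for t in times]   # rising clock edge
    out = [ramp(t, 455, 495, 1.2, 0.0) for t in times]   # falling output
    print(decision_time(times, clk, out))                # delay in ps
```

For real data, the same two functions would be applied to exported simulator waveforms; the interpolation step matters mainly when the sampling interval is coarse compared with the edge duration.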
If the Q+ and Q- transition of the basic comparator does not happen during the clk on-time, the decision cannot be made, and the comparator output is invalid. In Fig. 10(a), the Q+ transition happens slightly before the clk transition.

In the proposed circuit, the decision time ton is equal to the comparison time tc, as discussed in section 3. The comparison time tc in Fig. 10(b) corresponds to the time between the instant when clk goes high and the instant when the negative output Os- of the second stage crosses 600 mV. In this second case, the Os- transition must happen during the clk on-time. However, since the Os- logic level is maintained during the reset, the Q+ and Q- transition can happen at any time of the clock cycle, even after the clk transition.

Thus, the decision time ton is equal to 400 ps and 360 ps in the basic and proposed comparators, respectively. The speed improvement of 40 ps in the proposed comparator is then about 10%, as in [3]. However, in [3], the two-stage comparator could not operate properly when powered by voltages equal to 1.2 V and below. The proposed design operation is independent of the technology used, as discussed in section 3.

In the second simulation set, the clock generator shown in Fig. 5(d) is simulated under 1.2 V with a 20 GHz input clock signal clkIN. The results are shown in Fig. 11. In this figure, four synchronized outputs clk, clks, clkn, and clkn’ are generated in accordance with the clock diagram of Fig. 5(a). The simulations show that the signal frequency can exceed 2.5 GHz.

Figure 11: Transient evolution of the generated clocks.

In the third simulation set, immunity to kickback noise is simulated using the circuit of Fig. 12(a). In that circuit, the stage preceding the comparator is represented by its Thevenin equivalent, with a Thevenin resistor RTH equal to 2 × 5 kΩ. The comparator is the proposed FDDC, including switches and the clock generator, as detailed in Fig. 3. To assess the proposed design, simulations are performed with and without noise reduction. The transient evolution of the comparator input signals with and without noise reduction is shown in Fig. 12(b). In both cases, the kickback noise is about a few tens of millivolts. However, the input signals with noise reduction (in green in Fig. 12(b)) exhibit noise during the reset time only, whereas without noise reduction (in red), noise is present during the entire decision cycle. Although the maximum noise level is not reduced, the circuit remains immune to kickback noise during the decision phase, which is essential to ensure high accuracy.

Figure 12: Kickback noise simulation (a) simulation circuit (b) kickback noise at the comparator inputs.

In the fourth simulation set, the offset correction is simulated using the circuit shown in Fig. 13(a). The differential analog input is connected to a triangular voltage source VINPUT = VIN+ - VIN- with a slope equal to 1 mV/10 ns. The differential reference inputs VREF+ and VREF- are connected to a common-mode voltage source VCM = VREF+ = VREF- = 950 mV. The differential reference voltage VREF = VREF+ - VREF- is then equal to 0 V. Thus, considering the two voltage differences, VINPUT and VREF, the ideal transfer characteristic of the dynamic comparator would be similar to the one presented in Fig. 13(b). Here, both hysteresis and offset are null. However, in real conditions, the comparator always exhibits hysteresis and offset [21]. Fig. 13(c) shows the realistic transfer characteristic. The hysteresis window is centered on VM and delimited by trip points VTR+ and VTR-. The offset voltage VOS is defined as the difference between VM and VREF:

$$V_{OS} = V_M - V_{REF} \qquad (7)$$

Figure 13: Offset self-calibration simulation (a) simulation circuit (b) ideal transfer characteristic (c) real transfer characteristic.

This offset definition is used to evaluate offset voltage compensation using the comparator capacitor banks. Considering the circuit symmetry, simulations can be performed by holding d+ or d- at 1 while incrementing the other from 1 to N - 2. In Fig. 14, the offset voltage is determined by holding d- at 1 while incrementing d+ from 1 to N - 2 for N - 2 clock cycles.

Figure 14: Offset voltage compensation by capacitor banks.

Figure 15: Simulation circuit of the FDDC, including an offset voltage VOS.

Figure 16: FSM input output signals of the offset regulator when VOS = 50 mV.

In the fifth simulation set, the operation of the offset regulator FSM shown in Fig. 7(a) is evaluated using the circuit shown in Fig. 15. In this circuit, an offset voltage equal to 50 mV is added in series with the positive comparator input. Fig. 16 shows the applied input signals reset, calib, and calib’. The reset action initializes both FSM outputs e+ and e- to ‘0’, which corresponds to state S0 of the FSM shown in Fig. 7(a). Then, with Q+ and Q- equal to ‘1’ and ‘0’, respectively, the FSM outputs e+ and e- become ‘0’ and ‘1’, respectively, which corresponds to state S1. After 35 clock cycles, Q+ and Q- change to the opposite logic levels, leading e+ and e- to change to ‘1’ and ‘0’, respectively. This change lasts two clock cycles, which corresponds to state S2 followed by state S3 in the FSM. After these two cycles, the outputs e+ and e- are set back to ‘0’, which corresponds to the FSM last state S7. Two signals E+ and E-, which are identical to e+ and e-, are also generated. Indeed, because calib = 1, an AND logic operation between calib and e+ and e-, as shown in Fig. 6, results in the two signals E+ and E-. These signals are applied to the enable inputs of two 6-bit counters, leading to two offset calibration control signals d+ and d-, respectively. Fig. 17 shows the generated control signals, where d+ and d- are equal to 3 and 37, respectively.
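The calibration sequence just described can be mimicked with a small behavioural model. The sketch below is an illustration under stated assumptions, not the circuit of Fig. 7(b) or Fig. 8(c): it models only the Q+ = ‘1’ branch of the FSM (S0, S1, S2, S3, S7), evaluates outputs and next state once per clock cycle, and takes the sequence of comparator decisions as a given list instead of simulating the comparator. The names `OffsetRegulator` and `run` are hypothetical.

```python
class OffsetRegulator:
    """Behavioural model: Moore FSM (e+, e-) gating two N-bit counters."""

    # (e_plus, e_minus) in each state of the Q+ = 1 branch described in the text.
    OUTPUTS = {
        "S0": (0, 0),
        "S1": (0, 1),
        "S2": (1, 0),
        "S3": (1, 0),
        "S7": (0, 0),
    }

    def __init__(self, n_bits=6):
        self.limit = 2 ** n_bits - 2  # counters must never reach 2^N - 1
        self.state = "S0"
        self.d_plus = 1               # both counters start from 1, not 0
        self.d_minus = 1

    def step(self, q_plus, calib=1):
        """Advance one clock cycle with comparator output q_plus (Q- = not Q+)."""
        e_plus, e_minus = self.OUTPUTS[self.state]
        # Enable signals E = calib AND e, as in the block diagram.
        if calib and e_plus and self.d_plus < self.limit:
            self.d_plus += 1
        if calib and e_minus and self.d_minus < self.limit:
            self.d_minus += 1
        # Next-state logic; the symmetric Q+ = 0 branch (S4 to S6) is not modelled.
        if self.state == "S0":
            self.state = "S1" if q_plus else "S0"
        elif self.state == "S1":
            self.state = "S1" if q_plus else "S2"
        elif self.state == "S2":
            self.state = "S3"
        elif self.state == "S3":
            self.state = "S7"
        # S7 is terminal until the next reset.


def run(q_sequence, calib_cycles=None, n_bits=6):
    """Feed a list of Q+ values; calib is high for the first calib_cycles cycles."""
    reg = OffsetRegulator(n_bits)
    for k, q_plus in enumerate(q_sequence):
        calib = 1 if calib_cycles is None or k < calib_cycles else 0
        reg.step(q_plus, calib)
    return reg.d_plus, reg.d_minus


if __name__ == "__main__":
    # Q+ stays high for 36 model cycles, then flips: prints (3, 37).
    print(run([1] * 36 + [0] * 4))
```

With Q+ held high for 36 cycles in this model, the counters settle at d+ = 3 and d- = 37, the values quoted for Fig. 17; the cycle at which the real comparator flips depends on the capacitor banks and is not modelled here. If Q+ never flips and calib is high for 2^N - 2 = 62 cycles, the model leaves d+ at 1 and d- at 62.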
Figure 17: Six-bit offset control signals d+ and d- when VOS = 50 mV.

Figure 18: Offset evaluation with VOS = 50 mV (a) trip point VTR+ (b) trip point VTR-.

In Fig. 18, the trip points VTR+ and VTR- are determined as 741 µV and -77 µV, respectively. Because the hysteresis window is centered on VM, VM = (741 µV - 77 µV)/2 = 332 µV, and with VREF = 0 V, (7) gives an offset voltage VOS equal to 332 µV. Thus, the proposed self-calibration method has effectively reduced the offset voltage from 50 mV to a few hundred microvolts.

Table 1: Summary and comparison of the characteristics of fully differential dynamic comparators.

2002 [8] / 2014 [23]
Process: CMOS 350 nm / CMOS 90 nm
Topology: Diff. pair / 4-inputs
Meas/Sim: Measured / Simulated
Offset regulation: No / No
VOS range/max (mV): 80 / 33
Kickback noise during decision (mV): 1
Comparison rate (GHz): 0.1 / 1
Power (µW): 580 / 51
Energy/Comp. (fJ/comp): 5800 / 51

2016 [22] / 2017 [24]
Process: CMOS 180 nm / CMOS 40 nm
Topology: Diff. pair / Diff. pair
Meas/Sim: Measured / Simulated
Offset regulation: No / Analog
VOS range/max (mV), Kickback noise during decision (mV): ±1, 0.45
Comparison rate (GHz): 3.33 / 0.5
Power (µW): 2100 / 373
Energy/Comp. (fJ/comp): 630.63 / 746

2018 [25] / This work
Process: CMOS 180 nm / CMOS 65 nm
Topology: Diff. pair / Diff. pair
Meas/Sim: Simulated / Simulated
Offset regulation: No / Digital
VOS range/max (mV): ±5 / ±9
Kickback noise during decision (mV): ≈0
Comparison rate (GHz): 1.3 / 1.25
Power (µW): 265 / 213
Energy/Comp. (fJ/comp): 203.85 / 170.4

Figure 19: FSM input output signals of the offset regulator when VOS = 150 mV.

Figure 20: Offset control signals d+ and d- when VOS = 150 mV.

Figure 21: Offset evaluation with VOS = 150 mV (a) trip point VTR+ (b) trip point VTR-.

In the sixth simulation set, the circuit in Fig. 15 is used again with an offset voltage equal to 150 mV to evaluate the maximum offset correction that the designed system could achieve. Fig. 19 shows the resulting offset regulator FSM outputs. The system goes through states S0, S1, S2, and S3. However, the control signal calib is set to ‘0’ before the FSM reaches state S2, that is, before Q+ and Q- change to the opposite logic levels. Indeed, the control signal calib is used to disable the counter incrementation when it reaches 2^N - 2, as discussed in section 5.
Therefore, once calib is set to ‘0’, the enable signal E- is no longer identical to e-. Fig. 20 shows the resulting counter outputs d+ and d-, which are equal to 1 and 62, respectively. Fig. 21 is used to determine the maximal offset correction. The obtained offset voltage after correction is 4.33 mV. Thus, the system can achieve a maximal offset correction of 145.67 mV.

In the seventh simulation set, the offset voltage is determined while considering the process and mismatch variations. Fig. 22 shows the offset variation of the designed FDDC under mismatch variation, with and without offset calibration, for a 100-run Monte Carlo simulation. Without offset calibration, the offset voltage VOS has a maximum variation of ±160 mV. This variation is reduced to ±9 mV after calibration.

Figure 22: Monte Carlo simulation of the offset voltage (a) without calibration (b) with calibration.

The proposed design achieves an effective self-calibration of the offset voltage. The standard deviation is reduced from 41.3 mV to 2.23 mV after calibration, a decrease of more than 18 times. The offset correction can be improved by increasing the channel length of the calibration transistors, as discussed in [3], or by increasing the number of charges in the capacitor banks.

The proposed design performance is summarized in Table 1. This table also presents the performance achieved in current related works on FDDCs. The proposed design is the only one that includes offset calibration and noise cancellation in FDDCs. It achieves the second-best energy efficiency after a 40 nm CMOS design [22]. However, no offset regulation is proposed in [22]; adding one would increase the consumed power and decrease the operation speed.

7 Conclusions

The current paper presented a new and effective design methodology for FDDCs, including kickback noise immunity and offset self-calibration.
In the proposed design, the kickback noise is almost null during the decision phase and less than 40 mV during the reset phase. Moreover, the proposed FDDC achieves an effective digital offset self-calibration, in which the offset voltage is reduced more than 18 times. The proposed circuit is designed with minimalist building blocks and consumes no more than 213 µW at a 1.25 GHz comparison rate. It achieves high performance compared with the current state-of-the-art achievements in terms of offset calibration, noise cancellation, operation speed, power consumption, and design simplicity. Moreover, the proposed design methodology is generic and independent of the technology used. 6. 7. 8. 8 Acknowledgments 9. The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number (IF-PSAU-2021/01/5237). 10. 9 References 1. M. I. Dewan and D. H. Kim, “NP-Separate: A New VLSI Design Methodology for Area, Power, and Performance Optimization,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 12, pp. 5111–5122, Dec. 2020, https://doi.org/10.1109/TCAD.2020.2966551. 11. 100 J. Atkinson, A. Bailey, and A. Tajalli, “Systematic Design of Loop Circuit Topologies Using C/IDS Methodology,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 10, pp. 1538–1542, Oct. 2022, https://doi.org/10.1109/TVLSI.2022.3181969. L. Khanfir and J. Mouïne, “Design optimisation procedure for digital mismatch compensation in latch comparators,” IET Circuits, Devices & Systems, vol. 12, no. 6, pp. 726–734, 2018, https://doi.org/10.1049/iet-cds.2018.5153. S. Oh et al., “An 85 dB DR 4 MHz BW Pipelined Noise-Shaping SAR ADC With 1–2 MASH Structure,” IEEE Journal of Solid-State Circuits, vol. 56, no. 11, pp. 3424–3433, Nov. 2021, https://doi.org/10.1109/JSSC.2021.3086853. B. Yazdani and S. 
Jafarabadi Ashtiani, “A Low Power Fully Differential Level-Crossing ADC With Low Power Charge Redistribution Input for Biomedical Applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 3, pp. 864–868, Mar. 2022, https://doi.org/10.1109/TCSII.2021.3127279. C.-C. Lu and D.-K. Huang, “A 10-Bits 50-MS/s SAR ADC Based on Area-Efficient and Low-Energy Switching Scheme,” IEEE Access, vol. 8, pp. 28257– 28266, 2020, https://doi.org/10.1109/ACCESS.2020.2971665. Y. Song, Y. Zhu, C.-H. Chan, and R. P. Martins, “A 40MHz Bandwidth 75-dB SNDR Partial-Interleaving SAR-Assisted Noise-Shaping Pipeline ADC,” IEEE Journal of Solid-State Circuits, vol. 56, no. 6, pp. 1772–1783, Jun. 2021, https://doi.org/10.1109/JSSC.2020.3033931. L. Sumanen, M. Waltari, V. Hakkarainen, and K. Halonen, “CMOS dynamic comparators for pipeline A/D converters,” in 2002 IEEE International Symposium on Circuits and Systems. Proceedings, May 2002, vol. 5, p. V–V. https://doi.org/10.1109/ISCAS.2002.1010664. Y. Liu, S. Fang, and Y. Wang, “A Novel Time-Multiplexed Fully Differential Interface ASIC With Strong Nonlinear Suppression for MEMS Accelerometers,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–13, 2022, https://doi.org/10.1109/TIM.2022.3207795. K.-J. Moon, D.-R. Oh, M. Choi, and S.-T. Ryu, “A 28nm CMOS 12-Bit 250-MS/s Voltage-Current-Time Domain 3-Stage Pipelined ADC,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 12, pp. 2843–2847, Dec. 2020, https://doi.org/10.1109/TCSII.2020.2990910. BA. T. Ramkaj, M. J. M. Pelgrom, M. S. J. Steyaert, and F. Tavernier, “A 28 nm CMOS Triple-Latch Feed-Forward Dynamic Comparator With <27 ps / 1 V and <70 ps / 0.6 V Delay at 5 mV-Sensitivity,” L. Khanfir et al.; Informacije Midem, Vol. 53, No. 2(2023), 87 – 102 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 11, pp. 4404–4414, Nov. 
2022, https://doi.org/10.1109/TCSI.2022.3199438. P. M. Figueiredo and J. C. Vital, “Kickback noise reduction techniques for CMOS latched comparators,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 53, no. 7, pp. 541–545, Jul. 2006, https://doi.org/10.1109/TCSII.2006.875308. L. Khanfir and J. Mouine, “Clock delay-based design for hysteresis programming and noise reduction in dynamic comparators,” Analog Integr Circ Sig Process, vol. 106, no. 2, pp. 409–419, Feb. 2021, https://doi.org/10.1007/s10470-020-01656-3. R. K. Siddharth, Y. Jaya Satyanarayana, Y. B. Nithin Kumar, M. H. Vasantha, and E. Bonizzoni, “A 1-V, 3-GHz Strong-Arm Latch Voltage Comparator for High Speed Applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 12, pp. 2918–2922, Dec. 2020, https://doi.org/10.1109/TCSII.2020.2993064. L. Sumanen, M. Waltari, and K. Halonen, “A mismatch insensitive CMOS dynamic comparator for pipeline A/D converters,” in ICECS 2000. 7th IEEE International Conference on Electronics, Circuits and Systems, Dec. 2000, vol. 1, pp. 32–35 vol.1. https://doi.org/10.1109/ICECS.2000.911478. V. Katyal, R. L. Geiger, and D. J. Chen, “A New High Precision Low Offset Dynamic Comparator for High Resolution High Speed ADCs,” in APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, Dec. 2006, pp. 5–8. https://doi.org/10.1109/APCCAS.2006.342249. P. P. Gandhi and N. M. Devashrayee, “A novel low offset low power CMOS dynamic comparator,” Analog Integr Circ Sig Process, vol. 96, no. 1, pp. 147–158, Jul. 2018, https://doi.org/10.1007/s10470-018-1166-9. K. Ohhata et al., “A 900-MHz, 3.5-mW, 8-bit Pipelined Subranging ADC Combining Flash ADC and TDC,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 9, pp. 1777–1787, Sep. 2018, https://doi.org/10.1109/TVLSI.2018.2827943. J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE Journal of Solid-State Circuits, vol. 24, no. 1, pp. 
62–70, Feb. 1989, https://doi.org/10.1109/4.16303. P. Nuzzo, F. D. Bernardinis, P. Terreni, and G. V. der Plas, “Noise Analysis of Regenerative Comparators for Reconfigurable ADC Architectures,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 6, pp. 1441–1454, Jul. 2008, https://doi.org/10.1109/TCSI.2008.917991. L. Khanfir and J. Mouïne, “Systematic Hysteresis Analysis for Dynamic Comparators,” Journal of 22. 23. 24. 25. Circuits, Systems, and Computers, vol. 28, no. 06, p. 1950100, Jun. 2019, https://doi.org/10.1142/S0218126619501007. V. Milovanović and H. Zimmermann, “A two-differential-input/differential-output fully complementary self-biased open-loop analog voltage comparator in 40 nm LP CMOS,” in 2014 29th International Conference on Microelectronics Proceedings - MIEL 2014, May 2014, pp. 355–358. https://doi.org/10.1109/MIEL.2014.6842163. M. Hassanpourghadi, M. Zamani, and M. Sharifkhani, “A low-power low-offset dynamic comparator for analog to digital converters,” Microelectronics Journal, vol. 45, no. 2, pp. 256–262, Feb. 2014, https://doi.org/10.1016/j.mejo.2013.11.012. S. Naghavi et al., “A 500 MHz low offset fully differential latched comparator,” Analog Integr Circ Sig Process, vol. 92, no. 2, pp. 233–245, Aug. 2017, https://doi.org/10.1007/s10470-017-0998-z. P. P. Gandhi and N. M. Devashrayee, “A novel low offsetlow power CMOS dynamic comparator,” Analog Integr Circ Sig Process, vol. 96, no. 1, pp. 147-158, Jul. 2018, https://doi.org/10.1007/s10470-018-1166-9. Copyright © 2023 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Arrived: 05. 04. 2023 Accepted: 23. 08. 2023 101 L. Khanfir et al.; Informacije Midem, Vol. 53, No. 
2(2023), 87 – 102

Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.205
Journal of Microelectronics, Electronic Components and Materials
Vol. 53, No. 2(2023), 103 – 117

Towards Smaller Single-point Failure-resilient Analog Circuits by Use of a Genetic Algorithm

Žiga Rojec
Department EDA, Faculty of Electrical Engineering, University of Ljubljana, Slovenia

Abstract: Failure-resilient analog circuits are difficult to design, but artificial intelligence can help search the topology solution space. Using evolutionary computation-based topology synthesis, we evolve analog arcus tangent computational circuits that are resilient to the high-impedance failure or removal of any single rectifying diode or resistor. We encode analog circuit topologies as individuals with an upper-triangular incident matrix. Circuits are evolved using a combined technique utilizing parts of NSGA-II and PSADE, based on a special three-dimensional robustness function. We show that the topology size of a failure-resilient circuit can be substantially smaller than that of hand-made component-redundancy-based solutions. Our best failure-resilient topology comprises six diodes, three resistors, and a voltage offset source.

Keywords: analog circuits, analog circuit synthesis, circuit optimization, failure-resilience, circuit robustness

Slovenian title (translated): Shrinking analog circuits resilient to the failure of an arbitrary component by use of a genetic algorithm

Slovenian abstract (translated): Analog circuits that are resilient to failures are difficult to design. Artificial intelligence can help comb through the space of possible topologies. Using topology synthesis based on an evolutionary algorithm, we developed an analog computational circuit for the inverse tangent that is resilient to the high-impedance failure of a single component (diode or resistor) or to its removal. In the algorithm, the topology of the analog circuit is written in the form of an upper-triangular incident matrix. The circuits are evolved with a combined method that uses the multi-objective optimization algorithm NSGA-II and PSADE, where a special three-criteria robustness function was developed to guide the synthesis. In the paper we show how to reduce the size of a topology that is resilient to component failure to a size substantially smaller than that of hand-made robust topologies based on the redundancy of individual components. Our best result is an analog computational circuit for the inverse tangent that consists of six diodes, three resistors, and an offset voltage source.

Slovenian keywords (translated): analog circuits, analog circuit synthesis, circuit optimization, failure-resilience, circuit robustness

* Corresponding Author’s e-mail: ziga.rojec@fe.uni-lj.si

How to cite: Ž. Rojec, “Towards Smaller Single-point Failure-resilient Analog Circuits by Use of a Genetic Algorithm”, Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 103–117

1 Introduction

Design of an analog circuit is a challenging task, especially when the product has to meet high standards and fulfill tough requirements. Designers often use various simulation tools to predict temperature, humidity, and electromagnetic behavior during circuit operation. Furthermore, to predict the manufacturability of the blueprint and maximize the production yield, they also use statistical methods, such as Monte Carlo analysis [1].

However, customer requirements might get even harder. When a device is targeted for use in harsh conditions (e.g., space exploration, aeronautical missions, automotive, robotics), we expect the product to be robust against extreme temperature swings, high ionizing and electromagnetic radiation levels, high working currents, and more. That kind of stress can lead to component faults and premature device failure. Furthermore, failed components in remote and unmanned missions cannot be replaced easily.

Ž. Rojec; Informacije Midem, Vol. 53, No.
2(2023), 103 – 117

Researchers have already focused on hardening electronic devices against failures per se [2]. The classical ways of doing that include component redundancy, overdesign, shielding and insulation, thermal management, and so on. Most of the time, such solutions significantly increase the size, weight, and, finally, the cost of the device. These methods usually aim to protect every circuit component as if it were the main breaking point of the system.

Researchers have also proposed systems resilient to failures that occur in vivo, meaning that the circuit remains functional when one or more components fail during circuit operation [3]–[6]. Such systems usually utilize duplicated circuit modules to form redundant sub-systems, which are controlled by various voting mechanisms [3], [7]. However, the demultiplexer then becomes the weak part of the system. A special case is the work of Zebulum, Keymeulen, et al., who presented an evolutionary algorithm that runs on the controlling unit of the circuit under failure, in vivo [4], [12].

This paper shows an alternative method of evolving failure-resilient analog circuits. Using an intensive evolutionary search, we can find novel analog circuit topologies that exhibit robustness to the high-impedance failure or removal of any electronic component (semiconductor diode or resistor), without a dedicated active demultiplexing system. We show in this work that, by using an evolutionary topology synthesis tool, we can greatly reduce the size and the number of components needed to achieve failure-resilience of an analog circuit, compared to a canonical hand-made design.

To the best of our knowledge, this is one of the few published works on the automated synthesis of a priori robust, failure-resilient nonlinear computational analog circuits [3], [4], [8]–[15], and also one of the first attempts at redundancy reduction by using evolutionary search.

The paper is organized as follows. We summarize previous work on robust topology synthesis in Section 1.1 and describe our motivation in Section 1.2. We describe the applied topology synthesis technique in Section 2. Results are given in Section 3, summarized in Section 3.8, and concluded in Section 4.

1.1 Previous work

The discovery of novel circuit topologies has been done by hand for over a century. This is changing with the availability of novel tools relying on artificial intelligence [16]. Since the beginning of this research area [17]–[19], computer-aided circuit synthesis has become human-competitive and trustworthy for fabrication [16], [20]. We believe that, rather than replacing a human expert in the industry, AI might help in rapidly exploring undiscovered topology space, thereby speeding up the design process.

1.1.1 Synthesis method

Analog topology synthesis is an extremely non-linear and complex task, which is why most existing approaches in this field search for a topology with a method based on the Darwinian selection of the fittest, i.e., an evolutionary or genetic algorithm. Reviews of existing analog circuit synthesis techniques can be found in the literature [21], [22]. Below, we give a brief overview of existing topology synthesis efforts for extremely robust and failure-resilient analog circuits.

Evolutionary methods demonstrate a capacity to tackle unconventional challenges. One compelling reason that supports the continued relevance of evolutionary computation, even when compared to neural networks such as graph neural networks (GNNs), is that it does not always require prior training to align with the defined cost function. However, emerging tools rooted in GNNs, like CktGNN, showcase impressive capabilities in generating robust circuit topologies [23].

1.1.2 Synthesis goals and degrees of robustness

Passive filters are usually the entry point for showing the performance of analog circuit synthesis tools. Most of the works on failure-resilience also experimented with the synthesis of robust passive analog filter circuits, dealing with various degrees of component faults. Resistor/capacitor/inductor removal was considered in [9], [15], while [3], [7] additionally studied partial and full short-circuit and high-impedance faults. Studies [24]–[27] only considered R/L/C parameter perturbation without full component failure.

Other authors reported syntheses of:
- compensator circuit [8] and inverter, amplifier, and oscillator [13] resilient to bipolar transistor removal,
- PID controller with R/L/C removal resilience [10],
- transistor-fault resilient amplifier [11],
- half-wave rectifier, NOR gate, and voltage-controlled oscillator for extreme temperature swings (in situ evolution) [12],
- XNOR gate, analog multiplier, and inverter resilient to arbitrary faults in the controlling unit FPTA (Field Programmable Transistor Array) [4],
- the natural logarithm and square-root analog computational circuits resilient to semiconductor diode short-circuit or high-impedance malfunction [28].

1.2 Motivation

1.2.1 Failure-resilience

For this work, let us define failure-resilience as a property of an analog circuit topology in which any of the basic components (diode or resistor) can be removed or replaced with a high-impedance failure, with the circuit showing minimal-to-zero deformation of its nominal signal processing abilities. The voltage source and the 10 kΩ input pull-up resistor are excluded from the definition.

The methodology incorporates various failure scenarios using specialized “failure-defining” Spice models, as demonstrated in our prior work [28], where we successfully synthesized analog circuits resilient to both high-impedance and short-circuit failures in semiconductor diodes. In this paper, we primarily concentrate on minimizing topologies that are fully resilient to high-impedance failures. Due to high computational costs, we do not address short-circuit failures for all component types in this paper; this topic is left for future research.

1.2.2 Size of failure-resilient circuits

Failure-robustness comes at a cost. It is generally paid with an (often significantly) higher total number of components needed for the same nominal task that a non-robust circuit would perform. For a system to survive a change as severe as the removal or failure of any one component, redundant elements and connections must be available in the system.

Let us consider an example of a non-linear computational analog circuit from Figure 1. The circuit outputs the inverse tangent of an input voltage signal between 0 and 10 V. It is a hand-designed linear voltage divider, with diodes used to switch between five linear segments, which closely interpolate the mathematical function [29]. Due to its simplicity, the topology is often used instead of the amplifier-chain summing circuit. If any of the components in the dotted square (except for the voltage source) fails (or is removed), the circuit’s transfer function changes severely, as seen in the absolute error range plot in Figure 2 and the relative error plots in Figure 3.

Figure 1: Canonical hand-designed piece-wise linear arctan computational circuit topology.

Figure 2: Hand-designed non-robust arctan circuit: nominal response (black) completely covers the arctan function. The range of various failure responses is given in blue.

Figure 3: Relative error curves of nominal (solid) and component failures (dotted and dashed).

The most common and straightforward approach to achieving the failure-resilience property is to introduce redundancy on a single-component level. In the case of an arctan circuit, every diode has to be paired in parallel and every resistor has to be (at least) tripled in parallel. Two diodes in parallel give a sub-circuit where, theoretically, either of the two diodes might enter high-impedance failure without changing the transfer function. A single resistor with resistance Rn has to be replaced with three parallel resistors of resistance 3 Rn each to maintain a 33% relative error of the sub-circuit in case one resistor enters high-impedance failure.

Figure 4 shows a hand-designed topology that fulfills the failure-resilience criteria. Its fair nominal response and narrow error range in failure cases are presented in Figure 5 and Figure 6. Evidently, the circuit topology, and hence the number of components needed, goes off the scale. While the nominal non-robust topology includes 10 resistors and 5 diodes (excluding the input resistor, see 1.2.1), the hand-made robust version comprises 30 resistors and 10 diodes. In CMOS technology, for example, resistors occupy large chip areas [30]. In addition, those resistances are multiples of the nominal values, which further multiplies the area needed for fabrication. The total cost of the circuit would be beyond comparison with that of the nominal non-robust version.

Figure 4: Hand-designed piece-wise linear arctan computational circuit, robust to any single component high-impedance failure or removal.

Figure 5: Hand-designed failure-resilient arctan circuit: nominal response (black) covers the arctan function. The range of various failure responses is given in blue.

Figure 6: Hand-designed failure-resilient arctan circuit: relative error curves of nominal (solid) and component failures (dotted and dashed).

However, recent studies of analog topology synthesis imply that the number of components needed for failure-resilience might be lower than expected from hand-made designs [3], [7]. A possible reason for that phenomenon is that open-ended topology synthesis allows component-level redundancy to be replaced with system-level redundancy.

1.2.3 Topology size as a synthesis constraint

In this study, we explored the lower limits of topology size for a failure-resilient computational analog circuit. We show that, for the arcus-tangent circuit, the topology can be reduced from 40 critical components in the hand-made design down to 8 components by evolutionary-based synthesis. This is also fewer components than in the hand-made non-robust design (15).

Our study provides step-by-step size-reducing results for further investigation and a better understanding of underlying mechanisms. The primary contribution of this paper lies in the demonstration of a novel application of evolutionary methods, resulting in the attainment of system robustness that has not been observed in any existing systems or circuits within the literature.

2 Methods

In this section, we provide details of the methods used in this circuit synthesis. The applied approach is mostly based on [28].

2.1 Analog Circuit Representation

The upper-triangular incident matrix is a well-proven method of encoding an analog circuit topology [22], [28], [31]. It is based on a fixed set of available component terminals. Each building block can comprise one or more input/output terminals (see Figure 7). Usually, the building-block terminals are located on the left side of the fixed set, and outer connections are located on the right side of the set. The set is then mirrored in two dimensions, forming a connection matrix, where a logical one represents an existing zero-impedance connection between the terminals on both axes. The matrix is filled with logical ones on the diagonal so that, by definition, every terminal is connected to itself. Only the upper matrix triangle is used, which excludes the redundant mirror connections in the bottom triangle and reduces the effective matrix size without sacrificing any topology search space [31], [32]. Additionally, in the inner-connections sector of the matrix, we allow every possible connection, while in the outer-connections sector only one positive logical value is allowed per line, filtering out any connections between outer terminals.

Figure 7: An example of an upper-triangular matrix, representing a simple T-shaped analog circuit topology [31]. (Labels in the figure: Vin, Vout, R1, R2, C1; inner-connections, outer-connections, forbidden sector without connections.)

Components with adjustable parameters (e.g., resistances, capacitances, transistor widths and lengths) have their values organized in a separate array, called the value vector. While the topology matrix is purely binary, the value vector is a numeric entity.

2.2 Genetic Reproduction and Sizing

For evolutionary computation and mimicking natural genetic reproduction, we use the topology-matrix crossover technique described in [31]. Every terminal is connected to other terminals via the logical values that reside in the column and the row intersecting at the diagonal element that represents the terminal's connection to itself. By exchanging these two lines of the matrix with those of another topology matrix, the information about how the terminal connects to the rest of the circuit is transferred. Figure 8 shows two examples of newly created offspring with one-terminal (N=1) and three-terminal (N=3) information being exchanged. Note that in the applied algorithm, the number of exchanged terminal connections N is randomly chosen from the set {1,2,3}.

Figure 8: Topology crossover examples. For better illustration, parent no. 2 is a full upper-triangular matrix [31]. (Panel labels: parents, offspring 1, offspring 2; N=1, N=3.)

The value vector is optimized using two different methods. The first one is a reproduction mechanism inspired by the well-known intermediate crossover [33]. The choice between topology-matrix and value-vector crossover is made by the evolutionary algorithm. In one case, the offspring inherits a modified topology, and in the other, modified parameters.

The other parameter tuning technique in this work is the established PSADE (Parallel Simulated Annealing and Differential Evolution) [34]. Because it is computationally expensive (yet effective), it is triggered only every 10th generation on the one to three best individuals.

2.3 Fitness function

The fitness function should encompass the desired properties of the circuit. Additionally, it should filter out individuals with unwanted properties and help guide the search algorithm through the valley of local minima. We briefly review the applied fitness function below; the full justification of the chosen criteria is given in [28].

In the case of open-ended topology synthesis, the fitness function definition is rather complex and comprises several stages. The first is an evaluation of the circuit’s transfer function, i.e., signal processing quality, using a DC analysis in a Spice simulator. In the case of arctan circuit design (let us denote the mathematical function as g), we calculate the root mean square error (RMSE) between Vout(Vin) and g(Vin). We call the result fitness and denote it as f.

Calculation of failure-resilient circuit fitness needs to be carried out for every predicted failure scenario. In our work, failure-resilience is defined with respect to the high-impedance failure of any resistor or semiconductor diode (see 1.2.1). In the case of 30 resistors and 10 diodes, the total number of RMSE calculations is 41: one for the nominal (no-failure) scenario, fnom, and 40 for the failure of each critical device, multiplied by the number of failure types considered (only one failure type in this case). Vector f comprises all RMSE results:

$$\mathbf{f} = \left[ f_{nom}, f_{1,1}, \ldots, f_{1,F}, \ldots, f_{N,F} \right] \qquad (1)$$

where N is the total number of critical components and F is the total number of failure types [28].

Failure-resilient circuit evaluation is carried out in multiple dimensions and forms a three-dimensional robustness vector r:

$$\mathbf{r} = \begin{bmatrix} f_{nom} \\ f_{max} \\ \sigma_f \end{bmatrix} \qquad (2)$$

where fnom is the RMSE of the no-failure, nominal circuit topology, fmax is the maximum of vector f, and σf is the standard deviation of the same vector [28]. Vector r gives insight into a single failure-resilient candidate's nominal performance, its performance in the case of the worst single-point failure, and the statistical scattering of failures. This separation gives the NSGA-II algorithm a chance to sort the individuals non-dominantly into Pareto fronts and thereby maintain genetic diversity, thus avoiding premature convergence.

In the specific case of failure-resilient circuit synthesis, a practitioner might encounter a false-robustness phenomenon, which we explain below. Let us consider an example of a simple diode half-wave rectifier (Figure 9, left). If D0 fails or is removed, the rectifier no longer works; statistically, one critical component (diode) gives a 100% chance of circuit failure. Imagine a topology modification that would harden the circuit against the removal or high-impedance failure of D0. Let us have four additional diodes to fulfill that requirement (one would be enough, but we assume the search algorithm does not know that). The search algorithm can encounter a topology with four diodes that have no effect on the nominal transfer function (example in Figure 9, right). Still, if D0 fails, the circuit does, too. However, if any of D1–D4 fails, the circuit still delivers the transfer function. It appears as if only 20% of the critical components (diodes) cause a fatal scenario for the circuit. The latter circuit might get promoted because of its better “robustness” value. Obviously, it is not more robust, because D1–D4 are not electrically connected and do not play any role in signal processing. That kind of circuit has to be ranked out, since it does not contribute to real circuit robustness.

Figure 9: False-robustness problem [28].

Inclusiveness [28] successfully resolves the false-robustness problem. Using modified diode models and SPICE simulator commands, we determine which of the components are electrically connected (included) and have an effect on signal processing. Inclusiveness (denoted by I) is calculated as the ratio between the number of all critical components and the number of included components. With the updated robustness definition:

$$\mathbf{r} = \begin{bmatrix} f_{nom} \\ f_{max} \\ I \, \sigma_f \end{bmatrix} \qquad (3)$$

circuits with greater inclusiveness are promoted over circuits with floating or badly connected components. However, this can lead the synthesis to build larger circuits with excessive redundancy, so component number limits must be set elsewhere in the algorithm. In our case, the maximum number of available devices is set in the pre-defined component set, which also defines the topology-matrix size. Note that only the inclusiveness of diodes was considered in our work.

2.4 Synthesis algorithm

The search and sorting algorithm utilizes major ideas from NSGA-II [35]. The evolutionary algorithm is initiated with a randomly generated population. Then every individual is evaluated according to the fitness/robustness from Section 2.3. Sorting is performed in three steps, following NSGA-II. In the first step, individuals that do not dominate each other (are not beaten in any combination of objectives) are assigned to a front (i.e., Pareto front). The remaining individuals are put in a second, third, etc., front, with the same non-dominance criteria. A new generation assembly is the second step.
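Before the second step is detailed, the robustness vector of Section 2.3 and the first sorting step can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names are hypothetical, the RMSE values are made up for illustration, all three objectives are assumed to be minimized, and the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics

def robustness_vector(f_nom, f_failures, inclusiveness=1.0):
    """Build (f_nom, f_max, I * sigma_f) from the RMSE results.

    f_nom         -- RMSE of the nominal (no-failure) circuit
    f_failures    -- RMSE values, one per simulated failure scenario
    inclusiveness -- the factor I; 1.0 means every component is included
    """
    f = [f_nom] + list(f_failures)      # vector f: nominal plus all failures
    f_max = max(f)                      # worst single-point case
    sigma_f = statistics.pstdev(f)      # assumption: population std deviation
    return (f_nom, f_max, inclusiveness * sigma_f)

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(vectors):
    """Split the indices of `vectors` into successive Pareto fronts."""
    remaining = list(range(len(vectors)))
    fronts = []
    while remaining:
        # An individual joins the current front if nobody left dominates it.
        front = [i for i in remaining
                 if not any(dominates(vectors[j], vectors[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Three hypothetical candidates with two failure scenarios each.
population = [
    robustness_vector(0.10, [0.11, 0.12]),
    robustness_vector(0.30, [0.35, 0.40]),
    robustness_vector(0.09, [0.50, 0.10]),
]
fronts = non_dominated_fronts(population)
print(fronts)  # [[0, 2], [1]]
```

Candidates 0 and 2 land in the first front because neither beats the other in all three objectives, while candidate 1 is dominated by candidate 0. This simple loop is quadratic per front; NSGA-II itself uses a faster bookkeeping scheme for the same result.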
We aggregate the new generation starting with individuals from the 1st front, and continue with available individuals from further fronts. Because the union of parents and offspring is usually larger than the available space in the new generation, there is a front of individuals that does not fit as a whole into the new generation. A selection between non-dominated individuals needs to be undertaken. This is done in the third step, the crowding distance calculation. The crowding distance is the distance between two neighboring points (i.e., individuals) along each of the objective axes. Ranking individuals with a higher crowding distance first helps achieve a more even distribution of individuals in a front.

After the assembly of the new generation, a parent selection process takes place. In a tournament, some randomly selected individuals are chosen from the generation. The selected individuals compete based on their front number (lower is better) and crowding distance (higher is better). Two tournaments take place to choose two future parents. Once two parents are selected, their genetic material is reproduced. This can be done by mating their genetic material as in Section 2.2 or by mutating it. Control over mating/mutation is a statistical probability, set at the beginning of the algorithm. Similarly, a probability parameter controls whether the topological or parametric part of the gene will be mated/mutated.

We repeat the synthesis algorithm until at least one of the stopping criteria (i.e., design requirements, maximum number of generations, timeout) is met. Every tenth generation, we run a PSADE [34] parameter optimization on three of the best circuits from the population and thus fine-tune the promising individuals. Figure 10 summarizes the main synthesis algorithm steps.

Figure 10: The applied evolutionary algorithm flowchart [31]. (Flowchart blocks: initial population; evaluation (calculate fitness/robustness); sorting (according to rank and crowding distance); tournament (parent selection); reproduction (offspring creation); offspring evaluation; criteria met? (true: END); 10th generation? (true: PSADE parameter optimization on 3 of the best individuals).)

2.5 Finding minimal topology

Our objective was to evolve circuits with consistent performance even if devices are removed. Initially, we aimed to incorporate as many “redundant” components as possible. However, circuit size doesn’t always reflect actual functional contributions, leading to “dummy” or electrically connected but non-functional components. To address this, we introduced “Inclusiveness” to prevent circuits dominated by dangling sub-circuits, enhancing evolutionary outcomes. Individuals with a greater inclusiveness measure propagate more effectively. Our experimentation revealed a paradox when maximizing redundancy while minimizing circuit size simultaneously. Hence, we perform separate stages for minimizing and maximizing circuit schematics. There are two more reasons why the size of circuit schematics is not another objective of the NSGA-II search.

Our topology representation method using an upper-triangular incident matrix limits arbitrary extensions during evolution runs. Varying matrix sizes in the evolutionary pool cause inconsistent crossovers and mating patterns. The third concern relates to the computational complexity of NSGA-II and of evaluating circuits under different failure scenarios. A variable maximum component number during evolution would increase computational effort, impacting NSGA-II’s performance and circuit robustness evaluation. As a result, we chose not to experiment with variable component numbers to minimize computational burden.

3 Results

Our experiment comprised eight independent topology searches. For each synthesis, we predefined the set of available components, that is, Nd diodes and Nr resistors that are subject to possible high-impedance failure. A voltage source Voff and an input resistor Rin (the latter was non-optional) were also available in each synthesis but were excluded from failure consideration.

The main part of the experiment was exploring the possibility of finding topologies with fewer components than in hand-designed examples (e.g., from Figure 4) that perform the arcus tangent analog calculation and exhibit the failure-resilience property (1.2.1).

The genetic algorithm parameters were fixed throughout the experiment and are summarized in Table 1.

Table 1: Genetic algorithm properties.

| Parameter | Value |
|---|---|
| Population | 1000 |
| Tournament | 3 |
| Mating prob. | 0.6 |
| Topology reproduction prob. | 0.8 |

Resistance values were limited to the range between 10 and 100 kΩ, and the voltage source to a DC range of 0 to 6 V. Every synthesis was conducted on an i9 HP desktop, utilizing 16 computational threads on 8 processor cores.

3.1 Synthesis with a max of 12 diodes, 12 resistors

With the ambition to cut the number of components needed for the circuit, we set the first upper limits to Ndmax = 12 and Nrmax = 12. This is already a significant cut in the total number of components (Nd + Nr) in comparison to the hand-designed example from Figure 4, which comprises 40 components. The algorithm can, however, synthesize a topology with fewer elements.

Starting with a random population, without any prior knowledge available in the population itself, we let the combined NSGA-II algorithm run for 306 generations (roughly 15 hours). The outcome is presented in Figure 11. The final topology comprises all 12 available diodes. Some resistors were excluded from the final topology since they do not have any signal-processing effect (such as short-connected resistors or resistors connected to simulator-helper nodes). The voltage source was also not included in the final design. We already excluded some of the components from the topology schematic in Figure 11.

Figure 11: Synthesized arctan computational circuit (Ndmax = 12, Nrmax = 12), robust to any single component high-impedance failure or removal.

We summarize the circuit performance in three parameters: the nominal topology RMSE is 0.312, the worst failure RMSE is 0.370, and the standard deviation of all cases (nominal and failures) is 0.026. These results are visualized in Figure 12 and Figure 13.

Figure 12: Synthesized arctan computational circuit (Ndmax = 12, Nrmax = 12): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 13: Synthesized arctan computational circuit (Ndmax = 12, Nrmax = 12): relative error curves of nominal (solid) and component failures (dotted and dashed).

Together with the voltage source, six available resistors were not used in the final circuit. That is why we continued our experiment with tighter component limits.

3.2 Synthesis with a max of 10 diodes, 10 resistors

The next synthesis was limited to Ndmax = 10 and Nrmax = 10. We stopped the algorithm after 822 generations (33 h). The outcome is presented in Figure 14. The final topology comprises all 10 available diodes. Two resistors were not included in the final topology.

Figure 14: Synthesized arctan computational circuit (Ndmax = 10, Nrmax = 10), robust to any single component high-impedance failure or removal.

Circuit performance: the nominal topology RMSE is 0.158, the worst failure RMSE is 0.270, and the standard deviation of all cases (nominal and failures) is 0.032. Failure ranges are visualized in Figure 15 and Figure 16. This circuit performs better than the one from the previous synthesis in nominal and worst-failure RMSE, although its standard deviation is slightly higher. It also comprises two fewer diodes and two more resistors.

Figure 15: Synthesized arctan computational circuit (Ndmax = 10, Nrmax = 10): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 16: Synthesized arctan computational circuit (Ndmax = 10, Nrmax = 10): relative error curves of nominal (solid) and component failures (dotted and dashed).

3.3 Synthesis with a max of 8 diodes, 8 resistors

We proceeded with Ndmax = 8 and Nrmax = 8. We stopped the algorithm after 432 generations (11 h). The outcome is presented in Figure 17. The final topology comprises 6 diodes and 6 resistors that can fail during circuit operation. Two resistors and two diodes were not included in the final topology.

Figure 17: Synthesized arctan computational circuit (Ndmax = 8, Nrmax = 8), robust to any single component high-impedance failure or removal.

Circuit performance: the nominal topology RMSE is 0.149, the worst failure RMSE is 0.152, and the standard deviation of all cases (nominal and failures) is 0.017. Failure ranges are visualized in Figure 18 and Figure 19.

Figure 18: Synthesized arctan computational circuit (Ndmax = 8, Nrmax = 8): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 19: Synthesized arctan computational circuit (Ndmax = 8, Nrmax = 8): relative error curves of nominal (solid) and component failures (dotted and dashed).

Because the algorithm kept solving the problem using fewer than the maximum number of available components, we proceeded to further tighten the Ndmax and Nrmax limits.

Figure 20: Synthesized arctan computational circuit (Ndmax = 6, Nrmax = 6), robust to any single component high-impedance failure or removal.

Figure 21: Synthesized arctan computational circuit (Ndmax = 6, Nrmax = 6): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.
3.4 Synthesis with a max of 6 diodes, 6 resistors

We stopped the Ndmax = 6 and Nrmax = 6 synthesis after 2340 generations (48 h). Figure 20 shows the outcome. The final topology uses all available diodes and three of the six available resistors.

Circuit performance: the nominal topology RMSE is 0.106, the worst failure RMSE is 0.110, and the standard deviation of all cases (nominal and failures) is 0.008. The failure ranges are shown in Figure 21 and Figure 22.

Figure 22: Synthesized arctan computational circuit (Ndmax = 6, Nrmax = 6): relative error curves of nominal (solid) and component failures (dotted and dashed).

3.5 Synthesis with a max of 5 diodes, 5 resistors

The Ndmax = 5 and Nrmax = 5 synthesis was stopped after 2582 generations (36 h). As shown in Figure 23, the final topology comprises all available components. Although the synthesized circuit comprises only ten critical components (plus the voltage source and the input resistor), the performance has not yet diminished. The nominal topology RMSE is 0.108, the worst failure RMSE is 0.165, and the standard deviation of all cases is 0.022. See the failure ranges in Figure 24 and Figure 25.

Figure 23: Synthesized arctan computational circuit (Ndmax = 5, Nrmax = 5), robust to any single component high-impedance failure or removal.

Figure 24: Synthesized arctan computational circuit (Ndmax = 5, Nrmax = 5): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 25: Synthesized arctan computational circuit (Ndmax = 5, Nrmax = 5): relative error curves of nominal (solid) and component failures (dotted and dashed).

3.6 Synthesis with a max of 4 diodes, 4 resistors

Searching for the lower limit, we conducted the Ndmax = 4 and Nrmax = 4 synthesis. We finished it after 1077 generations (12 h). The final topology comprises four resistors and four diodes (Figure 26). The nominal topology RMSE is 0.173, the worst failure RMSE is 0.217, and the standard deviation of all cases is 0.028. See the failure ranges in Figure 27 and Figure 28.

We found that this synthesis is a probable lower limit in our experiment. To illustrate how poorly a smaller design fits the requirement, we show one more synthesis.

Figure 26: Synthesized arctan computational circuit (Ndmax = 4, Nrmax = 4), robust to any single component high-impedance failure or removal.

Figure 27: Synthesized arctan computational circuit (Ndmax = 4, Nrmax = 4): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 28: Synthesized arctan computational circuit (Ndmax = 4, Nrmax = 4): relative error curves of nominal (solid) and component failures (dotted and dashed).

3.7 Synthesis with a max of 3 diodes, 3 resistors

With the limits Ndmax = 3 and Nrmax = 3, we finished the search after 3188 generations (11 h). See Figure 29 for the topology. The nominal topology RMSE is 0.497, the worst failure RMSE is 0.507, and the standard deviation is 0.010. The failure ranges are shown in Figure 30 and Figure 31. We can observe a two-piece approximation of the arctan function, which yields a high RMSE.

Figure 29: Synthesized arctan computational circuit (Ndmax = 3, Nrmax = 3), robust to any single component high-impedance failure or removal.

Figure 30: Synthesized arctan computational circuit (Ndmax = 3, Nrmax = 3): nominal response (black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 31: Synthesized arctan computational circuit (Ndmax = 3, Nrmax = 3): relative error curves of nominal (solid) and component failures (dotted and dashed).

3.8 Result Summary

Table 1 summarizes the experiment results.
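The reported figures can also be compared programmatically. The sketch below transcribes the values from the results table of this section and selects the synthesized run that is best in each of the three observables. The `Run` type, its field names, and the choice to exclude the handmade design from the comparison are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    nd_max: Optional[int]  # diode limit (None for the handmade design)
    nr_max: Optional[int]  # resistor limit (None for the handmade design)
    nd: int                # diodes in the final topology
    nr: int                # resistors in the final topology
    f_nom: float           # nominal RMSE
    f_max: float           # worst failure RMSE
    s_f: float             # spread over nominal and failure cases


# Values transcribed from the results table; the first row is the handmade design.
RUNS = [
    Run(None, None, 10, 30, 0.116, 0.262, 0.047),
    Run(12, 12, 12, 6, 0.312, 0.370, 0.026),
    Run(10, 10, 10, 8, 0.158, 0.270, 0.032),
    Run(8, 8, 6, 6, 0.149, 0.152, 0.017),
    Run(6, 6, 6, 3, 0.106, 0.110, 0.008),
    Run(5, 5, 5, 5, 0.108, 0.165, 0.022),
    Run(4, 4, 4, 4, 0.173, 0.217, 0.028),
    Run(3, 3, 2, 3, 0.497, 0.507, 0.010),
]

# Compare only the automatically synthesized topologies.
synthesized = [run for run in RUNS if run.nd_max is not None]

best_nominal = min(synthesized, key=lambda run: run.f_nom)
best_worst_case = min(synthesized, key=lambda run: run.f_max)
most_uniform = min(synthesized, key=lambda run: run.s_f)

for label, run in [
    ("lowest nominal RMSE", best_nominal),
    ("lowest worst failure RMSE", best_worst_case),
    ("lowest spread", most_uniform),
]:
    print(f"{label}: Ndmax={run.nd_max}, Nd={run.nd}, Nr={run.nr}")
```

With the tabulated values, all three selections point to the same run, the one limited to six diodes and six resistors.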
Surprisingly, tightening the limits on the number of available diodes and resistors led to improved circuit performance in both nominal functionality and robustness, with the best result at Nd = 6, Nr = 3. Although the initial syntheses searched the topology space with Ndmax > 6 and Nrmax > 3, they did not discover the best Nd = 6, Nr = 3 solution.

Table 2: Results of the conducted experiment. Every row is an independent topology synthesis with different limits on the number of components. The first row is the handmade robust design.

Ndmax  Nrmax  Nd  Nr  fnom   fmax   sf
N/A    N/A    10  30  0.116  0.262  0.047
12     12     12   6  0.312  0.370  0.026
10     10     10   8  0.158  0.270  0.032
8      8       6   6  0.149  0.152  0.017
6      6       6   3  0.106  0.110  0.008
5      5       5   5  0.108  0.165  0.022
4      4       4   4  0.173  0.217  0.028
3      3       2   3  0.497  0.507  0.010

There may be several reasons for this phenomenon. The first and most obvious one is the enormous search space of the topology search. Within one synthesis run, we cannot sample every possible circuit; instead, we crawl the space using the evolutionary search. This is why two evolutionary syntheses with the same goal but different initial settings might not produce the same outcome.

The second reason relates specifically to the robustness definition in our experiment. As noted, our problem definition does not reward circuits with fewer components, but rather the opposite. Inclusiveness (see 2.3) rewards circuits that electrically include all available components, in order to push redundancy into the circuit and avoid false robustness. During the synthesis, even when the objectives already meet the requirements, the inclusiveness criterion might draw the search toward including more components, which makes the search too wide and too long. We conclude that, with the search problem defined in this way, hard limits on the topology size and the number of available components are key to an efficient search for small failure-resilient topologies.

In comparison to previous experiments, this study considers not only diodes but also resistors as possible points of failure. We also experimented with evolutionary search for circuits that are robust to both short-circuit and open-circuit failures at all possible failure points (components), including some experiments with transistors. However, we acknowledge that further investigation and modified approaches are required to address this specific problem effectively. We believe our work will inspire further practitioners in the field of analog circuit topology synthesis.

4 Conclusions

Using topology synthesis tools, we can find topologies that exhibit novel properties, such as failure tolerance. We showed that failure resilience in analog circuits can be achieved with smaller-than-expected topologies by introducing system-level redundancy instead of the much more expensive component-level redundancy. Using an evolutionary topology synthesis tool, we introduced novel topologies of an analog arctangent circuit. The most compact one comprises six diodes, three resistors, a voltage source, and an input resistor. Each of the diodes and each of the three resistors can fail or be removed with almost no computational error.

Based on this research, we can conclude that the integration of system redundancy for single-point failures was achieved by imposing a strict limit on the maximum number of available components. We showed that a surprisingly low number of electrical components is needed to achieve such resilience.

In CMOS design, reducing the number of components does not necessarily translate to cost savings on its own. However, we conducted a brief analysis of the total resistance of both robust circuits, the hand-crafted one and the best synthesized one. Total resistance can provide a rough estimate of circuit area in certain CMOS processes. For instance, the total resistance of the hand-designed circuit (shown in Fig. 4) amounts to approximately 219 kΩ, whereas the total resistance of the best synthesized circuit is around 20 kΩ (a difference of about one decade).

Furthermore, reducing the number of components can have a direct impact on cost in discrete electronics, such as PCBs. For discrete resistors, the resistance value itself does not significantly affect the cost of the device, assuming that factors like manufacturer, package, power rating, and tolerance remain the same. With this in mind, the minimization of robust topologies emerges as a pivotal factor in achieving cost-effective and highly reliable circuits.

5 Supplementary material

The source code of the synthesis tool is available online at https://github.com/zigarojec/MatrixCircEvolutions.

6 Acknowledgments

I would like to thank my colleagues from the EDA department of the Faculty of Electrical Engineering, University of Ljubljana, for all their support in my work.

7 Conflict of Interest

We declare no conflict of interest in this work.

Copyright © 2023 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 17. 02. 2023
Accepted: 06. 10. 2023

Boards of MIDEM Society

MIDEM Executive Board

President of the MIDEM Society
Prof. Dr. Barbara Malič, Jožef Stefan Institute, Ljubljana, Slovenia

Vice-presidents
Prof. Dr. Janez Krč, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Dr. Iztok Šorli, Mikroiks d.o.o., Ljubljana, Slovenia

Secretary
Olga Zakrajšek, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

MIDEM Executive Board Members
Prof. Dr. Slavko Bernik, Jožef Stefan Institute, Slovenia
Assoc. Prof. Dr. Miha Čekada, Jožef Stefan Institute, Ljubljana, Slovenia
Prof. DDr. Denis Đonlagić, UM, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Prof. Dr. Leszek J. Golonka, Technical University, Wroclaw, Poland
Prof. Dr. Vera Gradišnik, Tehnički fakultet Sveučilišta u Rijeci, Rijeka, Croatia
Mag. Leopold Knez, Iskra TELA, d.d., Ljubljana, Slovenia
Mag. Mitja Koprivšek, ETI Elektroelementi, Izlake, Slovenia
Asst. Prof. Dr. Gregor Primc, Jožef Stefan Institute, Ljubljana, Slovenia
Prof. Dr. Janez Trontelj, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Asst. Prof. Dr. Hana Uršič Nemevšek, Jožef Stefan Institute, Ljubljana, Slovenia
Dr. Danilo Vrtačnik, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

Supervisory Board
Prof. Dr. Franc Smole, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Prof. Dr. Drago Strle, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Igor Pompe, retired

Court of honour
Darko Belavič, Jožef Stefan Institute, Ljubljana, Slovenia
Dr. Miloš Komac, retired
Dr. Hana Uršič Nemevšek, Jožef Stefan Institute, Ljubljana, Slovenia

Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
ISSN 0352-9045

Publisher: MIDEM Society, Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia
www.midem-drustvo.si