ISSN 0352-9045

Journal of Microelectronics,
Electronic Components and Materials
Vol. 53, No. 2(2023), June 2023
Revija za mikroelektroniko,
elektronske sestavne dele in materiale
letnik 53, številka 2(2023), Junij 2023

UDK 621.3:(53+54+621+66)(05)(497.1)=00			


Informacije MIDEM 2-2023

Journal of Microelectronics, Electronic Components and Materials
VOLUME 53, NO. 2(186), LJUBLJANA, JUNE 2023 | LETNIK 53, NO. 2(186), LJUBLJANA, JUNIJ 2023
Published quarterly (March, June, September, December) by Society for Microelectronics, Electronic Components and Materials - MIDEM.
Copyright © 2023. All rights reserved. | Revija izhaja trimesečno (marec, junij, september, december). Izdaja Strokovno društvo za
mikroelektroniko, elektronske sestavne dele in materiale – Društvo MIDEM. Copyright © 2023. Vse pravice pridržane.
Editor in Chief | Glavni in odgovorni urednik
Marko Topič, University of Ljubljana (UL), Faculty of Electrical Engineering, Slovenia
Editor of Electronic Edition | Urednik elektronske izdaje
Kristijan Brecl, UL, Faculty of Electrical Engineering, Slovenia
Associate Editors | Odgovorni področni uredniki
Vanja Ambrožič, UL, Faculty of Electrical Engineering, Slovenia
Arpad Bürmen, UL, Faculty of Electrical Engineering, Slovenia
Danjela Kuščer Hrovatin, Jožef Stefan Institute, Slovenia
Matija Pirc, UL, Faculty of Electrical Engineering, Slovenia
Franc Smole, UL, Faculty of Electrical Engineering, Slovenia
Matjaž Vidmar, UL, Faculty of Electrical Engineering, Slovenia
Editorial Board | Uredniški odbor
Mohamed Akil, ESIEE PARIS, France
Giuseppe Buja, University of Padova, Italy
Gian-Franco Dalla Betta, University of Trento, Italy
Martyn Fice, University College London, United Kingdom
Ciprian Iliescu, Institute of Bioengineering and Nanotechnology, A*STAR, Singapore
Marc Lethiecq, University of Tours, France
Teresa Orlowska-Kowalska, Wroclaw University of Technology, Poland
Luca Palmieri, University of Padova, Italy
Goran Stojanović, University of Novi Sad, Serbia
International Advisory Board | Časopisni svet
Janez Trontelj, UL, Faculty of Electrical Engineering, Slovenia - Chairman
Cor Claeys, IMEC, Leuven, Belgium
Denis Đonlagić, University of Maribor, Faculty of Elec. Eng. and Computer Science, Slovenia
Zvonko Fazarinc, CIS, Stanford University, Stanford, USA
Leszek J. Golonka, Technical University Wroclaw, Wroclaw, Poland
Jean-Marie Haussonne, EIC-LUSAC, Octeville, France
Barbara Malič, Jožef Stefan Institute, Slovenia
Miran Mozetič, Jožef Stefan Institute, Slovenia
Stane Pejovnik, UL, Faculty of Chemistry and Chemical Technology, Slovenia
Giorgio Pignatel, University of Perugia, Italy
Giovanni Soncini, University of Trento, Trento, Italy
Iztok Šorli, MIKROIKS d.o.o., Ljubljana, Slovenia
Hong Wang, Xi´an Jiaotong University, China
Headquarters | Naslov uredništva
Uredništvo Informacije MIDEM
MIDEM pri MIKROIKS
Stegne 11, 1521 Ljubljana, Slovenia
T. +386 (0)1 513 37 68
F. + 386 (0)1 513 37 71
E. info@midem-drustvo.si
www.midem-drustvo.si
Annual subscription rate is 160 EUR, separate issue is 40 EUR. MIDEM members and Society sponsors receive current issues for free. Scientific Council for Technical Sciences of
Slovenian Research Agency has recognized Informacije MIDEM as scientific Journal for microelectronics, electronic components and materials. Publishing of the Journal is cofinanced by Slovenian Research Agency and by Society sponsors. Scientific and professional papers published in the journal are indexed and abstracted in COBISS and INSPEC
databases. The Journal is indexed by ISI® for Sci Search®, Research Alert® and Material Science Citation Index™. |
Letna naročnina je 160 EUR, cena posamezne številke pa 40 EUR. Člani in sponzorji MIDEM prejemajo posamezne številke brezplačno. Znanstveni svet za tehnične vede je
podal pozitivno mnenje o reviji kot znanstveno-strokovni reviji za mikroelektroniko, elektronske sestavne dele in materiale. Izdajo revije sofinancirajo ARRS in sponzorji društva.
Znanstveno-strokovne prispevke objavljene v Informacijah MIDEM zajemamo v podatkovne baze COBISS in INSPEC. Prispevke iz revije zajema ISI® v naslednje svoje produkte:
Sci Search®, Research Alert® in Materials Science Citation Index™.
Design | Oblikovanje: Snežana Madić Lešnik; Printed by | tisk: Biro M, Ljubljana; Circulation | Naklada: 1000 issues | izvodov; Slovenia Taxe Percue | Poštnina plačana pri pošti 1102 Ljubljana

Journal of Microelectronics,
Electronic Components and Materials
vol. 53, No. 2(2023)

Content | Vsebina
Original scientific papers

Izvirni znanstveni članki

Y. Li, D. Sang, M. Li, X. Li, T. Wang, B. O. Mohammed:
A New Quantum-Based Building Block for
Designing a Nano-Circuit with Lower Complexity

57

Y. Li, D. Sang, M. Li, X. Li, T. Wang, B. O. Mohammed:
Nov gradnik na kvantni osnovi za načrtovanje nano
vezja z manjšo kompleksnostjo

R. K. Pandey, V. Bhadauria, V. K. Singh:
High-Gain Super Class-AB Bulk-driven
Sub-threshold Low-Power CMOS
Transconductance Amplifier for Biomedical
Applications

65

R. K. Pandey, V. Bhadauria, V. K. Singh:
Ojačevalnik prevodnosti CMOS z nizko močjo in
velikim ojačenjem Super Class-AB za
biomedicinske aplikacije

R. Pilipović, P. Bulić, U. Lotrič:
An Energy-efficient and Accuracy-adjustable
bfloat16 Multiplier

79

R. Pilipović, P. Bulić, U. Lotrič:
Energijsko učinkovit približni množilnik v zapisu
bfloat16 z nastavljivo natančnostjo

L. Khanfir, J. Mouine:
A New Design Optimization Methodology of Fully
Differential Dynamic Comparator

87

L. Khanfir, J. Mouine:
Nova metodologija optimizacije zasnove polnega
diferencialnega dinamičnega komparatorja

Ž. Rojec:
Towards Smaller Single-point Failure-resilient
Analog Circuits by Use of a Genetic Algorithm

103

Ž. Rojec:
Manjšanje analognih vezij odpornih na odpoved
poljubne komponente z uporabo genetskega
algoritma

Front page:
Topology evolution of analogue electrical circuits
using evolutionary algorithms (Ž. Rojec)

Naslovnica:
Razvoj topologije analognih električnih vezij z
uporabo evolucijskih algoritmov (Ž. Rojec)


Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.201

Journal of Microelectronics,
Electronic Components and Materials
Vol. 53, No. 2(2023), 57 – 64

A New Quantum-Based Building Block for
Designing a Nano-Circuit with Lower Complexity
Yao Li¹, Dong Sang¹, Min Li¹, Xiaofang Li¹, Tiantian Wang¹, Bayan Omar Mohammed²
¹ School of Computing, Weifang University of Science and Technology, Weifang, Shandong, China
² Development Center for Research and Training, College of Science and Technology, University of Human Development, Sulaimani, Kurdistan Region, Iraq

Abstract: Next-generation nano-scale computational systems are being hampered by two significant obstacles: shrinking transistor
size and power dissipation. Moore’s law does not hold when transistor size reaches the atomic level. So, it becomes necessary to
investigate alternative technologies that surpass traditional Complementary Metal Oxide Semiconductor (CMOS) technology’s physical
constraints. Quantum Dot Cellular Automata (QCA), a transistor-free computational paradigm, is thought to be the best alternative
to CMOS technology for designing nano-scale logic circuits. However, not many designs cut energy usage and offer straightforward
access to inputs and outputs. Moreover, adders, the primary component in logic circuits and digital arithmetic, are crucial in
developing several efficient QCA designs. In this context, the 4-bit Ripple Carry Adder (RCA) is a straightforward type of adder that
can help produce circuits with minimal necessary space and power consumption because of its exceptional qualities. The synthesis
of high-level logic further demonstrates the design’s effectiveness. Simulations in QCADesigner show that the proposed
circuits are less complex and consume less power than earlier designs.
Keywords: Nanotechnology; Quantum-dot cellular automata; XOR gate; Majority voter gate; Full adder; Ripple Carry Adder

Nov gradnik na kvantni osnovi za načrtovanje nano
vezja z manjšo kompleksnostjo
Izvleček: Računalniške sisteme naslednje generacije v nano merilu ovirata dve pomembni oviri: zmanjševanje velikosti tranzistorjev
in razprševanje energije. Moorov zakon ne velja, ko velikost tranzistorja doseže atomsko raven. Zato je treba raziskati alternativne
tehnologije, ki presegajo fizikalne omejitve tradicionalne tehnologije kovinsko oksidnih polprevodnikov (CMOS). Quantum Dot Cellular
Automata (QCA), računska paradigma brez tranzistorjev, naj bi bila najboljša alternativa tehnologiji CMOS za načrtovanje logičnih vezij
nano velikosti. Vendar pa ni veliko zasnov, ki bi zmanjšale porabo energije in omogočile neposreden dostop do vhodov in izhodov.
Poleg tega so seštevalniki, glavna komponenta v logičnih vezjih in digitalni aritmetiki, ključni pri razvoju več učinkovitih zasnov QCA.
V tem kontekstu je 4-bitni Ripple Carry Adder (RCA) enostavna vrsta seštevalnika, ki lahko zaradi svojih izjemnih lastnosti pomaga pri
izdelavi vezij z minimalno potrebnim prostorom in porabo energije. Sinteza logike visoke ravni dodatno dokazuje učinkovitost zasnove.
Rezultati programa QCADesigner so pokazali, da so predlagana vezja manj zapletena in porabijo manj energije kot prejšnje zasnove v
primerjavi z običajnimi pristopi načrtovanja.
Ključne besede: nanotehnologija; kvantni točkovni celični avtomati; vrata xor; vrata večinskega volivca; popolni seštevalnik; ripple
carry adder
* Corresponding Author’s e-mail: liyao@wfust.edu.cn

1 Introduction

High leakage power and sub-node scaling of 22 nm technology are issues that transistor-based technologies must deal with [1]. These problems motivate designers of Very Large-Scale Integration (VLSI) to create a different technology for the next Integrated Circuits (ICs) and diode-based technologies [2-4]. To address the issues with CMOS technology [5], VLSI designers are looking into a number of other technologies, including Quantum-dot Cellular Automata (QCA), single-electron transistors, and tunnel field effect transistors. Compared to competing technologies, QCA technology provides a number of advantages, including a smaller footprint, quick switching times, and reduced power dissipation [6].

How to cite:
Y. Li et al., “A New Quantum-Based Building Block for Designing a Nano-Circuit with Lower Complexity”, Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 57–64

Y. Li et al.; Informacije Midem, Vol. 53, No. 2(2023), 57 – 64

context of QCA, “micro” refers to the individual components or elements of the system, namely the quantum dots. These quantum dots are the building blocks of QCA and serve as the basic units of information processing [13]. In the cell, tunnel junctions connecting the two pairs of quantum dots allow the two electrons to pass between them. The two electrons settle at opposite corners of the cell because of Coulombic repulsion [14, 15]. In QCA, the terms linear and nonlinear refer to different types of behavior exhibited by the system. Linear QCA refers to a system in which the quantum dots’ behavior can be described using linear operations, similar to classical digital logic gates, while nonlinear QCA involves more complex interactions between quantum dots, with the nonlinearity arising from quantum effects such as Coulomb interactions and electron tunneling [16]. There is no cell-to-cell tunneling; tunneling takes place only within a cell. Bistable behavior results from the interaction of the discrete electronic charge, Coulombic repulsion, and quantum confinement. The two charge configurations represent binary “0” and “1”, with polarisations of “−1” and “+1”, respectively. A QCA “wire” is a chain of adjacent cells rather than a physical wire, as depicted in Figure 1(b). Because no electrons tunnel between cells, QCA offers a method of information transfer without current flow [17].
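The mapping from charge configuration to polarisation and binary value described above can be sketched in a few lines of Python. The sketch uses the standard textbook definition of cell polarisation, P = ((ρ1 + ρ3) − (ρ2 + ρ4)) / (ρ1 + ρ2 + ρ3 + ρ4), where ρi is the charge on dot i. The dot numbering (dots 1 and 3 on one diagonal, dots 2 and 4 on the other) and the function names are assumptions made for illustration and are not taken from the paper.

```python
def cell_polarization(rho1, rho2, rho3, rho4):
    """Polarisation of a four-dot QCA cell from the charge on each dot.

    Assumed numbering: dots 1 and 3 share one diagonal, dots 2 and 4 the other.
    """
    total = rho1 + rho2 + rho3 + rho4
    if total == 0:
        raise ValueError("the cell holds no charge")
    return ((rho1 + rho3) - (rho2 + rho4)) / total


def logic_value(polarization):
    """Map a polarisation to a binary value: positive -> 1, negative -> 0."""
    if polarization > 0:
        return 1
    if polarization < 0:
        return 0
    raise ValueError("an unpolarised cell carries no logic value")


# Two electrons on the 1-3 diagonal give P = +1 (binary 1);
# two electrons on the 2-4 diagonal give P = -1 (binary 0).
p_one = cell_polarization(1, 0, 1, 0)
p_zero = cell_polarization(0, 1, 0, 1)
```

Intermediate values of P between −1 and +1 describe partially polarised cells, which is why simulators report output polarisation strength rather than only a 0 or 1.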

QCA is an intriguing and widely studied technology for creating nano-scale logic circuits. There are no transistors in the QCA technology. The QCA cell, which comprises four quantum dots, is the fundamental unit of QCA [7]. This method is energy-efficient since there is no actual charge movement between QCA cells. Logical values are determined by the electrons’ location in the quantum dots. Due to Coulombic repulsion, the electrons in a QCA cell sit at opposite corners, so each cell encodes one of two logic values, 1 or 0. These advantages have led researchers to propose a number of QCA circuit designs [8]. Adders, SRAM [9], ALUs, switches, encoder-decoders [10], reversible logic, and memories are just a few of the recently proposed circuits. Full adders play a very prominent part in digital circuits since they are used to build logical and arithmetic operations [11]. Therefore, building a QCA-based adder with reduced area, shorter delay, straightforward access to inputs and outputs, and lower complexity is increasingly important [11]. This paper uses a novel, low-complexity, low-power three-layer full adder circuit to propose a new QCA-based ripple carry adder (RCA) design that improves on previous designs. XOR and majority gates were used to create an RCA circuit with simple access to inputs and outputs, and the results were compared to earlier designs. Adders and RCA designs are also used in the design of protected nano-communication networks [12]. QCADesigner-E, a commonly used tool for power analysis, is used in this paper for simulation and assessment.

2.2 QCA Logic Gates
The fundamental gates of QCA are inverters and three-input majority gates. A majority gate comprises 4 cells that achieve the function M(a, b, c) = ab + bc + ac, as shown in Figure 2(a) [18]. Cells are placed diagonally from one another to achieve the inversion functionality, as shown in Figure 2(b). Inverters and majority gates make up a universal set that can be employed to implement any logic operation. By setting one of the

The structure of this paper is as follows. The background of QCA is presented in Section 2, with a focus on its distinctive cells. The detailed architecture of the 4-bit RCA is shown in Section 3. Section 4 presents the simulation results. Finally, the paper is concluded in the last section.

2 QCA background and related works
This section discusses the important and basic parts of
this technology and the best previous works related to
the subject.

2.1 QCA Cells and Wires
Figure 1: Structure of basic QCA: (a) QCA cells, and (b)
QCA wire.

A QCA cell is a square nanostructure with four quantum dots (micro), roughly as shown in Figure 1(a). In the

majority gate inputs to “0”, a two-input AND gate is realized: AND(a, b) = M(a, b, 0) = ab. In the same manner, an OR gate is implemented by setting one input to “1”, i.e., OR(a, b) = M(a, b, 1) = ab + b·1 + a·1 = a + b [19].
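The reduction of the majority function to AND and OR can be checked exhaustively. The following is a minimal Python sketch that models only the Boolean behaviour of the gates, not their QCA layout; the function names are illustrative.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function M(a, b, c) = ab + bc + ac."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a, b):
    """AND obtained by fixing one majority input to 0."""
    return majority(a, b, 0)


def or_gate(a, b):
    """OR obtained by fixing one majority input to 1."""
    return majority(a, b, 1)


def reductions_hold():
    """Check both reductions against Python's bitwise operators for all inputs."""
    return all(
        and_gate(a, b) == (a & b) and or_gate(a, b) == (a | b)
        for a, b in product((0, 1), repeat=2)
    )
```

Calling `reductions_hold()` confirms the two identities for all four input pairs.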

with minimum complexity and high speed. It has a delay of 1.25 clock cycles and 209 cells in a 0.3 µm² area. The fundamentals of QCA and QCA-based digital design concepts were presented by Chan, et al. [24], who discussed the creation of straightforward digital logic using certain QCA approaches. Their four-bit ripple adder combines ideas from the traditional RCA and the carry-lookahead adder (CLA). These circuits were implemented using the 5-input majority gate, which theoretically can lower the latency of the traditional QCA-based RCA. The recommended adder has a latency of 3.25 clock cycles, an area of 2.5 µm², and 1246 cells. The designed structures were verified using QCADesigner. Finally, Hashemi and Navi [25] proposed a robust QCA full adder and an RCA circuit based on an efficient five-input majority gate. These circuits employ a more robust crossover design than comparable designs. Owing to the full adder circuit’s efficient architecture, it has been employed for RCA designs of various sizes. The coherence vector and bistable simulation engines of QCADesigner were used to simulate the suggested designs. The proposed RCA uses 442 cells with an area of 1 µm² and a delay of 2 clock cycles.

Figure 2: Structure of basic QCA: (a) Three-input majority gate, and (b) Inverter gate.

2.3 QCA Clocking
To drastically reduce metastability issues and enable long pipelines, adiabatic switching is used for QCA clocking. One half of the wire is used for signal transmission during each clock cycle, and the other half is left unpolarized [20]. During the subsequent clock cycle, the cells in the still-active clock zone polarize the newly activated cells, and half of the previously active clock zone is deactivated [21]. As a result, signals propagate from one clock zone to the next. Four-phase clock signals are used to control four different circuit areas. The clock signal of each zone has four states: high-to-low, low, low-to-high, and high. When the clock changes from high to low, the cell starts to compute, and it holds its value while the clock is low. The cell is released when the clock is in the low-to-high state and is not operating while the clock is high [22].
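The four-zone clocking scheme can be illustrated with a small Python sketch. It assumes the common convention in which each zone lags the previous one by a quarter of a clock cycle, and it uses the conventional phase names switch, hold, release, and relax for the high-to-low, low, low-to-high, and high states; neither the names nor the helper functions come from the paper.

```python
# Phase names for the high-to-low, low, low-to-high, and high clock states.
PHASES = ("switch", "hold", "release", "relax")


def zone_phase(zone, step):
    """Phase of clock zone `zone` (0-3) at quarter-cycle time step `step`.

    Assumption: zone k lags zone k-1 by one quarter of a clock cycle.
    """
    return PHASES[(step - zone) % 4]


def delay_in_cycles(zones_crossed):
    """Latency in clock cycles of a path that crosses `zones_crossed` zones."""
    return zones_crossed / 4


# While zone 1 is switching, zone 0 is holding its value and drives zone 1.
snapshot = [zone_phase(zone, 1) for zone in range(4)]
```

Under this convention a delay quoted as 0.5 clock cycles corresponds to two quarter-cycle phases, and 1.75 clock cycles to seven.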

3 Proposed design
This section presents new designs and effective architectures for a one-bit QCA full adder and a four-bit QCA RCA. The block diagram of the one-bit QCA full adder is illustrated in Figure 3, and its QCA-based layout, built from a three-input majority gate and a three-input XOR gate, is shown in Figure 4. This full adder comprises 15 cells, generates its outputs in 0.5 clock cycles, occupies a 0.01 µm² area, and has simple input and output connectivity. This three-layer implementation of a QCA full adder uses ordinary QCA cells. The input cells are A, B, and C, and the output cells are COUT and SUM. In this design, the first layer acts as an XOR gate and generates the SUM, while the second layer transmits values to the third layer, where all of the circuit’s inputs are applied and the COUT output is generated.
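At the logic level, the full adder described above reduces to SUM = A ⊕ B ⊕ C from the three-input XOR gate and COUT = M(A, B, C) from the three-input majority gate. The Python sketch below models only this Boolean behaviour; it says nothing about the cell layout, the layers, or the clocking, and the function names are illustrative.

```python
from itertools import product


def xor3(a, b, c):
    """Three-input XOR, which produces the SUM output."""
    return a ^ b ^ c


def majority(a, b, c):
    """Three-input majority, which produces the COUT output."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, c):
    """Return (SUM, COUT) for input bits A, B and carry-in C."""
    return xor3(a, b, c), majority(a, b, c)


def truth_table():
    """All eight input combinations with their (SUM, COUT) outputs."""
    return {bits: full_adder(*bits) for bits in product((0, 1), repeat=3)}
```

For every row of the truth table, SUM + 2·COUT equals A + B + C, which is the arithmetic property a full adder must satisfy.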

2.4 Related work
This section reviews several significant proposals for QCA RCA circuit designs. Abedi, et al. [23] proposed a cross-level QCA architecture for a QCA full adder and an RCA based on this design. These designs were tested for accuracy and assessed using QCADesigner. A conventional evaluation methodology and a QCA-specific cost function were applied to show superior performance over earlier methods. The suggested RCA has a delay of 1.75 clock cycles and uses 262 cells in a 0.208 µm² area. Balali and Rezai [31] proposed a QCA structure for the full adder to create a high-speed, efficient, and reliable four-bit RCA using the QCA technology. Their modeling results demonstrated significant improvements in circuit speed and latency. To verify the accuracy of these designs, QCADesigner was employed. Their four-bit RCA is designed

Figure 3: QCA-based full adder diagram
The proposed adder can readily be used to build larger adders. Larger adders, such as a 4-bit RCA, have been designed using this full adder with fewer QCA cells, which sets the design apart from earlier versions. The structure of the proposed four-bit RCA is illustrated in Figure 5. The four-bit QCA-based RCA, which uses four one-bit QCA full adders as its structural unit, is depicted in Figure 6. The suggested four-bit QCA-based RCA has 72 cells, an area of 0.11 µm², and a delay of 1.75 clock cycles. All of the inputs and outputs of this three-layer circuit are accessible. There are five outputs (S0-S3, COUT) and nine inputs (A0-A3, B0-B3, C). The outputs in this design are easily accessible because they are not surrounded by other cells; in other words, this structure does not need an extra wire to carry the output signals. Thus, it is simple to feed the outputs to another QCA input.

Figure 4: QCA-based full adder layouts and layers

Table 1: Simulation parameters

Parameter                        Bistable approximation engine    Coherence vector engine
Cell size                        18 × 18 nm²                      18 × 18 nm²
Radius of effect                 65 nm                            80 nm
Relative permittivity            12.9000000                       12.9000000
Clock high                       9.8e−22 J                        9.8e−22 J
Clock low                        3.8e−23 J                        3.8e−23 J
Clock amplitude factor           2.000000                         2.000000
Clock shift                      0.000000e+000                    0.000000e+000
Layer separation                 11.5000 nm                       11.5000 nm
Maximum iterations per sample    100                              -
Number of samples                12800                            -
Convergence tolerance            0.001000                         -

Figure 5: The proposed schematic for 4-bit RCA

The simulation results of the constructed full adder circuit are shown in Figure 7. All possible input combinations were applied to the circuit, and the outputs matched the truth table. Both outputs are formed concurrently after two clock cycles. The third layer of this three-layer full adder receives the three inputs; the COUT output is produced in the third layer and the SUM output in the first layer. These simulations, which were run using the default settings, demonstrate the correctness of the suggested designs. Tables 2 and 3 compare the cell count, latency, and area of the proposed full adder and RCA circuits with the best previous designs.

Figure 6: Three layers of the proposed QCA-based 4-bit
RCA

4 Simulation tool and results
The software QCADesigner-E is used in this paper to simulate the suggested design [26]. QCADesigner enables fast design, layout, and simulation of QCA circuits. Table 1 lists all of the simulation parameters. The tool’s default parameters are used for all simulations [27].

Table 2: Comparisons among the designs
Designs                      Area (µm²)   Cells   Delay (clock cycles)
Proposed design              0.01         15      0.5
Ahmadpour, et al. [28]       0.01         20      0.5
Seyedi and Navimipour [6]    0.01         22      0.75
Sarmadi, et al. [29]         0.04         30      1.0
Sayedsalehi, et al. [30]     0.02         33      0.75

Figure 8 displays the simulation results for the QCA-based RCA circuit, with inputs A0, A1, A2, A3, B0, B1, B2, B3, and C. As depicted in the figure, the circuit is driven through all possible input states and generates the correct output in each case. The simulation results also show strong polarization of the output cells for this circuit.
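The exhaustive check described above can be mirrored at the logic level. The Python sketch below chains four full adders into a 4-bit ripple carry adder and compares its output with ordinary integer addition for all 512 combinations of the nine input bits. It is a behavioural model only, not a QCA simulation; the helper names and the least-significant-bit-first ordering are assumptions made for illustration.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: SUM from a 3-input XOR, COUT from a majority gate."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width=4):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant bit first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def check_all_inputs():
    """Verify the adder for every A0-A3, B0-B3, C combination; return the count."""
    checked = 0
    for a, b, c in product(range(16), range(16), (0, 1)):
        sum_bits, cout = ripple_carry_add(to_bits(a), to_bits(b), c)
        assert from_bits(sum_bits) + (cout << 4) == a + b + c
        checked += 1
    return checked
```

Calling `check_all_inputs()` walks through the same input space that the simulation covers, with S0-S3 and COUT as the five outputs.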

Figure 7: Simulation outcomes of the proposed design

Figure 8: Simulation result of the proposed RCA

Table 3: Comparisons among the RCA designs
Designs                       Area (µm²)   Cells   Delay (clock cycles)
Proposed design               0.11         72      1.75
Balali and Rezai [31]         0.3          209     1.25
Sonare [32]                   0.51         366     2/5
Rashidi and Rezai [33]        0.14         175     1
Abedi, et al. [23]            0.208        262     1.75
Mohammadi, et al. [34]        0.24         237     1.5
Labrado and Thapliyal [35]    0.3          295     1.5
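The area and cell counts in Tables 2 and 3 can be turned into relative savings with a short calculation. The Python sketch below copies the area and cell values from the two tables and computes the percentage reduction in cell count achieved by the proposed designs. The dictionary layout and function names are illustrative, and delay is left out because one delay entry in Table 3 is not given as a plain decimal.

```python
# (area in µm², cell count), copied from Table 2 (full adders).
FULL_ADDERS = {
    "Proposed design": (0.01, 15),
    "Ahmadpour, et al. [28]": (0.01, 20),
    "Seyedi and Navimipour [6]": (0.01, 22),
    "Sarmadi, et al. [29]": (0.04, 30),
    "Sayedsalehi, et al. [30]": (0.02, 33),
}

# (area in µm², cell count), copied from Table 3 (4-bit RCAs).
RCAS = {
    "Proposed design": (0.11, 72),
    "Balali and Rezai [31]": (0.3, 209),
    "Sonare [32]": (0.51, 366),
    "Rashidi and Rezai [33]": (0.14, 175),
    "Abedi, et al. [23]": (0.208, 262),
    "Mohammadi, et al. [34]": (0.24, 237),
    "Labrado and Thapliyal [35]": (0.3, 295),
}


def reduction_percent(reference, proposed):
    """Percentage by which `proposed` is smaller than `reference`."""
    return 100.0 * (reference - proposed) / reference


def cell_reductions(designs, proposed_key="Proposed design"):
    """Cell-count reduction of the proposed design against every other design."""
    proposed_cells = designs[proposed_key][1]
    return {
        name: reduction_percent(cells, proposed_cells)
        for name, (_, cells) in designs.items()
        if name != proposed_key
    }


if __name__ == "__main__":
    for name, percent in cell_reductions(RCAS).items():
        print(f"{name}: {percent:.1f}% fewer cells in the proposed RCA")
```

The same `reduction_percent` helper can be applied to the area column by reading index 0 of each tuple instead of index 1.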


Additionally, Table 4 compares the suggested designs with the best current designs in terms of total energy dissipation (eV) and average energy dissipation (eV), in order to better understand and compare the circuits. The proposed designs are among the most energy-efficient.

Table 4: Comparison of total and average energy dissipation (power and energy analysis)
Designs                       Total energy dissipation (eV)   Average energy dissipation (eV)
Proposed full adder design    1.458                           1.057
Ahmadpour, et al. [28]        1.25                            1.15
Seyedi and Navimipour [6]     1.55                            1.56
Sarmadi, et al. [29]          1.80                            1.87
Sayedsalehi, et al. [30]      1.69                            1.55
Proposed RCA design           2.80                            2.63
Balali and Rezai [31]         2.48                            2.59
Sonare [32]                   2.74                            2.70
Rashidi and Rezai [33]        3.02                            2.98
Abedi, et al. [23]            3.56                            3.15
Mohammadi, et al. [34]        2.89                            3.12
Labrado and Thapliyal [35]    2.485                           2.84

5 Conclusion and future works
QCA is a new and emerging technology that plays a significant role in nanotechnology and has been researched for years. Considering its advantages, such as fast switching time, low power requirement, and high device density, QCA can be a good alternative to CMOS. In this article, this technology has been used to implement adder circuits. First, an innovative architecture for a 1-bit QCA full adder is created. Then, applying this full adder layout, a high-speed adder is developed as a 4-bit RCA. Our design is shown to use fewer cells and a smaller area, with realistic simulation results, compared to recently published adder architectures. The presented multi-layer architecture is significantly more robust than the conventional full adder.

The suggested full adder consists of 15 cells and generates its outputs in 0.5 clock cycles. It occupies an area of 0.01 µm² and features straightforward input and output connectivity. The suggested four-bit QCA-based RCA incorporates 72 cells, covers an area of 0.11 µm², and exhibits a delay of 1.75 clock cycles. In this study, QCADesigner-E was used to assess the total power dissipation of the QCA structures. These circuits have some of the lowest power consumption figures, and their inputs and outputs are easily accessible. In the future, high-speed adders can be designed that play an essential role in multi-layer designs and further improve computational performance. High-performance QCA circuits and an n-bit ripple carry adder can be created at the nanoscale using the given architectures. The suggested concept may therefore have a fundamental impact on the development of high-speed circuits as well as other arithmetic circuits, such as full subtractors and ripple-borrow subtractors.

6 Conflict of Interest
The authors declare that they have no conflicts of interest.

7 References
1. T. Tuncer, E. Avaroglu, M. Türk, and A. B. Ozer, “Implementation of non-periodic sampling true random number generator on FPGA,” Informacije MIDEM, vol. 44, pp. 296-302, 2014.
2. M. A. S. Bhuiyan, “CMOS series-shunt single-pole double-throw transmit/receive switch and low noise amplifier design for internet of things based radio frequency identification devices,” Informacije MIDEM, vol. 50, pp. 105-114, 2020.
3. H. Tian, J. Liu, Z. Wang, F. Xie, and Z. Cao, “Characteristic Analysis and Circuit Implementation of a Novel Fractional-Order Memristor-Based Clamping Voltage Drift,” Fractal and Fractional, vol. 7, p. 2, 2022.
4. J. Xiang, W. Yang, H. Liao, P. Li, Z. Chen, and J. Huang, “Design and thermal performance of thermal diode based on the asymmetric flow resistance in vapor channel,” International Journal of Thermal Sciences, vol. 191, p. 108345, 2023.
5. S. Li, J. Chen, X. He, Y. Zheng, C. Yu, and H. Lu, “Comparative study of the micro-mechanism of charge redistribution at metal-semiconductor and semimetal-semiconductor interfaces: Pt (Ni)-MoS2 and Bi-MoS2 (WSe2) as the prototype,” Applied Surface Science, vol. 623, p. 157036, 2023.


6. S. Seyedi and N. J. Navimipour, “An optimized design of full adder based on nanoscale quantum-dot cellular automata,” Optik, vol. 158, pp. 243-256, 2018.
7. S. Seyedi and N. J. Navimipour, “Designing a three-level full-adder based on nano-scale quantum dot cellular automata,” Photonic Network Communications, vol. 42, pp. 184-193, 2021.
8. S. Seyedi and N. Jafari Navimipour, “Designing a multi-layer full-adder using a new three-input majority gate based on quantum computing,” Concurrency and Computation: Practice and Experience, vol. 34, p. e6653, 2022.
9. A. Yan, J. Xiang, A. Cao, Z. He, J. Cui, T. Ni, et al., “Quadruple and Sextuple Cross-Coupled SRAM Cell Designs With Optimized Overhead for Reliable Applications,” IEEE Transactions on Device and Materials Reliability, vol. 22, pp. 282-295, 2022.
10. W. Dang, S. Liao, B. Yang, Z. Yin, M. Liu, L. Yin, et al., “An encoder-decoder fusion battery life prediction method based on Gaussian process regression and improvement,” Journal of Energy Storage, vol. 59, p. 106469, 2023.
11. A. Kamaraj and P. Marichamy, “Design of fault-tolerant reversible floating point division,” Informacije MIDEM, vol. 48, pp. 161-172, 2018.
12. Z. Qu, X. Liu, and M. Zheng, “Temporal-Spatial Quantum Graph Convolutional Neural Network Based on Schrödinger Approach for Traffic Congestion Prediction,” IEEE Transactions on Intelligent Transportation Systems, 2022.
13. X. Jianhua, D. Liangming, C. Zhou, H. Zhao, J. Huang, and T. Sulian, “Heat Transfer Performance and Structural Optimization of a Novel Microchannel Heat Sink,” Chinese Journal of Mechanical Engineering = Ji xie gong cheng xue bao, vol. 35, 2022.
14. A. Kamaraj, P. Marichamy, and R. Abirami, “MULTI-PORT RAM DESIGN IN QCA USING LOGICAL CROSSING,” Informacije MIDEM, vol. 51, pp. 49-61, 2021.
15. J. Gao, H. Sun, J. Han, Q. Sun, and T. Zhong, “Research on recognition method of electrical components based on FEYOLOv4-tiny,” Journal of Electrical Engineering & Technology, vol. 17, pp. 3541-3551, 2022.
16. S. Xu, H. Dai, L. Feng, H. Chen, Y. Chai, and W. X. Zheng, “Fault Estimation for Switched Interconnected Nonlinear Systems with External Disturbances via Variable Weighted Iterative Learning,” IEEE Transactions on Circuits and Systems II: Express Briefs, 2023.
17. S. Seyedi, B. Pourghebleh, and N. Jafari Navimipour, “A new coplanar design of a 4-bit ripple carry adder based on quantum-dot cellular automata technology,” IET Circuits, Devices & Systems, vol. 16, pp. 64-70, 2022.
18. R. M. Macrae, “Mixed-valence realizations of quantum dot cellular automata,” Journal of Physics and Chemistry of Solids, vol. 177, p. 111303, 2023.
19. M. Kikelj, B. Lipovšek, and F. Smole, “Orthodox Theory Monte-Carlo Simulation of Single-Electron Logic Circuits,” Informacije MIDEM, vol. 48, pp. 241-247, 2018.
20. J. Wang, J. Tian, X. Zhang, B. Yang, S. Liu, L. Yin, et al., “Control of time delay force feedback teleoperation system with finite time convergence,” Frontiers in Neurorobotics, vol. 16, 2022.
21. A. Asthana, A. Kumar, and P. Sharan, “N× N Clos Digital Cross-Connect Switch Using Quantum Dot Cellular Automata (QCA),” Computer Systems Science & Engineering, vol. 45, 2023.
22. S. Riyaz and V. K. Sharma, “Design of reversible Feynman and double Feynman gates in quantum-dot cellular automata nanotechnology,” Circuit World, vol. 49, pp. 28-37, 2023.
23. D. Abedi, G. Jaberipur, and M. Sangsefidi, “Coplanar full adder in quantum-dot cellular automata via clock-zone-based crossover,” IEEE Transactions on Nanotechnology, vol. 14, pp. 497-504, 2015.
24. S. T. Y. Chan, C. F. Chau, and A. bin Ghazali, “Design of a 4-bit ripple adder using Quantum-dot Cellular Automata (QCA),” in Circuits and Systems (ICCAS), 2013 IEEE International Conference on, 2013, pp. 33-38.
25. S. Hashemi and K. Navi, “A novel robust QCA full-adder,” Procedia Materials Science, vol. 11, pp. 376-380, 2015.
26. M. Patidar, U. Singh, S. K. Shukla, G. K. Prajapati, and N. Gupta, “An ultra-area-efficient ALU design in QCA technology using synchronized clock zone scheme,” The Journal of Supercomputing, vol. 79, pp. 8265-8294, 2023.
27. D. Manna, C. Mukherjee, A. Banerjee, M. Dhar, S. Panda, and B. Maji, “Towards Energy-Efficient Cost-Effective Toffoli Gate Design using Quantum Cellular Automata,” in 2023 IEEE Devices for Integrated Circuit (DevIC), 2023, pp. 56-60.
28. S.-S. Ahmadpour, M. Mosleh, and S. R. Heikalabad, “A revolution in nanostructure designs by proposing a novel QCA full-adder based on optimized 3-input XOR,” Physica B: Condensed Matter, vol. 550, pp. 383-392, 2018.
29. S. Sarmadi, S. Sayedsalehi, M. Fartash, and S. Angizi, “A structured ultra-dense QCA one-bit full-adder cell,” Quantum Matter, vol. 5, pp. 118-123, 2016.
30. S. Sayedsalehi, M. H. Moaiyeri, and K. Navi, “Novel efficient adder circuits for quantum-dot cellular automata,” Journal of Computational and Theoretical Nanoscience, vol. 8, pp. 1769-1775, 2011.

Y. Li et al.; Informacije Midem, Vol. 53, No. 2(2023), 57 – 64

31.

32.
33.
34.

35.

M. Balali and A. Rezai, “Design of Low-Complexity
and High-Speed Coplanar Four-Bit Ripple Carry
Adder in QCA Technology,” International Journal
of Theoretical Physics, vol. 57, pp. 1948-1960, July
01 2018.
N. Sonare, “Design and Simulation Study of Coplanar Full Adder and Ripple Carry adder using
Quantum Dot Cellular Automata,” 2018.
H. Rashidi and A. Rezai, “High-performance full
adder architecture in quantum-dot cellular automata,” The Journal of Engineering, vol. 1, 2017.
M. Mohammadi, M. Mohammadi, and S. Gorgin,
“An efficient design of full adder in quantum-dot
cellular automata (QCA) technology,” Microelectronics Journal, vol. 50, pp. 35-43, 2016.
C. Labrado and H. Thapliyal, “Design of adder and
subtractor circuits in majority logic-based fieldcoupled QCA nanocomputing,” Electronics letters, vol. 52, pp. 464-466, 2016.

Copyright © 2023 by the Authors.
This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted
use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Arrived: 16. 11. 2022
Accepted: 09 .07. 2023


Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.202

Journal of Microelectronics,
Electronic Components and Materials
Vol. 53, No. 2(2023), 65 – 77

High-Gain Super Class-AB Bulk-driven Subthreshold Low-Power CMOS Transconductance
Amplifier for Biomedical Applications
Rakesh Kumar Pandey1, Vijaya Bhadauria2 and V. K. Singh3
1 Dr. A.P.J. Abdul Kalam Technical University (AKTU), Lucknow, India
2 Motilal Nehru National Institute of Technology (MNNIT), Prayagraj, India
3 Institute of Engineering and Technology (IET), Lucknow, India

Abstract: This article describes a high-gain, power-efficient, single-stage operational transconductance amplifier (OTA) that is bulk-driven (BD), operates in the sub-threshold region in super class-AB mode, and offers an enhanced unity gain frequency (UGF). The proposed amplifier has a BD adaptively biased flipped voltage follower (FVF) differential input pair functioning in class-AB mode to raise the dynamic current and, in turn, the UGF and slew rate. Additionally, the core circuit of the proposed OTA employs partial positive feedback (PPF) to magnify the circuit’s effective input transconductance and gain. Moreover, the circuit’s overall gain is raised by three additional low-power current mirror loads placed at the output, two of which are FVF current mirrors and one of which is a self-cascode current mirror. The proposed OTA circuit and its traditional counterpart are designed and simulated in the Cadence Spectre tool using the UMC 0.18 μm CMOS process technology; both circuits are biased with a minimal supply of 0.5 V. The simulation results show that the proposed circuit delivers 72.35 dB open-loop DC gain, 61.33º phase margin, and 18.706 kHz UGF while consuming only 62.82 nW of power. These results indicate that the proposed OTA circuit is suitable for biomedical applications.
Keywords: Adaptive biasing, Bulk-driven OTA, FVF, Partial Positive Feedback, Self-cascode

Ojačevalnik prevodnosti CMOS z nizko močjo
in velikim ojačenjem Super Class-AB za
biomedicinske aplikacije
Izvleček: V članku je opisan enostopenjski operacijski transkonduktančni ojačevalnik (OTA) z visokim ojačenjem, ki deluje v
podpražnem območju in je voden preko substrata (BD), ki je energetsko učinkovit in ima povečano frekvenco enotnega ojačenja
(UGF). Predlagani ojačevalnik ima BD adaptivno pristranski diferencialni vhodni par z obrnjenim napetostnim sledilnikom (FVF), ki
deluje v načinu razreda AB za povečanje dinamičnega toka in posledično povečanje UGF in hitrosti premikanja. Poleg tega jedro vezja
predlaganega ojačevalnika OTA uporablja delno pozitivno povratno zvezo (PPF) za povečanje učinkovite vhodne transkonduktivnosti
in ojačitve vezja. Poleg tega se celotno ojačenje vezja poveča z uporabo treh dodatnih tokovnih zrcal z nizko porabo, od katerih sta dve
tokovni zrcali FVF, eno pa je tokovno zrcalo s samokaskodo, ki je nameščeno na izhodu. Predlagano vezje OTA in njegovo tradicionalno
analogno vezje sta razvita in simulirana v orodju Cadence Spectre z uporabo 0,18μm CMOS procesne tehnologije UMC, obe vezji
sta obremenjeni z minimalnim napajanjem 0,5 V. Rezultati simulacije kažejo, da predlagano vezje zagotavlja 72,35 dB DC ojačitve v
odprti zanki, 61,33º fazno razliko in 18,706 kHz UGF s porabo samo 62,82 nW energije. Rezultati delovanja so zagotovili primernost
predlaganega vezja OTA za biomedicinske aplikacije.
Ključne besede: prilagodljiva pred napetost, množično voden OTA, FVF, delna pozitivna povratna zanka, samo-kaskoda
* Corresponding Author’s e-mail: rakesh18.pnd@gmail.com

How to cite:
R. K. Pandey et al., “High-Gain Super Class-AB Bulk-driven Sub-threshold Low-Power CMOS Transconductance Amplifier for Biomedical Applications”, Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 65–77

R. K. Pandey et al.; Informacije Midem, Vol. 53, No. 2(2023), 65 – 77

1 Introduction

Over the last few years, driven by advances in CMOS technology, there has been a continuous demand for portable electronic devices such as laptops, notepads, wireless sensor networks, mobile phones, and biomedical implantable devices in our everyday lives. The medical field is also shifting rapidly towards portable equipment that continuously monitors patient health [1-5], which motivates the development of ultra-low-voltage, low-power circuit designs for portable applications. Analog circuit designers continue to work intensively in this field and have reported different low-voltage design techniques in the literature [6-8]. As the most fundamental block in an analog circuit, the OTA plays a pivotal role in the analog front-end circuits used in biomedical data acquisition systems. Electromyograms (EMG), electroencephalograms (EEG), electrocardiograms (ECG), and other bio-potential signals are low-voltage (amplitude in mV), low-frequency signals with a range of only a few kHz. Rail-to-rail input/output swing, high DC gain, low noise, high linearity, and minimal power consumption are the basic requirements of an OTA used in biomedical applications [3]. Achieving these characteristics with a low supply voltage in deep submicron technologies is a real challenge. The conventional gate-driven technique is unsuitable for operation in a sub-1V environment because of the threshold voltage constraint; a restricted linear range and high power consumption are two of the biggest flaws of a gate-driven OTA. The weak-inversion design technique is well suited to reducing power consumption, as the necessary drain-to-source voltage (VDS), which is about 250 mV in strong inversion, is decreased to about 78 mV [5, 9-11]. An alternative approach for rail-to-rail operation is the bulk-driven (BD) technique, which can overcome the aforementioned linearity and threshold voltage restrictions. The bulk-driven technique combined with the sub-threshold technique is preferable for biomedical applications, as the combination increases linearity and reduces power consumption. Although the bulk-driven technique increases the input common-mode range (ICMR), it reduces the open-loop DC gain and UGF and raises the input-referred noise, since the gate transconductance (gm) is 2.5 to 5 times higher than the bulk transconductance (gmb) [12-16]. A number of bulk-driven OTA designs that address the reduced bulk transconductance in sub-1V environments are described in the literature [10-19], and designs for extremely low voltage conditions with very little power loss are discussed in [20-22]. Although single-stage amplifiers are the most power-efficient, they cannot deliver enough gain; cascode techniques were used earlier to provide high gain. This approach is no longer used since it lowers the output voltage swing. A self-cascode (SC), as described in [5, 11, 14, 16], is an excellent approach to achieving a high DC gain together with a large output swing. It consists of two transistors but is treated as a single composite transistor. When SC loads are used, the output impedance increases roughly by a factor of 10, which corresponds to a gain improvement of about 20 dB. The composite SC loads do not require any extra bias sources to drive the cascode transistors and hence maximize the voltage gain. Some authors use the partial positive feedback (PPF) techniques described in [5, 12, 16, 17, 23-29] to improve the bulk transconductance of the input core and hence the small-signal performance of bulk-driven OTAs, but these techniques do not address the large-signal performance.

This paper presents an improved bulk-driven, low-power, single-stage super class-AB OTA [12, 26, 30-32] operated in the sub-threshold region, referred to throughout the paper as the super class-AB bulk-driven sub-threshold (SBDST) OTA. The proposed amplifier uses an adaptive bias technique in the input differential pair, based on a BD-FVF [26, 31] functioning in class-AB mode, to improve the dynamic current and unity gain frequency. The partial positive feedback (PPF) technique is exploited in the core circuit to improve the overall effective input transconductance and hence the gain of the circuit. In addition to the improved input transconductance, the output impedance is increased by using three low-power, high-performance current mirrors at the output, which raises the overall gain of the circuit further. By using these techniques, the proposed SBDST OTA offers significant open-loop DC gain, UGF, and slew rate while consuming minimal power.
This paper is structured as follows: Section 2 covers the conventional OTA, along with a thorough circuit description of both the proposed and conventional OTAs. Section 3 presents the detailed circuit analysis of the proposed OTA. Section 4 covers the simulation results of the conventional and proposed OTAs, including Monte Carlo analysis, process corner analysis, and layout. Section 5 compares the performance of the proposed OTA with that of previously reported designs, and Section 6 concludes the paper.
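
As a quick check of the decibel figures quoted in the introduction, the short Python sketch below converts the stated ratios (a roughly tenfold output-impedance increase for self-cascode loads, and a gm/gmb ratio of 2.5 to 5) into dB. The helper name `ratio_to_db` is an illustrative choice, and the sketch assumes that the voltage gain scales linearly with the output impedance and with the input transconductance.

```python
import math


def ratio_to_db(ratio: float) -> float:
    """Convert a voltage-gain ratio to decibels (20*log10)."""
    return 20.0 * math.log10(ratio)


# Self-cascode load: output impedance up by roughly 10x, so gain up by about 20 dB.
sc_gain_boost_db = ratio_to_db(10.0)

# Bulk drive: gm is 2.5 to 5 times gmb, so driving the bulk instead of the gate
# lowers the input transconductance (and the gain) by the same factor.
bd_penalty_db = (ratio_to_db(2.5), ratio_to_db(5.0))

print(f"SC gain boost: {sc_gain_boost_db:.1f} dB")
print(f"Bulk-driven gain penalty: {bd_penalty_db[0]:.1f} to {bd_penalty_db[1]:.1f} dB")
```

Under these assumptions the bulk-driven penalty is on the order of 8 to 14 dB, which is the loss that the gain-enhancement techniques discussed above are meant to recover.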

2 Circuit Descriptions
2.1 Conventional Bulk-driven Sub-threshold (BDST)
OTA
The conventional BDST OTA, in which the input core circuit is designed using bulk-input PMOS transistors PI1a-PI1b, is depicted in Fig. 1. The differential input transistor pair working in the sub-threshold region is biased by transistor PB. The drain current (IDS) of a transistor operating in sub-threshold is expressed as [11]:

$$I_{DS} = I_S \left(\frac{W}{L}\right) e^{\frac{q\,(V_{GS} - V_{Tn})}{nKT}} \left[1 - e^{-\frac{q\,V_{DS}}{KT}}\right] \qquad (1)$$

where IS is the characteristic current of the sub-threshold region, K is the Boltzmann constant, n is the slope factor of the curve in the sub-threshold region, T is the absolute temperature, and q is the electron charge.

The transistors are in saturation in the sub-threshold region if VDS ≥ 3VT, where VT = KT/q is the thermal voltage, whose value at 27 ºC is 26 mV.

Applying the condition VDS ≥ 3VT in (1), the term $e^{-qV_{DS}/KT} \ll 1$, hence equation (1) simplifies to

$$I_{DS} = I_S \left(\frac{W}{L}\right) e^{\frac{q\,(V_{GS} - V_{Tn})}{nKT}} \qquad (2)$$

The differential input pair is loaded by the NMOS transistor groups (N1a-3a - N1b-3b), which form non-linear current mirrors with a current transfer ratio K1 = 2. These current mirrors, known as adaptive loads [26], are the loads of the input transistor pair. The output of the adaptive loads is routed to the summing stage, which is at the circuit’s output, to raise the output impedance. The summing stage of the conventional OTA uses PMOS transistors (P2a-2b - P3a-3b) as a current mirror to boost the DC gain of the circuit. The effective transconductance of the conventional BDST OTA is given by:

Gm,BDST = K1.gmb1,a/b 				(3)

where gmb1 represents the bulk transconductance of the input transistors.

The output impedance Rout of the BDST OTA is given by:

Rout = [ro4,bN ||(ro3,bP + (ro3,bN||ro2,bP))] 			(4)

Since ro3,bP >> (ro3,bN||ro2,bP), equation (4) simplifies to:

Rout = (ro4,bN||ro3,bP) 					(5)

The effective transconductance and the circuit’s output impedance combine to give the open-loop DC gain (AV), which is expressed as:

AV,BDST = Gm,BDST .Rout
AV,BDST = K1.gmb1,a/b .(ro4,bN||ro3,bP) 			(6)

The main problem of the conventional BDST OTA is its very low open-loop gain of only about 32 dB. A non-linear current mirror is employed here to increase the amplifier’s slew rate and unity gain frequency (UGF), which for a capacitive load CL are given by the following equations:

$$UGF_{BDST} = \frac{K_1\, g_{mb1,a/b}}{2\pi C_L} \qquad (7)$$

$$SR_{BDST} = \frac{2\, K_1\, I_B}{C_L} \qquad (8)$$

The UGF and slew rate of the conventional amplifier are 1.637 kHz and 0.92 V/ms respectively, which are quite low. Therefore, some structural change is required in the amplifier to improve the overall performance of the conventional BDST OTA with respect to open-loop gain, slew rate, UGF, etc.

2.2 Proposed SBDST OTA

We propose the super class-AB bulk-driven sub-threshold (SBDST) OTA, depicted in Fig. 2, to enhance the performance of the conventional BDST OTA. Its input core uses two identical adaptively biased BD-FVF pairs, eliminating the bias current source of the conventional circuit, which is supplied by PB in Fig. 1.
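
The sub-threshold relations (1) and (2) of Section 2.1 can be explored numerically with a minimal sketch such as the one below. The function names and the sample bias values (`i_s`, `w_over_l`, `vgs`, `vtn`, `n`) are illustrative assumptions, not values from the design; only the structure of the equations and the 27 ºC temperature come from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant K, in J/K
Q_ELECTRON = 1.602176634e-19  # electron charge q, in C


def thermal_voltage(temp_c: float = 27.0) -> float:
    """Thermal voltage VT = KT/q in volts, for a temperature in degrees Celsius."""
    return K_BOLTZMANN * (temp_c + 273.15) / Q_ELECTRON


def ids_saturated(i_s, w_over_l, vgs, vtn, n, temp_c=27.0):
    """Equation (2): sub-threshold drain current once VDS >= 3*VT."""
    return i_s * w_over_l * math.exp((vgs - vtn) / (n * thermal_voltage(temp_c)))


def ids_full(i_s, w_over_l, vgs, vtn, vds, n, temp_c=27.0):
    """Equation (1): equation (2) multiplied by the factor [1 - exp(-VDS/VT)]."""
    vt = thermal_voltage(temp_c)
    return ids_saturated(i_s, w_over_l, vgs, vtn, n, temp_c) * (1.0 - math.exp(-vds / vt))


vt = thermal_voltage(27.0)  # close to the 26 mV quoted in the text

# Hypothetical bias point, chosen only to exercise the functions.
params = dict(i_s=1e-9, w_over_l=10.0, vgs=0.2, vtn=0.35, n=1.3)
ratio_at_3vt = ids_full(vds=3 * vt, **params) / ids_saturated(**params)

print(f"VT = {vt * 1e3:.2f} mV, 3*VT = {3 * vt * 1e3:.1f} mV")
print(f"I_DS from (1) / I_DS from (2) at VDS = 3*VT: {ratio_at_3vt:.3f}")
```

At VDS = 3VT the bracketed factor in (1) equals 1 − e^−3, about 0.95, so the simplified form (2) overestimates the current by roughly 5% at the edge of saturation and converges quickly for larger VDS.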

Figure 1: Conventional Bulk-driven sub-threshold (BDST) OTA

The transistor dimensions are selected so that the proposed SBDST OTA operates in the sub-threshold region and the circuit achieves low-power operation, i.e., below 100 nW. A bulk-driven differential pair is used in the input stage to extend the input common-mode range of the circuit (0 to VDD); however, the gate transconductance (gm) is 2.5 to 5 times larger than the bulk transconductance (gmb) [12]. Consequently, the circuit’s effective transconductance decreases, and hence the gain and UGF of the circuit are also considerably low. To overcome these limitations, an adaptively biased super class-AB structure [26, 31, 32] is incorporated into the input core. The BD-FVF pair at the input consists of bulk-driven transistors PI1a-PI1b, diode-connected transistors PI3a-PI3b connected in negative feedback [31, 32], and the current source formed by transistors NI5a-NI5b. The input core of the SBDST OTA is made up of the adaptively biased input differential pair PI2a-PI2b and adaptive loads. The non-linear current mirrors (N1a-3a - N1b-3b) with a current ratio of 1:K1 are called adaptive loads. The source terminal of the FVF pair is the output node, which has a very low impedance given by $1/(g_{m3a}\, g_{m1a}\, r_{o1a})$. Because of this low impedance at the output node, the FVF can source a significant amount of current, even greater than the bias current Ib, when the differential input voltage varies. Hence, the FVF pair in combination with the adaptively biased differential pair makes the proposed OTA function in class-AB. This combination eliminates the limitations of the traditional BDST OTA.

Differential input signals Vin- and Vin+ are applied to the bulk terminals of transistors PI1a-PI1b as well as to the bulk terminals of the adaptively biased differential pair PI2a-PI2b. Due to the voltage follower action of the FVF, the source terminal of transistor PI1a is also at Vin-. This terminal, named C in Fig. 2, is also connected to the source terminal of transistor PI2a; hence, the total signal voltage that appears across transistor PI2a is VBS = [Vin+ - Vin-] = [Vin+ - (-Vin+)] = 2Vin+. Similarly, the VBS of transistor PI2b is 2Vin-. Therefore, the effective transconductance of the proposed SBDST OTA is twice that of the traditional OTA and is equal to 2gmb1,a/b.

Since the transconductance of the SBDST OTA increases, the gain and UGF also increase. To further enhance the transconductance and gain, the PPF technique is introduced using transistors N4a and N4b at the adaptive load ends of the input core, shown inside the green box in Fig. 2. The PPF loop increases the overall input transconductance but with a small loss of phase margin (PM), as it generates a non-dominant pole at node E of the SBDST OTA. Therefore, a small compensation capacitor CC is used between the drain of NI5b and the output node.

In addition to the improved transconductance, the output impedance is enhanced by using three current mirrors at the output of the circuit. Of these mirrors, two are high-performance FVF current mirrors [5] with a current gain factor of 1.25, and one is a self-cascode current mirror. In the composite SC structure, the aspect ratio of the cascode transistor relative to that of the transistor connected to the supply is set to 20, so that the structure operates in saturation in the sub-threshold region [11, 16]. Hence, by raising the input stage’s transconductance and the output stage’s output impedance, the overall performance of the proposed SBDST OTA is enhanced.

Figure 2: Proposed SBDST OTA

3 Explanation of the proposed SBDST OTA
This section describes the SBDST OTA’s overall transconductance, voltage gain, UGF, and stability.

3.1 Effective transconductance and UGF
The half sub-circuit of the input core of the SBDST OTA and its small-signal AC equivalent circuit are depicted in Fig. 3a and b.

Figure 3: (a) Input core’s half circuit of SBDST OTA, (b) half circuit small signal equivalent model

The input signals Vin- and Vin+ are applied to the bulk terminals of transistors PI1a and PI2a respectively, assuming the voltage at their source terminals is VC, and
the gate voltage of transistor PI3a is VD. The drain terminal of PI1a is also connected to point D, so its drain voltage is also VD. A constant DC current source Ib formed by transistors NI5a-NI5b biases transistor PI1a. As a result, zero AC small-signal current passes through the input transistor PI1a [31], and all the AC signal current of PI3a is the output AC small-signal current Io, which flows through transistor PI2a [12, 26, 31, 32]. The effective overall transconductance of the proposed OTA is calculated from the following equations.

The small-signal current through transistor PI1a at node D is given by:

$$g_{m1,a}(-V_C) + g_{mb1,a}\left(V_{in}^{-} - V_C\right) + \frac{V_C - V_D}{r_{o1,a}} = 0 \qquad (9)$$

and the output current Io contributed by PI2a is expressed as:

$$I_o = g_{m2,a}(-V_C) + g_{mb2,a}\left(V_{in}^{+} - V_C\right) + \frac{V_C - V_E}{r_{o2,a}} \qquad (10)$$

The output resistance terms can be neglected in both equations since the output resistances are very high. So, Eq. (9) can be approximated as:

$$g_{m1,a}(-V_C) + g_{mb1,a}\left(V_{in}^{-} - V_C\right) = 0 \qquad (11)$$

$$\Rightarrow \left(g_{m1,a} + g_{mb1,a}\right) V_C = g_{mb1,a} V_{in}^{-} \qquad (12)$$

$$\Rightarrow V_C = \frac{g_{mb1,a} V_{in}^{-}}{g_{m1,a} + g_{mb1,a}} \qquad (13)$$

Eq. (10) can be written as:

$$I_o = g_{mb2,a} V_{in}^{+} - V_C \left(g_{m2,a} + g_{mb2,a}\right) \qquad (14)$$

Substituting VC from Eq. (13) into Eq. (14) gives:

$$I_o = g_{mb2,a} V_{in}^{+} - \frac{g_{mb1,a} V_{in}^{-}}{g_{m1,a} + g_{mb1,a}} \left(g_{m2,a} + g_{mb2,a}\right) \qquad (15)$$

Since the transistors PI1a and PI2a are identical,

$$g_{m1,a} = g_{m2,a} \quad \text{and} \quad g_{mb1,a} = g_{mb2,a} \qquad (16)$$

Putting the above relations into Eq. (15), and taking Vin+ = −Vin- = Vin for a differential input, Io simplifies to:

$$I_o \approx 2\, g_{mb1,a} V_{in} \qquad (17)$$

As the circuit is symmetric, the input core transconductance is given as:

$$G_{m,\,input\ core} = \frac{I_o}{V_{in}} = 2\, g_{mb1} \qquad (18)$$

Considering the current gain K1 = 2 of the non-linear current mirror together with the partial positive feedback technique employing transistors N4a and N4b in the input core, the effective overall transconductance of the proposed SBDST OTA is given by:

$$G_{m,\,effective}|_{SBDST} = K_1 \frac{2\, g_{mb1}}{1 - \alpha} \qquad (19)$$

where K1 and α are set by the aspect ratios of the transistors and are given by $K_1 = g_{m3,N} / g_{m2,N}$ and $\alpha = g_{m4,N} / g_{m2,N}$.

The UGF of the SBDST OTA is given by:

$$UGF = \frac{G_{m,\,effective}|_{SBDST}}{2\pi C_L} = \frac{K_1\, 2\, g_{mb1}}{(1 - \alpha)\, 2\pi C_L} \qquad (20)$$

Due to its enhanced effective transconductance, the proposed OTA provides a significantly higher UGF than the traditional BDST OTA, as shown by expression (20).

3.2 Voltage gain

The open-loop voltage gain of the SBDST OTA is obtained by multiplying the circuit’s output impedance and effective transconductance. The output impedance of the entire circuit at the output node is given by:

Rout = (gm7,bP ro7,bP ro5,bP)||(gm6,bN ro6,bN ro7,bN) 		(21)

Hence, the overall voltage gain of the proposed SBDST OTA is given by:

AV|SBDST = Gm,effective|SBDST.Rout

$$\Rightarrow A_V|_{SBDST} = K_1 \frac{2\, g_{mb1}}{1 - \alpha} \left[\left(g_{m7,bP}\, r_{o7,bP}\, r_{o5,bP}\right) \| \left(g_{m6,bN}\, r_{o6,bN}\, r_{o7,bN}\right)\right] \qquad (22)$$

3.3 Stability analysis

The proposed SBDST OTA has a dominant pole (p1) at the output node owing to its capacitive load and output impedance; hence, its frequency is not influenced by the PPF loop and is given as:

$$p_1 = \frac{1}{R_{out}\left(C_C + C_L\right)} \qquad (23)$$

where CC is the small compensation capacitor and CL is the load capacitance.
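
Equations (19), (20), and (23) are simple enough to evaluate directly, and the Python sketch below does so to show how the PPF factor α scales the effective transconductance and the UGF. K1, CL, and CC are the values stated in the paper (CL and CC are reported in Section 4); the bulk transconductance `GMB1`, the output impedance `R_OUT`, and the α values are hypothetical numbers chosen only for illustration, and the function names are not from the source.

```python
import math


def effective_gm(k1: float, gmb1: float, alpha: float) -> float:
    """Equation (19): Gm,effective = K1 * 2*gmb1 / (1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("this sketch assumes 0 <= alpha < 1")
    return k1 * 2.0 * gmb1 / (1.0 - alpha)


def ugf_hz(gm: float, c_load: float) -> float:
    """Equation (20): UGF = Gm / (2*pi*CL), in Hz."""
    return gm / (2.0 * math.pi * c_load)


def dominant_pole(r_out: float, c_comp: float, c_load: float) -> float:
    """Equation (23): p1 = 1 / (Rout * (CC + CL)), in rad/s."""
    return 1.0 / (r_out * (c_comp + c_load))


K1 = 2.0          # current gain of the non-linear mirror (from the text)
C_LOAD = 15e-12   # load capacitance reported in Section 4, in farads
C_COMP = 0.4e-12  # compensation capacitor reported in Section 4, in farads
GMB1 = 80e-9      # hypothetical bulk transconductance, in siemens
R_OUT = 2e9       # hypothetical output impedance, in ohms

for alpha in (0.0, 0.5, 0.8):  # hypothetical PPF factors
    gm = effective_gm(K1, GMB1, alpha)
    print(f"alpha={alpha:.1f}: Gm={gm * 1e9:.0f} nS, UGF={ugf_hz(gm, C_LOAD) / 1e3:.2f} kHz")

print(f"p1 = {dominant_pole(R_OUT, C_COMP, C_LOAD):.2f} rad/s")
```

The 1/(1 − α) term is what makes PPF attractive: with these illustrative numbers, moving α from 0 to 0.8 multiplies the transconductance by five. It also shows why α must stay below 1, since the expression grows without bound as α approaches unity.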

The PPF technique employed in the input core shifts the non-dominant pole (p2) at the drain terminal of N2,a/b towards a lower frequency; its value, given in [12], is expressed as:

$$p_2 = \frac{g_{m2,N} - g_{m4,N}}{C_P} \qquad (24)$$

where CP is the parasitic capacitance at the above-mentioned drain terminal node. This node has a higher impedance due to the PPF action. The lower secondary pole value in (24) limits the maximum possible UGF. To ensure a stable phase margin, a small compensation capacitor CC is placed between the drain of NI5b and the high-impedance output node.

4 Simulation results
Using 180nm CMOS process technology, the traditional BDST and proposed SBDST OTAs are driven by a supply of only 0.5 V with a load capacitor of 15 pF and are simulated in the Cadence Virtuoso simulator. The bias current Ib of the BD-FVF pair in the SBDST OTA is fixed at 10 nA, and the total standby current in the sub-threshold region of operation is 124 nA, while that of the conventional OTA is 100 nA; the reference temperature for both is 27 ºC. The bias current used for the high-performance FVF current mirror is 1.2 nA. In the design, the aspect ratios of all the MOSFETs are chosen to lower the influence of channel length modulation and the input-referred noise of the circuit. Additionally, the bias voltage Vb1 is chosen to bias transistor N2,a/b in the triode region, so that the combination of transistors (N1a-3a – N1b-3b) works as a non-linear mirror.

Figure 4: Simulated AC plot of BDST OTA and SBDST OTA

Figure 4 shows the AC responses of the conventional BDST OTA and the proposed SBDST OTA. The simulation outcomes demonstrate that the open-loop DC gain, UGF, and phase margin of the proposed SBDST OTA are 72.356 dB, 18.7057 kHz, and 61.3255º respectively. Compared with the conventional circuit, the DC gain (in dB) improves by a factor of 2.26 and the UGF by a factor of 11.42, with a small loss of phase margin. A compensation capacitor CC = 0.4 pF is used in the SBDST OTA to raise its phase margin above 60º.

The overall effective input core transconductance is shown in Fig. 5; the proposed SBDST OTA achieves a significantly greater effective input core transconductance of 1.76 μS compared with 158.5 nS for the conventional bulk-driven OTA.

Figure 5: Effective input core transconductance of BDST OTA and SBDST OTA

Noise is one of the most crucial factors of an OTA. It is an undesired signal that frequently combines with the desired signal as a result of fluctuations in the power supply or component mismatches, producing unwanted output. In addition, the thermal and flicker noise of the MOS transistors themselves add to the overall noise density. Since the range of biosignals is 10 mHz ≤ fbio ≤ 1 kHz, flicker noise predominates in biomedical applications. So, the design of the OTA circuit must ensure minimum input-referred noise for biomedical applications. Figure 6 shows the input-referred noise at the input pair terminals of the proposed and conventional OTAs; the SBDST OTA and BDST OTA are found to have input-referred noise (IRN) values of 0.959 μV/√Hz and 1.347 μV/√Hz, respectively, at 1 kHz. The value is lower in the proposed SBDST OTA due to the enhancement in the overall effective transconductance of the input pair.

Figure 6: Plot of input referred noise voltage of BDST OTA and SBDST OTA
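
The reported AC figures can be cross-checked against the UGF expressions (7) and (20), which both reduce to Gm/(2πCL). The sketch below uses only numbers quoted in the text (the transconductances from Fig. 5, the 15 pF load, and the simulated UGF and gain values); the variable and function names are illustrative.

```python
import math

C_LOAD = 15e-12  # load capacitor from Section 4, in farads


def ugf_from_gm(gm: float, c_load: float = C_LOAD) -> float:
    """UGF = Gm / (2*pi*CL), the form shared by equations (7) and (20), in Hz."""
    return gm / (2.0 * math.pi * c_load)


# Effective input core transconductances quoted from Fig. 5.
ugf_sbdst_est = ugf_from_gm(1.76e-6)   # proposed SBDST OTA
ugf_bdst_est = ugf_from_gm(158.5e-9)   # conventional BDST OTA

# Improvement factors recomputed from the simulated values.
ugf_ratio = 18.7057 / 1.637    # simulated UGFs, both in kHz
gain_ratio_db = 72.356 / 32.0  # ratio of the gains in dB ("about 32 dB" for BDST)

print(f"Estimated UGF, SBDST: {ugf_sbdst_est / 1e3:.2f} kHz (simulated 18.7057 kHz)")
print(f"Estimated UGF, BDST:  {ugf_bdst_est / 1e3:.2f} kHz (simulated 1.637 kHz)")
print(f"UGF ratio: {ugf_ratio:.2f}, gain ratio in dB terms: {gain_ratio_db:.2f}")
```

Both estimates land within a few percent of the simulated UGFs. Note that the factor of 2.26 is reproduced only as a ratio of the two gains expressed in dB; as a linear voltage-gain ratio, the difference of about 40 dB corresponds to roughly a hundredfold increase.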

The PSRR± and CMRR values must be very large to reject unwanted signals that are generated by variations in the power supply and that are common to both inputs. Figure 7 shows the CMRR and PSRR± of the SBDST OTA; at 1 mHz, the proposed SBDST OTA provides a high CMRR, PSRR+, and PSRR− of 161.48 dB, 86.17 dB, and 69.22 dB respectively.

Figure 7: CMRR, PSRR (+/−) of proposed SBDST OTA

Figure 8 shows the unity-gain closed-loop configuration of the proposed SBDST OTA, obtained by shorting its inverting input to the output, which is used to evaluate the large-signal transient response. The output response for a 15 pF capacitive load is shown in Fig. 9 when a step input signal of 0.5 V peak-to-peak (Vpp) at a frequency of 250 Hz is applied to the non-inverting input of the OTA. The average slew rate of the SBDST OTA is found to be 2.07 V/ms.

Figure 8: Unity gain configuration of the SBDST OTA

Figure 9: Large-signal pulse response to 0.5Vpp at 250 Hz square wave for proposed SBDST OTA

The sinusoidal transient response is evaluated by applying two sinusoidal input signals of 0.5 Vpp and 0.4 Vpp, with common-mode voltages (Vcm) of 0.25 V and 0.2 V respectively, to the non-inverting input terminal in Fig. 8 at a frequency of 250 Hz. The simulation outputs are shown in Fig. 10(a) and (b); the proposed OTA provides output signal swings of (12.55 mV−491.3 mV) and (12.56 mV−399.37 mV) respectively. The output voltage swing in response to a sinusoidal transient is nearly rail-to-rail.

Figure 10: Sinusoidal transient response of the proposed SBDST OTA for (a) Vin,pp = 0.5V with Vcm= 0.25V, (b) Vin,pp = 0.4V with Vcm= 0.2V

The input common-mode range (ICMR) of the proposed OTA is evaluated by a DC sweep analysis in a non-inverting voltage buffer configuration with a 15 pF capacitive load. The simulation output is shown in Fig. 11(a), and the variation of the error voltage (Vout-Vin) over the whole input voltage range (0 to VDD) is shown in Fig. 11(b). The error voltage is 12.63 mV at 0 V input and only 8.8 mV at 0.5 V input. Thus, the DC sweep results in Fig. 11(a) and (b) confirm that the proposed SBDST OTA is linear over a wide ICMR.
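
To put the slew-rate figures in context, the sketch below estimates how long each amplifier spends slewing on the 0.5 Vpp, 250 Hz step used for the large-signal test. It assumes an idealized, purely slew-limited ramp across the full step, which is a simplification; the 2.07 V/ms and 0.92 V/ms values are the slew rates reported for the SBDST and conventional BDST OTAs, and the names are illustrative.

```python
def slewing_time_ms(step_v: float, slew_rate_v_per_ms: float) -> float:
    """Time to traverse a voltage step at a constant slew rate, in milliseconds."""
    return step_v / slew_rate_v_per_ms


STEP_V = 0.5     # peak-to-peak step amplitude, in volts
FREQ_HZ = 250.0  # square-wave frequency, in hertz

half_period_ms = 1e3 / (2.0 * FREQ_HZ)  # time available for each edge to settle

slew_rates = {"SBDST": 2.07, "BDST": 0.92}  # reported average slew rates, V/ms
slew_times = {name: slewing_time_ms(STEP_V, sr) for name, sr in slew_rates.items()}

for name, t_ms in slew_times.items():
    share = 100.0 * t_ms / half_period_ms
    print(f"{name}: slewing for {t_ms:.3f} ms, {share:.1f}% of the half period")
```

Under this idealization the proposed OTA slews for about 0.24 ms of each 2 ms half period, while the conventional OTA would need about 0.54 ms for the same step.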

Figure 11: (a) DC sweep for ICMR of the SBDST OTA, (b) Error voltage (Vout - Vin) in DC sweep

A 250 Hz sine wave input signal with peak-to-peak amplitudes varying from 50 mV to 500 mV is used to assess the nonlinearity of the SBDST OTA in the unity-gain configuration. The simulation result is shown in Fig. 12. At 200 mV (pp), the total harmonic distortion (THD) is −60.91 dB, and the THD remains below −40 dB for input sine wave amplitudes up to 466 mV (pp).

Figure 12: Plot of THD against amplitude for SBDST OTA

The robustness of the OTA is determined by the deviation of its performance parameters under process variation and mismatch. Monte Carlo simulations with 300 samples are used to assess the robustness of the proposed SBDST OTA. The statistical data from this analysis are shown in Fig. 13 in the form of histograms.

Figure 13: Simulation results of Monte Carlo iteration of (a) DC gain, (b) PM, (c) UGF, (d) total power consumption for 300 samples

In addition, the Monte Carlo results for all the parameters of the SBDST OTA are tabulated in Table 1. The results in Table 1 demonstrate that the proposed OTA delivers a low standard deviation (SD) for all the performance parameters and hence is insensitive to process variations.

Table 1: Performance result of proposed SBDST OTA under Monte Carlo simulation using 300 samples
Parameters | Mean (μ) | SD (σ)
Open loop DC gain (dB) | 72.41 | 0.726
Phase margin (degree) | 61.94 | 0.5318
UGF (kHz) | 18.84 | 1.462
CMRR (dB) @ 1mHz | 151.7 | 10.66
PSRR+ (dB) @ 1mHz | 86.31 | 1.399
PSRR− (dB) @ 1mHz | 69.28 | 0.8836
SR(av) (V/ms) | 2.078 | 0.1401
IRN (μV/√Hz) at 1kHz | 0.96 | 0.006999
Total current (nA) | 124.5 | 13.55
Total Power (nW) | 62.27 | 6.773

Integrated circuits (ICs) must be designed by manufacturers so that, after fabrication, PVT (process, voltage, and temperature) fluctuations do not affect their operation. Deviations in manufacturing conditions such as dopant concentrations, temperature, and pressure, and variations in the semiconductor fabrication process, cause “process variation”. Other key factors for process variation are variations in metal thickness, oxide thickness, UV light wavelength, faults in the manufacturing process, and variations in transistor characteristics [12]. The supply voltage may also fluctuate in some circumstances, so the simulation results of the proposed OTA should also be verified by varying the supply voltage.
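
One way to compare the Monte Carlo spreads in Table 1 across parameters with different units is the coefficient of variation, σ/μ. The sketch below computes it for four of the rows. The dictionary layout and names are illustrative, and each SD is written in the unit of its row (for example, 726m dB is entered as 0.726 dB and 1.462k Hz as 1.462 kHz).

```python
# (mean, standard deviation) pairs from Table 1, both in the unit of the row.
TABLE_1 = {
    "Open loop DC gain (dB)": (72.41, 0.726),
    "Phase margin (degree)": (61.94, 0.5318),
    "UGF (kHz)": (18.84, 1.462),
    "Total power (nW)": (62.27, 6.773),
}


def cv_percent(mean: float, sd: float) -> float:
    """Coefficient of variation, sigma/mu, expressed in percent."""
    return 100.0 * sd / mean


cv = {name: cv_percent(mean, sd) for name, (mean, sd) in TABLE_1.items()}

for name, value in cv.items():
    print(f"{name}: sigma/mu = {value:.2f}%")
```

The gain and phase margin vary by about 1% or less of their means, while the UGF and total power show the widest relative spread, at roughly 8% and 11%. The gain row is a ratio of dB values, so it should be read as a relative spread of the logarithmic quantity rather than of the linear gain.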

Figure 14 shows the effect of the five process corners (TT, FF, SS, FNSP, and SNFP) at 27 ºC on the gain and phase margin of the proposed SBDST circuit. The SS corner has the largest DC gain, 76 dB, and the FNSP corner has the lowest DC gain, 65.59 dB. To check the sensitivity of the proposed OTA to PVT variations, corner analysis for the five corners is carried out at temperatures of −14 ºC, 27 ºC, and 60 ºC, and the performance of the OTA is also verified by varying the supply voltage by ±10%. The simulation results of all the performance parameters under PVT fluctuations are tabulated in Table 2 and Table 3.

Figure 14: Process corners effect on DC gain and phase margin at room temperature

Table 2: Simulation results on the variation of supply voltage
Parameters | VDD − 10% | VDD + 10%
Open loop DC gain (dB) | 63.33 | 77.64
Phase margin (degree) | 68.43 | 56.83
UGF (kHz) | 7.4 | 27.46
CMRR (dB) @ 1mHz | 126.88 | 115.1
PSRR+ (dB) @ 1mHz | 71.13 | 94.4
PSRR− (dB) @ 1mHz | 61.66 | 73.28
IRN (μV/√Hz) at 1kHz | 0.96 | 0.94
Total current (nA) | 102.9 | 144
Total Power (nW) | 46.3 | 79.2

Figure 15 depicts the layout of the single-stage SBDST OTA. The proposed OTA occupies an area of (76 x 81) μm2, including the area of the compensation capacitor, and the post-layout AC response of the SBDST OTA is shown in Fig. 16. The post-layout simulation gives an open-loop DC gain, UGF, and phase margin of 72.281 dB, 18.329 kHz, and 61.635º respectively. The pre-layout and post-layout AC responses are in close agreement, which supports the usability and design of this SBDST OTA.

Figure 15: Proposed SBDST OTA’s layout

Figure 16: Post-layout AC plot of SBDST OTA
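
The agreement between the pre-layout and post-layout AC results can be quantified with a short calculation. The sketch below takes the three pre-layout values reported with Fig. 4 and the three post-layout values reported with Fig. 16 and computes the relative change of each; the dictionary keys and function name are illustrative.

```python
PRE_LAYOUT = {"gain_db": 72.356, "ugf_khz": 18.7057, "pm_deg": 61.3255}
POST_LAYOUT = {"gain_db": 72.281, "ugf_khz": 18.329, "pm_deg": 61.635}


def relative_change_percent(pre: float, post: float) -> float:
    """Signed change from the pre-layout to the post-layout value, in percent."""
    return 100.0 * (post - pre) / pre


layout_shift = {
    key: relative_change_percent(PRE_LAYOUT[key], POST_LAYOUT[key])
    for key in PRE_LAYOUT
}

for key, value in layout_shift.items():
    print(f"{key}: {value:+.2f}%")
```

With these figures the gain changes by about −0.1%, the UGF by about −2%, and the phase margin by about +0.5%, which is consistent with the small parasitic loading expected from layout.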


Table 3: Simulation results of the performance of SBDST OTA on variations of process and temperature

Different corners at temperature −14 ºC
Parameters | TT | FF | SS
Open loop DC gain (dB) | 79.41 | 76.17 | 80.68
Phase margin (degree) | 53.37 | 54.04 | 53.29
UGF (kHz) | 9.08 | 23.57 | 2.68
CMRR (dB) @ 1mHz | 135.18 | 131.26 | 149.96
PSRR+ (dB) @ 1mHz | 98.44 | 91.51 | 102.45
PSRR− (dB) @ 1mHz | 73.09 | 70.34 | 74.68
SR(av) (V/ms) | 1.22 | 2.5 | 0.58
IRN (μV/√Hz) at 1kHz | 0.93 | 0.81 | 1.31
Total Power (nW) | 18.14 | 58.99 | 4.83

Different corners at temperature 27 ºC
Parameters | TT | FF | SS
Open loop DC gain (dB) | 72.36 | 68.06 | 76
Phase margin (degree) | 61.96 | 65.42 | 59.69
UGF (kHz) | 19.11 | 34.77 | 8.47
CMRR (dB) @ 1mHz | 161.49 | 142.19 | 131.91
PSRR+ (dB) @ 1mHz | 86.17 | 79.86 | 93.27
PSRR− (dB) @ 1mHz | 69.21 | 65.56 | 72.41
SR(av) (V/ms) | 2.06 | 3.66 | 1.12
IRN (μV/√Hz) at 1kHz | 0.95 | 0.92 | 1.08
Total Power (nW) | 62.81 | 162.98 | 21.79

Different corners at temperature 60 ºC
Parameters | TT | FF | SS
Open loop DC gain (dB) | 65.35 | 60.48 | 69.72
Phase margin (degree) | 69.83 | 73.96 | 65.86
UGF (kHz) | 23.15 | 33.87 | 13.87
CMRR (dB) @ 1mHz | 133.58 | 139.19 | 124.83
PSRR+ (dB) @ 1mHz | 76.11 | 70.17 | 82.49
PSRR− (dB) @ 1mHz | 63.69 | 59.27 | 67.59
SR(av) (V/ms) | 2.74 | 4.32 | 1.66
IRN (μV/√Hz) at 1kHz | 1.03 | 1 | 1.09
Total Power (nW) | 137.94 | 316.13 | 55.62

79.08
58.79
5.21
109.27
91.49
72.24
1.58
1.03
10.85

FNSP
73.63
54.1
10.16
133.98
85.1
68.85
0.84
0.88
29.58

SNFP
75.99
61.28
15.41
107.02
89.12
71.72
2.91
0.94
42.62

FNSP
65.59
68.07
14.39
136.82
74.44
63.69
1.41
0.96
88.66

SNFP
70.18
65.74
25.77
102.66
81.76
67.47
3.9
0.96
102.48

FNSP
14.08
102.17
0.093
137.38
19.35
14.08
1.8
1.21
175.65

SNFP

5 Performance comparison and discussions

Table 4 lists the performance parameters of the proposed OTA. To assess the overall performance of the OTA in terms of its small- and large-signal responses, two popular figures of merit (FOMSm and FOMLa), specified in [22, 27, 28], are used. They are given in equations (25) and (26), respectively.

$$ FOM_{Sm} = \frac{UGF\,(\mathrm{MHz}) \cdot C_L\,(\mathrm{pF})}{I_T\,(\mu\mathrm{A})} \tag{25} $$

$$ FOM_{La} = \frac{SR_{av}\,(\mathrm{V}/\mu\mathrm{s}) \cdot C_L\,(\mathrm{pF})}{I_T\,(\mu\mathrm{A})} \tag{26} $$

The performance parameters of the proposed SBDST OTA are compared with those of other recent BD OTAs in Table 5. According to Table 5, the proposed SBDST OTA offers the largest DC gain and PSRR+/− among the compared designs, and its CMRR is exceeded only by that of [19]. Its large-signal figure of merit (FOMLa) is comparable only to that of [14] among the OTAs in Table 5, whereas its small-signal figure of merit (FOMSm) is the highest of the reported OTAs with the exception of [20].
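The figures of merit in equations (25) and (26) can be checked numerically. The following minimal sketch plugs in the values listed for the SBDST OTA in Table 4 (UGF 18.7 kHz, average slew rate 2.07 V/ms, 15 pF load, 125.64 nA total current) after converting them to the units used in the equations; the function names are illustrative and not taken from the paper.

```python
# Figures of merit per equations (25) and (26).
# Units: UGF in MHz, slew rate in V/us, load capacitance in pF, current in uA.

def fom_small_signal(ugf_mhz: float, cl_pf: float, it_ua: float) -> float:
    return ugf_mhz * cl_pf / it_ua


def fom_large_signal(sr_v_per_us: float, cl_pf: float, it_ua: float) -> float:
    return sr_v_per_us * cl_pf / it_ua


ugf_mhz = 18.7e-3   # 18.7 kHz
sr_v_us = 2.07e-3   # 2.07 V/ms
cl_pf = 15.0
it_ua = 125.64e-3   # 125.64 nA

fom_sm = fom_small_signal(ugf_mhz, cl_pf, it_ua)
fom_la = fom_large_signal(sr_v_us, cl_pf, it_ua)
print(f"FOM_Sm = {fom_sm:.2f}, FOM_La = {fom_la:.3f}")
```

The results round to the FOMSm and FOMLa entries of Table 4 (2.23 and 0.247).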


Table 4: Proposed SBDST OTA performance parameters
Parameters                SBDST OTA
Power supply (V)          0.5
Load capacitance (pF)     15
Technology                0.18 μm
DC gain (dB)              72.35
Phase margin (º)          61.33
UGF (kHz)                 18.7
CMRR (dB) @ 1 mHz         161.48
PSRR+ (dB) @ 1 mHz        86.17
PSRR− (dB) @ 1 mHz        69.22
IRN @ 1 kHz (μV/√Hz)      0.95
SR average (V/ms)         2.07
Total current (nA)        125.64
Total power (nW)          62.82
FOMSm                     2.23
FOMLa                     0.247

6 Conclusions

This work presents an enhanced bulk-driven single-stage OTA architecture operating in the weak-inversion region from a 0.5 V supply. The proposed amplifier employs a BD-FVF based on an adaptively biased differential input pair operating in class-AB mode to improve the dynamic current and the unity-gain frequency. Additionally, it employs a partial positive feedback technique in the core of the differential pair to increase the gain of the circuit. The gain is further increased by a low-power, high-performance FVF-based current-mirror load at the amplifier's output. The simulation results show that the amplifier consumes just 62.82 nW and has a DC gain of 72.35 dB, a phase margin of 61.32º, and a UGF of 18.7 kHz. For a 250 Hz, 200 mV (pp) input sine wave, the SBDST OTA in its unity-gain configuration offers a total harmonic distortion of −60.91 dB. These results indicate that the proposed SBDST OTA is suitable for biomedical signal processing, audio signal processing, and low-frequency sensors.

7 Conflict of Interest

The authors declare that they have no conflicts of interest.
Table 5: Performance comparison of the proposed SBDST OTA with previously reported BD OTAs in 0.18 μm technology
Parameters

[14]

[11]

[15]

[16]

[16]

[17]

[19]

[29]

[20]

2015

2017

2017

OTA1

OTA2

2018

2019

2021

2022

2018
15
0.6
93
61.9

2018
15
0.6
78.41
60.76

50
0.8
87
44.3

50
0.7
89.07
71.35

30
0.4
60
60

0.0024
122 @
1Hz
62.8@
1Hz

0.00773
119@
1Hz
61.8

15
0.6
74
71@
0.1Hz
0.0182
201.8@
10Hz
77.4@
10Hz

60.23
0.779

0.00157 0.079

0.00207

200
140
0.27
0.39
−

125.64
62.82
2.23
0.247
6,156

CL(pF)
Power supply (V)
Phase margin (º)
(DC gain)a (dB)

15
0.5
68.9
67.8

15
0.5
54
70.4

12
0.6
62.45
61.5

UGF (MHz)
CMRRa (dB)

0.003
−

0.009
106 @
1Hz
70@
1Hz

0.03015
−

PSRR − (dB)
−
IRNb @ (μV/HZ0.5) 0.56

−
2.53

−
2.454

SR average (V/μs) 0.84/

0.967

−
6.25 @
0.1Hz
0.0553

@ 1Hz
−
2.97

1.15

1.404

3.5

−
0.25@
0.1Hz
0.0066

125.5
64
1.11
116
−

275
165
1.31
3.01
−

50.63
30.38
0.711
340.7
6620

115
69
1.008
183.13
7406

62,000
49,600
1.17
2.82
−

240
144
1.13
0.412
16,000

PSRRa+ (dB)

−

a

0.59
52
26
0.94
0.24
52000

67.9

1.45
−
−
−
−

ThisWork
2023
15
0.5
61.33
72.35

0.00107 0.007
138.5
85

0.0187
161.48

77.08

86.17

76
−
−

69.22
0.95

c

Total current (nA)
Total power(nW)
FOMSm
FOMLa
Area (μm2)

a: at 1mHz, b: at 1kHz, c: V/ms

60
24
3.5
39.5
7900


8 References
1. R. R. Harrison, C. Charles, “A low-power low-noise CMOS amplifier for neural recording applications”, IEEE J. Solid State Circuit, vol. 38, no. 6, pp. 958–965, 2003.
https://doi.org/10.1109/JSSC.2003.811979
2. S. Chatterjee, Y. Tsividis, P. Kinget, “0.5-V analog circuit techniques and their applications in OTA and filter design”, IEEE J. Solid State Circuit, vol. 40, no. 12, pp. 2372–2387, 2005.
https://doi.org/10.1109/JSSC.2005.856280
3. A. P. Chandrakasan, N. Verma, D. C. Daly, “Ultra-low-power electronics for biomedical applications”, Annu. Rev. Biomed. Eng., vol. 10, pp. 247–274, 2008.
https://www.annualreviews.org/doi/abs/10.1146/annurev.bioeng.10.061807.160547
4. N. Suda, P. V. Nishanth, D. Basak, D. Sharma, R. P. Paily, “A 0.5-V low power analog front-end for heart-rate detector”, Analog Integr. Circuits Signal Process, vol. 81, no. 2, pp. 417–430, 2014.
https://doi.org/10.1007/s10470-014-0402-1
5. R. K. Pandey, V. Bhadauria, V. K. Singh, “Rail-to-Rail, Reconfigurable Subthreshold Bulk-Driven OTA Based on Flipped Voltage Follower for Biomedical Applications”, in: IEEE International Conference on Technology, Research, and Innovation for Betterment of Society (TRIBES), 2021.
https://doi.org/10.1109/TRIBES52498.2021.9751665
6. S. Yan, E. Sanchez-Sinencio, “Low voltage analog circuit design techniques: A tutorial”, IEICE Trans. Analog Integr. Circuits Syst., E83-A, pp. 179–196, 2000.
https://people.engr.tamu.edu/ssanchez/607-lvtutorial-2000.pdf
7. S. S. Rajput, S. S. Jamuar, “Low voltage analog circuit design techniques”, IEEE Circuits and Systems Magazine, vol. 2, no. 1, pp. 24–42, 2002.
https://doi.org/10.1109/MCAS.2002.999703
8. R. G. Carvajal, J. Ramírez-Angulo, A. J. López-Martín, A. Torralba, J. A. G. Galán, A. Carlosena, F. M. Chavero, “The flipped voltage follower: A useful cell for low voltage low-power circuit design”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 7, pp. 1276–1291, 2005.
https://doi.org/10.1109/TCSI.2005.851387
9. H. C. Ferreira, T. C. Pimenta, R. L. Moreno, “An ultra-low-voltage ultra-low-power weak inversion composite MOS transistor concept and applications”, IEICE Trans. Electron, E91-C, pp. 662–665, 2008.
https://doi.org/10.1093/ietele/e91-c.4.662
10. M. O. Trakimas, S. Sonkusale, “A 0.5 V bulk-input OTA with improved common-mode feedback for low-frequency filtering applications”, Analog Integrated Circuits and Signal Processing, vol. 59, no. 1, pp. 83–89, 2009.
https://doi.org/10.1007/s10470-008-9236-z
11. T. Sharan, V. Bhadauria, “Fully differential, bulk-driven, class AB, subthreshold OTA with enhanced slew rates and gain”, Journal of Circuits System and Computers, vol. 26, no. 1, 1750001, 2017.
https://doi.org/10.1142/S0218126617500013
12. S. Ghosh, V. Bhadauria, “An ultra-low-power bulk-driven subthreshold super class-AB rail-to-rail CMOS OTA with enhanced small and large signal performance suitable for large capacitive loads”, Microelectronics Journal, vol. 115, 105208, 2021.
https://doi.org/10.1016/j.mejo.2021.105208
13. L. H. Ferreira, T. C. Pimenta, R. L. Moreno, “An ultra-low-voltage ultra-low-power CMOS miller OTA with rail-to-rail input/output swing”, IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, no. 10, pp. 843–847, 2007.
https://doi.org/10.1093/ietele/e91-c.4.662
14. X. Zhao, H. Fang, T. Ling, J. Xu, “Transconductance improvement method for low-voltage bulk-driven input stage”, Integration, vol. 49, pp. 98–103, 2015.
https://doi.org/10.1016/j.vlsi.2014.11.005
15. M. Akbari, O. Hashemipour, M. H. Moaiyeri, A. Aghajani, “An efficient approach to enhance bulk-driven amplifiers”, Analog Integrated Circuits and Signal Processing, vol. 92, no. 3, pp. 489–499, 2017.
https://doi.org/10.1007/s10470-017-1010-7
16. T. Sharan, P. Chetri, V. Bhadauria, “Ultra-low-power bulk-driven fully differential subthreshold OTAs with partial positive feedback for Gm-C filters”, Analog Integr. Circuits Signal Process, vol. 94, no. 3, pp. 427–447, 2018.
https://doi.org/10.1007/s10470-017-1065-5
17. X. Zhao, Q. Zhang, Y. Wang, L. Dong, “An approach to essentially improve current efficiency for bulk-driven OTA”, AEU-International Journal of Electronics and Communications, vol. 86, pp. 103–107, 2018.
https://doi.org/10.1016/j.aeue.2018.01.028
18. T. Kulej, F. Khateb, “0.4-V bulk-driven differential-difference amplifier”, Microelectron. J., vol. 46, no. 5, pp. 362–369, 2015.
https://doi.org/10.1016/j.mejo.2015.02.009
19. A. Ghaemnia, O. Hashemipour, “An ultra-low power high gain CMOS OTA for biomedical applications”, Analog Integrated Circuits and Signal Processing, vol. 99, no. 3, pp. 529–537, 2019.
https://doi.org/10.1007/s10470-019-01438-6
20. M. Akbari, S. M. Hussein, Y. Hashim, K.-T. Tang, “0.4-V Tail-Less Quasi-Two-Stage OTA Using a Novel Self-Biasing Transconductance Cell”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, pp. 2805–2818, 2022.
https://doi.org/10.1109/TCSI.2022.3161964
21. T. Kulej, F. Khateb, “Design and implementation of sub 0.5-V OTAs in 0.18 um CMOS”, International Journal of Circuit Theory and Applications, vol. 46, no. 6, pp. 1129–1143, 2018.
https://doi.org/10.1002/cta.2465
22. T. Kulej, F. Khateb, “A Compact 0.3-V Class AB Bulk-Driven OTA”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, pp. 224–232, 2020.
https://doi.org/10.1109/TVLSI.2019.2937206
23. R. Wang, R. Harjani, “Partial positive feedback for gain enhancement of CMOS OTAs”, Analog Integr. Circuits Signal Process, vol. 8, pp. 21–35, 1995.
https://doi.org/10.1007/978-1-4615-2283-6_3
24. J. M. Carrillo, G. Torelli, R. Perez-Aloe, J. F. Duque-Carrillo, “1-V rail-to-rail CMOS OpAmp with improved bulk-driven input”, IEEE J. Solid State Circ., vol. 42, no. 3, pp. 508–517, 2007.
https://doi.org/10.1109/JSSC.2006.891717
25. L. H. Ferreira, S. R. Sonkusale, “A 60-dB gain OTA operating at 0.25-V power supply in 130-nm digital CMOS process”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 6, pp. 1609–1617, 2014.
https://doi.org/10.1109/TCSI.2013.2289413
26. H. Veldandi, R. A. Shaik, “An Ultra-low-voltage bulk-driven analog voltage buffer with rail-to-rail input/output range”, Circuits, Systems, and Signal Processing, vol. 36, no. 12, pp. 4886–4907, 2017.
https://doi.org/10.1007/s00034-017-0663-x
27. A. Ballo, A. D. Grasso, S. Pennisi, “0.4-V, 81.3-nA Bulk-Driven Single-Stage CMOS OTA with Enhanced Transconductance”, Electronics, vol. 11, 2704, 2022.
https://doi.org/10.3390/electronics11172704
28. A. Ballo, A. D. Grasso, S. Pennisi, “A 0.6 V Bulk-Driven Class-AB Two-Stage OTA with Non-Tailed Differential Pair”, J. Low Power Electron. Appl., vol. 13, 24, 2023.
https://doi.org/10.3390/jlpea13020024
29. S. Ghosh, S. Tripathi, V. Bhadauria, “A low harmonic high gain subthreshold flipped voltage follower-based bulk-driven OTA suitable for low-frequency applications”, in: D. Harvey, H. Kar, S. Verma, V. Bhadauria (Eds.), Advances in VLSI, Communication, and Signal Processing. Lecture Notes in Electrical Engineering, vol. 683, Springer, Singapore, 2021.
https://doi.org/10.1007/978-981-15-6840-4_38
30. S. Ghosh, V. Bhadauria, “High current efficiency single-stage bulk driven subthreshold-biased class-AB OTAs with enhanced transconductance and slew rate for large capacitive loads”, Analog Integrated Circuits and Signal Processing, 2021.
https://doi.org/10.1007/s10470-021-01929-5
31. J. A. Galan, A. J. Lopez-Martin, R. G. Carvajal, et al., “Super class-AB OTAs with adaptive biasing and dynamic output current scaling”, IEEE Trans. Circuits Syst., I Reg. Pap., vol. 54, pp. 449–457, 2007.
https://doi.org/10.1109/TCSI.2006.887639
32. X. Zhao, Q. Zhang, M. Deng, “Super class-AB bulk driven OTAs with improved slew rate”, Electronics Letters, vol. 51, no. 19, pp. 1488–1489, 2015.
https://doi.org/10.1049/el.2015.1776

Copyright © 2023 by the Authors.
This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted
use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Arrived: 23. 04. 2023
Accepted: 09. 07. 2023


Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.203

Journal of Microelectronics,
Electronic Components and Materials
Vol. 53, No. 2(2023), 79 – 86

An Energy-efficient and Accuracy-adjustable
bfloat16 Multiplier
Ratko Pilipović1, Patricio Bulić1, Uroš Lotrič1
1 Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia

Abstract: Approximate multipliers have been extensively used in neural network inference, but due to their relatively large
error, they have yet to be successfully deployed in neural network learning. Recently, the bfloat16 format has emerged as a viable
number representation for neural networks. This paper proposes a novel approximate bfloat16 multiplier with on-the-fly adjustable
accuracy for energy-efficient learning in deep neural networks. The size of the proposed multiplier is only 62% of the size of the exact
bfloat16 multiplier. Furthermore, its energy footprint is up to five times smaller than the footprint of the exact bfloat16 multiplier. We
demonstrate the advantages of the proposed multiplier in deep neural network learning, where we successfully train the ResNet-20
network on the CIFAR-10 dataset from scratch.
Keywords: approximate computing; deep neural networks; energy-efficient processing; bfloat16 multiplier

Energijsko učinkovit približni množilnik v zapisu
bfloat16 z nastavljivo natančnostjo
Izvleček: Približni množilniki so se izkazali za zelo primerne pri sklepanju z nevronskimi mrežami, vendar zaradi relativno velike
napake še niso bili uspešno uporabljeni pri učenju globokih nevronskih mrež. Pred kratkim se je za predstavitev realnih števil v
nevronskih mrežah začel uveljavljati zapis bfloat16. V članku predlagamo nov približni množilnik v zapisu bfloat16 s sprotno nastavljivo
natančnostjo za energetsko učinkovito učenje v globokih nevronskih mrežah. Velikost predlaganega množilnika je samo 62 % velikosti
natančnega množilnika v zapisu bfloat16. Poleg tega je njegov energijski odtis do petkrat manjši od odtisa natančnega množilnika
bfloat16. Uporabnost predlaganega množilnika predstavimo na primeru učenja globokih nevronskih mrež, kjer uspešno naučimo
mrežo ResNet-20 na naboru podatkov CIFAR-10.
Ključne besede: približno računanje; globoke nevronske mreže; energijsko učinkovito računanje; množilnik v zapisu bfloat16
* Corresponding Author’s e-mail: patricio.bulic@fri.uni-lj.si

How to cite:
R. Pilipović et al., “An Energy-efficient and Accuracy-adjustable bfloat16 Multiplier”, Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 79–86

R. Pilipović et al.; Informacije Midem, Vol. 53, No. 2(2023), 79 – 86

1 Introduction

The ability of neural networks to learn from data and generalise the gained knowledge makes them a very popular modelling tool in various application fields. Their growing popularity in recent years can be attributed to deep models, which place considerable demands on the processing hardware. Thus, new hardware solutions are being developed continuously to keep the processing hardware on par with the computing demands.

Approximate computing has emerged as a popular strategy for area- and energy-efficient circuit design, where the challenge is to achieve the best trade-off between design efficiency and accuracy. Efficient designs come at the cost of accuracy reduction and vice versa. Nevertheless, approximate computing perfectly fits neural networks, which, to a certain extent, tolerate or even adapt to an error caused by noisy input data or erroneous computation. Widely used approaches in approximate computing are precision scaling and approximate arithmetic.

In precision scaling [1], we use fewer bits to represent numeric values rather than executing all the required mathematical operations with the full representation. Several standards for floating-point representation have recently appeared: IEEE 754-2019 for half-precision [2], the posit format with dynamic range and mantissa [3], and Google’s bfloat16, targeting machine-learning workloads [4]. Storing numeric values with fewer bits reduces the size and complexity of arithmetic circuits. Besides, it saves on-chip memory and reduces the amount of data that must be transferred, improving speed.

Multiplication is a ubiquitous arithmetic operation in neural network processing. Moreover, multipliers are complex circuits that significantly affect the area and energy footprint of the processing hardware. Hence, applications can benefit in terms of power and area by replacing the exact multiplier with an approximate one. The approximate multiplier design can originate in the logarithmic approximation of numerical values [5-8] or in non-logarithmic approaches, like discarding some stages in Booth multipliers [9-11]. Although most approximate multipliers are designed for fixed-point arithmetic, many floating-point designs, capable of representing numerical values in a wider range, have appeared lately.

There have been several attempts to use approximate integer multipliers in neural network learning [12-14]. The authors of these studies report that the learning was successful, but they mainly worked with tiny neural networks. To the best of our knowledge, there has yet to be a successful attempt to train large-scale neural networks using approximate multipliers. In neural network learning, we need higher-precision arithmetic, so until now, neural networks have mainly been trained using exact floating-point multipliers [3], [15].

Common to most of the existing designs is that their accuracy can be adjusted only at design time. As such, they can perfectly fit the targeted application but fail for many others. However, many applications need adjustable accuracy during run time. In neural network processing, for example, we can use lower accuracy during the inference phase but need much higher accuracy during the learning phase. Moreover, some parts of an application may still require exact multiplication. For such an application, it would be beneficial to design a multiplier capable of handling all accuracy requirements, thus avoiding putting a plethora of multipliers on a chip and not exploiting them simultaneously.

Several precision-tuning 32-bit floating-point multipliers for deep neural network processing have recently been proposed. The work [16] proposes the 32-bit floating-point approximate PAM multiplier with run-time customisation, which can successfully replace a single-precision floating-point multiplier in some deep neural networks and image-processing applications. In [17], the authors proposed a 32-bit iterative approximate floating-point multiplier based on two-dimensional pseudo-Booth encoding. The accuracy of the proposed multiplier is tuned by three parameters: iteration, encoder’s radix, and word length after truncation. To our knowledge, the only state-of-the-art approximate 16-bit bfloat multiplier is proposed in [15]. This variable-precision approximate multiplier uses the bfloat16 format for operand representation and the intermediate conversion of the product exponent to the posit encoding to control the mantissa multiplication accuracy. All these multipliers were used only in the inference phase in deep learning models and in image-processing applications, where negligible degradation in accuracy was observed.

A design that would suit most applications should be able to multiply with the required accuracy, not excluding exact computation, and accept a wide range of numeric values. In this paper, we propose an efficient and accuracy-adjustable approximate 16-bit multiplier for operands represented in the bfloat16 format, which does not require any hardware reconfiguration to adapt accuracy, and demonstrate its applicability in the neural network inference and learning phases.

In the remainder of the paper, we first detail the proposed BFILM multiplier design. Section 3 shows the hardware characteristics of the design and demonstrates the usability of the BFILM multiplier in neural network inference and learning. Lastly, we conclude the paper with the main findings.

2 The design of BFILM multiplier
The proposed brain float iterative logarithmic multiplier (BFILM) operates on numerical values in the bfloat16 format. The advantage of representing a numerical value O in the bfloat16 format is that it keeps the one sign bit s(O) and the 8-bit exponent e(O) of the IEEE 754 single-precision floating-point format but shortens the mantissa m(O) to 7 bits. Thus, it enables the use of tiny numerical values, which are important, for example, in the neural network learning phase [18]. While the multiplier determines the sign and the exponent exactly, it follows the idea of the approximate iterative logarithmic multiplier to compute the mantissa. The number of steps, which determines the accuracy of the multiplier, can be changed on the fly.

Fig. 1 shows the structure of the BFILM multiplier, which takes operands O1 and O2 to compute the approximate product Papprox. The multiplier consists of a straightforward circuit for determining the sign of the product and two loosely connected circuits for determining the product’s exponent and mantissa.

Figure 1: The circuitry of the 16-bit bfloat multiplier.

2.1 The exponent circuitry
The exponent circuitry in Fig. 1 incorporates two adders. We must add both operands’ exponents to get the product’s exponent. However, the bfloat16 format uses the offset-binary representation of the exponent, with the zero offset being 127. To correctly code the product’s exponent, we need an additional adder to subtract the offset. The logic connected to the carry input cin of the first adder covers the situations when the product’s exponent must be normalised due to the large approximate product Pa obtained from the mantissa multiplier.

2.2 The mantissa circuitry
The mantissa circuitry in Fig. 1 comprises the mantissa multiplier and the mantissa normalizer. The mantissa stores only the fractional bits, to which we must prepend the leading one to get an 8-bit unsigned fixed-point number at the input to the mantissa multiplier. The multiplication result is a product, given in a 16-bit unsigned fixed-point format with two integer bits and 14 fractional bits, of which we take only the nine most significant bits to the output Pa of the mantissa multiplier. We form the product’s mantissa m(Papprox) according to the integer part of the output Pa. When the integer part is greater than one, i.e., when Pa[8] is set, we normalise the result by shifting the radix point one place to the left. To do so, we increment the product’s exponent and take the middle seven bits Pa[7:1]. In all other cases, normalisation is unnecessary, and the product’s mantissa equals the seven least significant bits Pa[6:0].

An important component of the BFILM multiplier is the approximate mantissa multiplier that relies on the iterative logarithmic multiplier (ILM) [7]. Suppose we have two non-negative 8-bit operands x and y, expressed as the sum of the leading bit and the residuum, $x = 2^{k_x} + r_x$ and $y = 2^{k_y} + r_y$, which multiply to the product

$$ p = xy = x\left(2^{k_y} + r_y\right) = x\,2^{k_y} + x\,r_y = x\,2^{k_y} + 2^{k_x} r_y + r_x r_y . \tag{1} $$

By summing up the first-order Taylor expansions of

$$ \log_2 x = k_x + \log_2\left(1 + r_x 2^{-k_x}\right) = k_x + \ln\left(1 + r_x 2^{-k_x}\right)\log_2 e \approx k_x + r_x 2^{-k_x}\log_2 e \tag{2} $$

and $\log_2 y \approx k_y + r_y 2^{-k_y}\log_2 e$, we get the approximation

$$ \log_2 p \approx (k_x + k_y) + 2^{-(k_x + k_y)}\left(r_x 2^{k_y} + r_y 2^{k_x}\right)\log_2 e \approx (k_x + k_y) + \log_2\left(1 + 2^{-(k_x + k_y)}\left(r_x 2^{k_y} + r_y 2^{k_x}\right)\right). \tag{3} $$

By taking the antilogarithm of the $\log_2 p$ approximation, we obtain an approximate product

$$ p_a = 2^{k_x + k_y}\left(1 + 2^{-(k_x + k_y)}\left(r_x 2^{k_y} + r_y 2^{k_x}\right)\right) = 2^{k_x + k_y} + r_x 2^{k_y} + r_y 2^{k_x} = \left(2^{k_x} + r_x\right)2^{k_y} + r_y 2^{k_x} = x\,2^{k_y} + r_y 2^{k_x}, \tag{4} $$

which equals equation (1) with the last term omitted. Thus, computing the product approximation pa requires only two shifts and an addition, completely avoiding the multiplication in the term rxry.

The ILM core circuitry in Fig. 2 computes the approximate product and both residua. The leading-one detectors extract the leading-one bits $2^{k_x}$ and $2^{k_y}$ and their characteristic numbers kx and ky from operands x and y. We need both leading-one bits to compute the residua and the characteristic numbers to perform the required shifts of the operand x and the residuum ry. The truncated barrel shifters output only the nine most significant bits required in further processing, thus significantly reducing their size and the size of the adder.

Figure 2: The circuitry of the ILM core.

The relative error of the product (p - pa)/p = rxry/p can be as high as 25 %. To reduce it, we can iteratively repeat the above procedure by multiplying the residua rx and ry and adding the result to the current approximation. The procedure can be repeated until at least one residuum becomes zero, thus achieving an error as small as necessary.

The mantissa multiplier shown in Fig. 3 comprises the ILM core, two multiplexers, and an accumulator to iteratively refine the approximate mantissa product Pa. In the initial ILM step (I = 1), the multiplexers pass the operands X and Y to the ILM core, while in the next ILM steps (I > 1), the multiplexers feed the ILM core with the residua rx and ry from the previous ILM step. The accumulator keeps the approximation of the mantissa product, which is increased by the value pa in each ILM step. To comply with the circuitry presented in Fig. 1, the accumulator needs to keep only the nine most significant bits.

Figure 3: The circuitry of the approximate mantissa multiplier.

At this point, we would like to emphasize that the proposed multiplier does not require any hardware reconfiguration if we want to perform more than one ILM step. For example, when more ILM steps are required, we only need to feed the residua rx and ry (Fig. 2) back to the input of the ILM core as presented in Fig. 3. In this case, the multiplexers choose what goes to the ILM core: the new operands, X and Y, or the residua from the previous iteration, rx and ry. In the actual implementation, of course, we must add registers at the input of the multiplexers, but these are not shown for simplicity.

3 Results

3.1 Hardware performance
We implement the multipliers in Verilog and synthesise them to the SkyWater PDK 130 cell library using OpenLane [19-21]. The library is based on a 130 nm technology with an operating voltage of 1.8 V and five metal layers [22-23]. The timing constraints, used for all evaluated designs, specify clock-related parameters, which affect synthesis and timing analysis. We set a clock signal with a period of 10 ns, which does not violate the critical path. To evaluate the power, we use timing with a 100 MHz virtual clock (by definition, a virtual clock is a clock that has no real source in the design and is commonly used to specify delay constraints during static timing analysis), a load capacitance of 33.442 fF (PDK default), and a supply voltage of 1.8 V.

We analyse the hardware performance of the BFILM multiplier in terms of power, area, delay, and power-delay product (PDP) and compare it with the exact bfloat16 multiplier. Table 1 shows that the BFILM multiplier outperforms the exact multiplier in all hardware metrics; its energy consumption, estimated through PDP, is more than five times smaller.

Table 1: The synthesis results for the examined multipliers.
Multiplier        Delay [ns]    Power [uW]    Area [um2]    PDP [fJ]
exact bfloat16    2.89          869           6120          2590
BFILM             1.67          298           3796          498
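The PDP column of Table 1 can be related to the delay and power columns. The sketch below assumes that PDP is simply delay times power (ns × µW = fJ); the paper does not state how its PDP values were derived, so this is an illustrative assumption, and the function name is hypothetical.

```python
# Power-delay product from the Table 1 entries.
# Assumption: PDP = delay * power; ns * uW gives fJ directly.

def pdp_fj(delay_ns: float, power_uw: float) -> float:
    return delay_ns * power_uw


bfilm = pdp_fj(1.67, 298)   # Table 1 reports 498 fJ
exact = pdp_fj(2.89, 869)   # Table 1 reports 2590 fJ
reported_ratio = 2590 / 498 # energy advantage from the reported PDP values

print(f"BFILM: {bfilm:.1f} fJ, exact: {exact:.1f} fJ, "
      f"reported ratio: {reported_ratio:.2f}")
```

Under this assumption the BFILM value reproduces the table entry after rounding, whereas the simple product for the exact multiplier (about 2511 fJ) does not coincide with the reported 2590 fJ. The ratio of the reported PDP values is about 5.2, in line with the "more than five times smaller" statement.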


Table 2 compares the hardware characteristics of the state-of-the-art variable-accuracy bfloat16 multipliers. The results are given as values relative to the standard reference implementation of the exact bfloat16 multiplier. The BFILM multiplier, with its very slim design, outperforms the recently proposed BFLP16-prop multiplier [15] in all aspects.

These results suggest that the BFILM multiplier should
fit well with error-resilient applications where low-energy consumption is an important goal and where most
of the time the BFILM multiplier with a small number of
ILM steps could be used. An important feature of the
BFILM multiplier is that we can control the product accuracy by adjusting the number of ILM steps without
hardware modification, ultimately leading even to removing the exact multiplier from the circuitry.
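The run-time accuracy control described above can be illustrated with a small software emulation. This is only a sketch under stated assumptions: it handles normal bfloat16 values only (no zeros, subnormals, infinities, NaNs, or exponent overflow), it accumulates the mantissa product at full width instead of truncating to nine bits as the hardware does, and all function names are hypothetical rather than taken from the paper.

```python
import struct


def float_to_bf16(value: float) -> int:
    """Keep the upper 16 bits of the IEEE 754 single-precision pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0] >> 16


def bf16_to_float(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def ilm(x: int, y: int, steps: int) -> int:
    """Iterative logarithmic product; each step adds x*2^ky + ry*2^kx."""
    acc = 0
    for _ in range(steps):
        if x == 0 or y == 0:           # zero residuum: the sum is already exact
            break
        kx, ky = x.bit_length() - 1, y.bit_length() - 1
        rx, ry = x - (1 << kx), y - (1 << ky)
        acc += (x << ky) + (ry << kx)
        x, y = rx, ry                  # the next step multiplies the residua
    return acc


def bfilm_multiply(a: int, b: int, steps: int) -> int:
    """Multiply two normal bfloat16 bit patterns using `steps` ILM steps."""
    sign = ((a ^ b) >> 15) & 1
    exponent = ((a >> 7) & 0xFF) + ((b >> 7) & 0xFF) - 127  # remove one bias
    # Prepend the leading one; the product has 2 integer and 14 fractional bits.
    product = ilm((a & 0x7F) | 0x80, (b & 0x7F) | 0x80, steps)
    if product >> 15:                  # integer part greater than one
        exponent += 1
        mantissa = (product >> 8) & 0x7F
    else:
        mantissa = (product >> 7) & 0x7F
    return (sign << 15) | ((exponent & 0xFF) << 7) | mantissa


a, b = float_to_bf16(1.5), float_to_bf16(2.5)
for steps in (1, 2):
    print(steps, bf16_to_float(bfilm_multiply(a, b, steps)))
# prints "1 3.5" and then "2 3.75"
```

For these operands a single ILM step yields 3.5, and the second step adds the residua product and reaches the exact result 3.75; only the `steps` argument changes between the two calls.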

Table 2: Comparison of the bfloat16 multipliers regarding hardware gains relative to the exact bfloat16 multiplier.
Multiplier          Delay [%]    Power [%]    Area [%]    PDP [%]
exact bfloat16      100          100          100         100
BFLP16-prop [15]    104          58           81          59
BFILM               58           33           62          19

3.2 Impact on neural network learning

Convolutional neural networks achieve remarkable
performance in visual recognition tasks [24]. However,
the learning and inference of convolutional neural networks are computationally demanding tasks that involve many multiplications. Nevertheless, convolutional neural networks are error-tolerant models, making
them perfect candidates for employing approximate
multipliers. Therefore, we assess the influence of the
proposed multiplier on the performance of the inference and learning phases.

Since the BFILM multiplier does not require reconfiguration or additional hardware for more accurate processing,
the multiplier’s size (area) and power are preserved for an
arbitrary number of ILM steps. With additional ILM steps, however, the residues rx
and ry must be multiplied once or twice more and the result added to the
final product. The processing time
required to calculate the product therefore increases linearly with
the number of ILM steps, and so does the energy consumption. We assess different configurations of the
BFILM multiplier in terms of delay, energy consumption
(PDP) and the mean relative error distance (MRED), and
present them in Table 3. For easier comparison, the delay
and energy consumption are given relative to the values
of the exact bfloat16 multiplier.
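
The trade-off between accuracy and the number of steps can be illustrated in software. The following is a minimal Python sketch of iterative logarithmic multiplication on unsigned integers, in the spirit of the iterative logarithmic multiplier of [7]; it is not the BFILM hardware datapath. The function names (`ilm_basic_block`, `ilm_multiply`, `mred`) and the choice of 8-bit significands in the range 128 to 255 (bfloat16 stores 7 fraction bits plus an implicit leading one) are assumptions made for illustration only.

```python
def ilm_basic_block(x, y):
    """One approximate step: returns (approximation, residue of x, residue of y)."""
    if x == 0 or y == 0:
        return 0, 0, 0
    kx = x.bit_length() - 1          # position of the leading one in x
    ky = y.bit_length() - 1          # position of the leading one in y
    rx = x - (1 << kx)               # residue of x after removing the leading one
    ry = y - (1 << ky)               # residue of y after removing the leading one
    # x*y = 2^(kx+ky) + rx*2^ky + ry*2^kx + rx*ry; the last term is dropped here.
    approx = (1 << (kx + ky)) + (rx << ky) + (ry << kx)
    return approx, rx, ry


def ilm_multiply(x, y, steps):
    """Approximate x*y by applying the basic block to the residues `steps` times."""
    product = 0
    for _ in range(steps):
        approx, x, y = ilm_basic_block(x, y)
        product += approx
        if x == 0 or y == 0:         # no error term left, the product is exact
            break
    return product


def mred(steps, lo=128, hi=256):
    """Mean relative error distance over all operand pairs in [lo, hi)."""
    total, count = 0.0, 0
    for a in range(lo, hi):
        for b in range(lo, hi):
            exact = a * b
            total += abs(exact - ilm_multiply(a, b, steps)) / exact
            count += 1
    return total / count


for n in (1, 2, 3):
    print(f"{n} step(s): MRED = {mred(n):.5f}")
```

Each extra step reuses the same basic block on the residues left by the previous step, which is why the error shrinks while the processing time grows with the step count.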

To evaluate the BFILM multiplier, we select the
ResNet-20 convolutional neural network [25-26] and
the CIFAR-10 dataset [27]. We change the number representation in the ResNet-20 convolutional neural network from the single-precision floating-point format
to the bfloat16 format. In the experiments, we use the
Caffe framework [28], where we replace the calls to the
cuBLAS multiplication routines with the calls to our
own GPU kernels, which emulate the proposed BFILM
multiplier.

The proposed multiplier with two or three ILM steps has
a lower energy consumption than the exact bfloat16
multiplier and the BFLP16-prop multiplier [15]. Moreover, the BFILM multiplier with two ILM steps is not
much slower than the state-of-the-art BFLP16-prop
multiplier [15]. However, the BFILM multiplier with only
one ILM step has a rather large error. With two
ILM steps, the error comes close to the BFLP16-prop multiplier’s
MRED, and it then drops by about an order of magnitude with
each additional ILM step.
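
As a rough consistency check on Table 3, the sketch below scales the single-step delay from Table 2 linearly with the number of ILM steps and multiplies it by the (unchanged) relative power to estimate the PDP. The per-step values come from Table 2; the helper name `bfilm_relative_cost` and the assumption of perfectly linear scaling at constant power are illustrative, so small rounding differences from the reported figures are expected.

```python
# Relative figures for one ILM step, taken from Table 2 (exact bfloat16 = 100 %).
DELAY_ONE_STEP = 58.0   # [%]
POWER = 33.0            # [%], assumed independent of the number of ILM steps


def bfilm_relative_cost(steps):
    """Return (delay %, PDP %) assuming delay grows linearly with ILM steps."""
    delay = steps * DELAY_ONE_STEP
    pdp = POWER * delay / 100.0      # power-delay product relative to the exact multiplier
    return delay, pdp


for steps in (1, 2, 3):
    delay, pdp = bfilm_relative_cost(steps)
    print(f"{steps} ILM step(s): delay ~ {delay:.0f} %, PDP ~ {pdp:.1f} %")
```

Under these assumptions the estimates land within about one percentage point of the delay and PDP columns of Table 3.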

The neural network learns using the predetermined
split of the dataset into training and test sets [27]. Before
learning, we preprocess the images by subtracting
their mean value. In addition, we quantize the ResNet-20
single-precision floating-point weights to the bfloat16
format by simply discarding the last
16 bits of the floating-point mantissa. In the learning
phase, we optimize the multinomial logistic loss function [29] with the Nesterov momentum algorithm [30].
The learning starts with randomly initialised weights. In
all experiments, we train the network for 64000 epochs.
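
The weight quantization step can be sketched in a few lines of Python. This is a minimal illustration, assuming IEEE 754 single-precision inputs and plain truncation (no rounding) as described above; the function name `float32_to_bfloat16_trunc` is hypothetical and is not part of Caffe or of the GPU kernels used in the experiments.

```python
import struct


def float32_to_bfloat16_trunc(value):
    """Truncate a value to bfloat16 precision by zeroing the low 16 bits of its float32 form."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]  # float32 bit pattern as uint32
    bits &= 0xFFFF0000                                       # keep sign, 8 exponent bits, top 7 fraction bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for w in (0.1, 1.0, -2.5, 3.14159):
    print(w, "->", float32_to_bfloat16_trunc(w))
```

Values that already fit in 7 fraction bits (such as 1.0 or -2.5) pass through unchanged, while others are moved toward zero.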

Table 3: Comparison of delay, PDP, and the MRED error for the different number of ILM steps in the BFILM
multiplier.

Multiplier            Delay [%]   PDP [%]   MRED [10^-3]
exact bfloat16        100         100       0
BFLP16-prop [15]      104         59        3.50
BFILM, 1 ILM step     58          19        91.21
BFILM, 2 ILM steps    115         38        9.08
BFILM, 3 ILM steps    173         58        0.86

In the first experiment, we evaluate the influence of the
proposed multiplier on the ResNet-20 classification accuracy. As the BFILM multiplier is configurable in terms
of the number of steps affecting the multiplication error, we test several BFILM configurations. In the tested
configurations, BFILM-1-1, BFILM-1-2, BFILM-2-2 and
BFILM-2-3, the first number denotes the number of ILM
steps in the inference phase, while the second number
denotes the number of ILM steps used in the learning
phase.

R. Pilipović et al.; Informacije Midem, Vol. 53, No. 2(2023), 79 – 86

Table 4 shows the classification accuracy on the
CIFAR-10 dataset. For each configuration, we list the average value and standard deviation over five runs. The significant multiplication error of BFILM-1-1 leads to low
classification accuracy. Increasing the number of
ILM steps in the inference and learning phases improves
classification accuracy. For example, with BFILM-2-2
and BFILM-2-3, the classification accuracy is almost the
same as with the exact bfloat16 multiplier.

Table 4: Performance of the ResNet-20 convolutional
neural network on the CIFAR-10 dataset using bfloat16
multipliers.

Multiplier        Test set classification accuracy [%]
exact bfloat16    91.50 ± 0.10
BFILM-1-1         86.32 ± 1.26
BFILM-1-2         90.98 ± 0.15
BFILM-2-2         91.30 ± 0.30
BFILM-2-3         91.40 ± 0.20

Figure 4: Varying configuration of BFILM during the
learning phase.

Also, we can see from the results for BFILM-1-1 and
BFILM-1-2 that increasing the number of the ILM steps
in the learning phase positively affects classification
performance. On the other hand, a further increase
in the number of steps in the inference phase from
BFILM-1-2 to BFILM-2-2 has much less impact. Moreover, according to Table 3, BFILM-1-2 has a very small
energy footprint and thus could be sufficient for neural
network inference and learning.

4 Conclusion
In this paper, we proposed a novel approximate
bfloat16 multiplier with adjustable accuracy, which can
be achieved without any hardware reconfiguration.
Instead, the proposed BFILM multiplier iteratively uses
an approximate logarithmic multiplier core to reduce
the error. This way, we avoid using additional error refinement circuits, keeping the design small and energy
efficient. The primary purpose of the proposed design
is to use it in deep neural network processing in the inference and learning phases. We apply the BFILM multiplier in the ResNet-20 convolutional neural network to
classify the CIFAR-10 dataset. We demonstrate the impact of various BFILM configurations on the neural network learning process and classification accuracy. The
results show that we can easily adjust the multiplier’s
accuracy according to the application’s requirements.
The main advantage of the on-the-fly adaptation of the
BFILM multiplier becomes apparent during the learning phase. The results show that we can start with one
ILM step in the inference and learning phase to save
energy and later, when model performance improves,
increase the number of ILM steps to refine the result further. In future work, we aim to develop an algorithm that could optimize the learning process in terms
of speed and efficiency by automatically adapting the
number of ILM steps in the BFILM multiplier when needed.

The second experiment highlights the advantage of the
on-the-fly accuracy adaptation of the BFILM multiplier,
which can help in faster and more energy-efficient neural network learning. The idea is to start with one ILM
step in the inference and learning phase to save energy
and later, when model performance improves, increase
the number of the ILM steps to further refine the result.
Fig. 4 shows the outcome of the learning process on
the training and testing set for five separate runs, each
with randomly initialised neural network weights. For
the loss (red) and the accuracy (green), we show the
span of obtained values and the curve averaged over
all runs. We see that with the BFILM-1-1 configuration,
the model improves rapidly and reaches a classification
accuracy of more than 60 % in only 10000 epochs. At
this point, we use an additional ILM step in the learning
phase (BFILM-1-2) to improve the model’s convergence
and achieve more than 99.4 % of the accuracy of the
exact bfloat16 multiplier. However, if the accuracy still
needs to be increased for some applications, we can
enhance the model by training it with additional ILM
steps.
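
The relative accuracy quoted above can be cross-checked against Table 4, and the switching rule can be written down explicitly. The sketch below is only an illustration: the mean accuracies are copied from Table 4, while the function `ilm_steps_for_training`, its 60 % threshold argument, and the idea of switching on measured accuracy rather than on an epoch count are assumptions, not the training script used in the experiments.

```python
# Mean test-set accuracies [%] copied from Table 4.
TABLE_4 = {
    "exact bfloat16": 91.50,
    "BFILM-1-1": 86.32,
    "BFILM-1-2": 90.98,
    "BFILM-2-2": 91.30,
    "BFILM-2-3": 91.40,
}


def relative_accuracy(config):
    """Accuracy of a configuration as a percentage of the exact multiplier's accuracy."""
    return 100.0 * TABLE_4[config] / TABLE_4["exact bfloat16"]


def ilm_steps_for_training(current_accuracy, threshold=60.0):
    """Return (inference steps, learning steps): BFILM-1-1 first, then BFILM-1-2."""
    return (1, 1) if current_accuracy < threshold else (1, 2)


for name in TABLE_4:
    print(f"{name}: {relative_accuracy(name):.2f} % of the exact multiplier")
print(ilm_steps_for_training(35.0), ilm_steps_for_training(72.0))
```

With the Table 4 means, BFILM-1-2 comes out slightly above 99.4 % of the exact multiplier's accuracy, which is consistent with the figure reported for the second experiment.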


5 Acknowledgments

This research was supported by Slovenian Research
Agency under Grants P2-0359 (National research program Pervasive computing), P2-0241 (Synergy of the
technological systems and processes) and by Slovenian Research Agency and Ministry of Civil Affairs, Bosnia
and Herzegovina, under Grant BI-BA/21-23-033 (Bilateral Collaboration Project).

6 Conflict of Interest

The authors declare no conflict of interest.
The funders had no role in the design of the study; in
the collection, analyses, or interpretation of data; in the
writing of the manuscript; nor in the decision to publish the results.

7 References

1. G. Armeniakos, G. Zervakis, D. Soudris, and J. Henkel, “Hardware approximate techniques for deep neural network accelerators: A survey,” ACM Comput. Surv., Mar. 2022. https://doi.org/10.1145/3527156.
2. “IEEE standard for floating-point arithmetic,” 2019, IEEE Std 754-2019 (Revision of IEEE 754-2008).
3. R. Murillo, A. A. Del Barrio Garcia, G. Botella, M. S. Kim, H. Kim, and N. Bagherzadeh, “Plam: a posit logarithm-approximate multiplier,” IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2021.
4. H. Kim, “A low-cost compensated approximate multiplier for bfloat16 data processing on convolutional neural network inference,” ETRI Journal, vol. 43, no. 4, pp. 684–693, 2021. https://onlinelibrary.wiley.com/doi/abs/10.4218/etrij.2020-0370.
5. J. N. Mitchell, “Computer multiplication and division using binary logarithms,” IRE Transactions on Electronic Computers, vol. EC-11, no. 4, pp. 512–517, Aug. 1962.
6. V. Mahalingam and N. Ranganathan, “Improving accuracy in Mitchell’s logarithmic multiplication using operand decomposition,” IEEE Transactions on Computers, vol. 55, no. 12, pp. 1523–1535, Dec. 2006. https://doi.org/10.1109/TC.2006.198.
7. Z. Babić, A. Avramović, and P. Bulić, “An iterative logarithmic multiplier,” Microprocessors and Microsystems, vol. 35, no. 1, pp. 23–33, 2011. https://doi.org/10.1016/j.micpro.2010.07.001.
8. M. S. Kim, A. A. D. Barrio, L. T. Oliveira, R. Hermida, and N. Bagherzadeh, “Efficient Mitchell’s approximate log multipliers for convolutional neural networks,” IEEE Transactions on Computers, vol. 68, no. 5, pp. 660–675, Dec. 2019. https://doi.org/10.1109/TC.2018.2880742.
9. V. Leon, G. Zervakis, D. Soudris, and K. Pekmestzi, “Approximate hybrid high radix encoding for energy-efficient inexact multipliers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 3, pp. 421–430, Nov. 2018. https://doi.org/10.1109/TVLSI.2017.2767858.
10. H. Waris, C. Wang, and W. Liu, “Hybrid low radix encoding-based approximate Booth multipliers,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 12, pp. 3367–3371, Feb. 2020. https://doi.org/10.1109/TCSII.2020.2975094.
11. H. Waris, C. Wang, W. Liu, J. Han, and F. Lombardi, “Hybrid partial product-based high-performance approximate recursive multipliers,” IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 1, pp. 507–513, 2022. https://doi.org/10.1109/TETC.2020.3013977.
12. U. Lotrič and P. Bulić, “Applicability of approximate multipliers in hardware neural networks,” Neurocomputing, vol. 96, pp. 57–65, 2012 [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231212003311
13. T. Y. Cheng, Y. Masuda, J. Chen, J. Yu, and M. Hashimoto, “Logarithm-approximate floating-point multiplier is applicable to power-efficient neural network training,” Integration, vol. 74, pp. 19–31, 2020. https://doi.org/10.1016/j.vlsi.2020.05.002.
14. R. Pilipović, V. Risojević, J. Božič, P. Bulić, and U. Lotrič, “An approximate GEMM unit for energy-efficient object detection,” Sensors, vol. 21, no. 12, 2021. https://doi.org/10.3390/s21124195
15. H. Zhang and S. B. Ko, “Variable-precision approximate floating-point multiplier for efficient deep learning computation,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, pp. 2503–2507, May 2022. https://doi.org/10.1109/TCSII.2022.3161005.
16. C. Chen, W. Qian, M. Imani, X. Yin, and C. Zhuo, “PAM: A piecewise-linearly-approximated floating-point multiplier with unbiasedness and configurability,” IEEE Transactions on Computers, vol. 71, pp. 2473–2486, Oct. 2022. https://doi.org/10.1109/TC.2021.3131850.
17. A. Towhidy, R. Omidi, and K. Mohammadi, “On the design of iterative approximate floating-point multipliers,” IEEE Transactions on Computers, 2022. https://doi.org/10.1109/TC.2022.3216465.
18. A. Y. Romanov, A. L. Stempkovsky, I. V. Lariushkin, G. E. Novoselov, R. A. Solovyev, V. A. Starykh, I. I. Romanova, D. V. Telpukhov, and I. A. Mkrtchan, “Analysis of posit and bfloat arithmetic of real numbers for machine learning,” IEEE Access, vol. 9, pp. 82 318–82 324, 2021. https://doi.org/10.1109/ACCESS.2021.3086669.
19. A. A. Ghazy and M. Shalan, “OpenLANE: The Open-Source Digital ASIC Implementation Flow,” in 2020 Workshop on Open-Source EDA Technology (WOSET), 2020, last accessed 27 September 2022. Available: https://woset-workshop.github.io/PDFs/2020/a21.pdf
20. OpenLane, “Openlane EDA Toolset.” 2022, last accessed 27 September 2022. Available: https://github.com/The-OpenROAD-Project/OpenLane
21. M. Chupilko, A. Kamkin, and S. Smolov, “Survey of open-source flows for digital hardware design,” in 2021 Ivannikov Memorial Workshop (IVMEM), 2021, pp. 11–16.
22. T. Edwards, “Google/SkyWater and the Promise of the Open PDK,” in 2020 Workshop on Open-Source EDA Technology (WOSET), 2020, last accessed 27 September 2022. Available: https://woset-workshop.github.io/PDFs/2020/a03.pdf
23. “Google SkyWater Open Source PDK.” 2022, last accessed 27 September 2022. Available: https://github.com/google/skywater-pdk
24. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., vol. 25. Lake Tahoe, NV, USA: Curran Associates, Inc., Dec. 2012, pp. 1097–1105.
25. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
26. Y. He, X. Zhang, and J. Sun, “Channel pruning for accelerating very deep neural networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 1398–1406.
27. A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Toronto, Tech. Rep., Apr. 2009.
28. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia, ser. MM ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 675–678. Available: https://doi.org/10.1145/2647868.2654889
29. J. S. Long and J. Freese, Regression Models for Categorical Dependent Variables using Stata, 3rd Edition. StataCorp LP, 2014. Available: https://www.stata.com/bookstore/regression-models-categorical-dependent-variables
30. I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, S. Dasgupta and D. McAllester, Eds., vol. 28, no. 3. Atlanta, Georgia, USA: PMLR, 17–19 Jun 2013, pp. 1139–1147. Available: https://proceedings.mlr.press/v28/sutskever13.html

Copyright © 2023 by the Authors.
This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted
use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Arrived: 22. 06. 2023
Accepted: 21. 07. 2023


Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.204

Journal of Microelectronics,
Electronic Components and Materials
Vol. 53, No. 2(2023), 87 – 102

A New Design Optimization Methodology of Fully
Differential Dynamic Comparator
Leila Khanfir1, Jaouhar Mouine*2
1 Laboratory of Analysis, Design and Control of Systems, University of Tunis El Manar, National
Engineering School of Tunis, Tunis, Tunisia
2 Department of Electrical Engineering, College of Engineering, Prince Sattam Bin Abdulaziz
University, Al Kharj, Saudi Arabia

Abstract: The need to reduce the time to market for high-performance integrated circuits has become a primary concern in modern
electronics design. Many efforts are currently being made to streamline the design process for increasingly complex circuits while
providing optimal performance, especially for nanoscale technologies. This paper presents a new and effective methodology for the
design of fully differential comparators to achieve a high-performance operation using dynamic topology and nanoscale technology.
The proposed methodology is not process dependent and can be applied to similar conventional comparator structures to optimize
the operation speed while ensuring good offset cancellation, efficient noise immunity, and reduced design time and complexity.
The design steps include theoretical analysis and simulation-based optimization of the comparator speed, as well as offset and noise
reduction within a minimal design time. All the analog and digital building blocks are designed using dynamic topologies, including
the clock generator, to ensure high speed and synchronized operation. The resulting circuit is a new two-stage dual clock fully
differential comparator. Compared with its equivalent counterparts, it provides improved operation speed, and reduced offset voltage
and kickback noise. This comparator is designed in the TSMC 65 nm CMOS process. Its performance shows that it achieves a 1.25 GHz
operation speed, presents less than 9 mV offset error, and generates a kickback noise of less than 40 mV with a 10 kΩ input resistance
during the reset phase only. It consumes 213 µW from a 1.2 V power supply at 1.25 GHz.
Keywords: fully differential dynamic comparator; kickback noise; offset self-calibration; clock generator; finite state machine.

Nova metodologija optimizacije zasnove polnega
diferencialnega dinamičnega komparatorja
Izvleček: Potreba po skrajšanju časa za trženje visoko zmogljivih integriranih vezij je postala glavna skrb pri sodobnem načrtovanju
elektronike. Trenutno potekajo številna prizadevanja za racionalizacijo postopka načrtovanja vedno bolj zapletenih vezij ob
zagotavljanju optimalnih zmogljivosti, zlasti za tehnologije v nanometrski razsežnosti. V tem članku je predstavljena nova in učinkovita
metodologija za načrtovanje polnih diferencialnih komparatorjev za doseganje visoko zmogljivega delovanja z uporabo dinamične
topologije. Predlagana metodologija ni odvisna od procesa in jo je mogoče uporabiti za podobne konvencionalne strukture
komparatorjev, hkrati pa zagotovi dobro izničevanje odmikov, učinkovito odpornost proti šumom ter skrajša čas in zapletenost
načrtovanja. Koraki načrtovanja vključujejo teoretično analizo in na simulaciji temelječo optimizacijo hitrosti delovanja komparatorja
ter odpravo kompenzacije in šuma v minimalnem času načrtovanja. Vsi analogni in digitalni gradniki so zasnovani z uporabo
dinamičnih topologij, vključno z generatorjem ure, da se zagotovi visoka hitrost in sinhronizirano delovanje. Tako nastalo vezje je nov
dvostopenjski dvotaktni polni diferencialni komparator. V primerjavi z enakovrednimi primerki zagotavlja večjo hitrost delovanja ter
manjšo kompenzacijsko napetost in šum povratnega udarca. Ta komparator je zasnovan v 65 nm postopku CMOS podjetja TSMC.
Njegovo delovanje kaže, da dosega hitrost delovanja 1,25 GHz, ima manj kot 9 mV napako odmika in ustvarja šum odboja manj kot 40
mV z vhodno upornostjo 10 kΩ. Pri 1,25 GHz porabi 213 µW iz 1,2-voltnega napajanja.
Ključne besede: fully differential dynamic comparator; kickback noise; offset self-calibration; clock generator; finite state machine.
* Corresponding Author’s e-mail: j.mouine@psau.edu.sa

How to cite:
L. Khanfir et al., “A New Design Optimization Methodology of Fully Differential Dynamic Comparator", Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 87–102

L. Khanfir et al.; Informacije Midem, Vol. 53, No. 2(2023), 87 – 102

1 Introduction

The scaling of silicon technologies has been one of the primary factors that have made it possible to outpace the exponential increase in performance demand over the past few decades. Transistor scaling increases the integration density and operation speed. At the same time, the resulting circuits are more sensitive to random and systematic errors, such as offset and noise. Additional circuitry for error compensation and noise suppression is then needed, leading to a drastic increase in design time and effort. Therefore, in modern circuit design, optimization methodologies that improve performance have become mandatory, not only to meet the increasing design constraints but also to compensate for increased errors and noise while optimizing the time to market. Recently, optimization methodologies have become a major research field in MOS circuit design [1]–[3].

Dynamic comparators are widely used in advanced mixed-signal systems, such as analog-to-digital converters (ADCs). The design constraints of these systems are usually stringent, depending closely on those of the comparator. To improve the immunity of ADCs to sensed common noise, a specific variant of the dynamic comparator is usually used [4]–[7]; it is a six-terminal circuit that compares an input voltage difference to a reference voltage difference [8] and is commonly called a differential pair comparator or fully differential dynamic comparator (FDDC). However, it is slower than the common four-terminal circuit and is more sensitive to kickback noise, as well as process and mismatch variations [9]. Achieving high performance and good noise immunity with a six-terminal dynamic comparator requires more design effort and time than with the common four-terminal one. Therefore, design methodologies could help designers significantly reduce design time and effort.

An FDDC was employed in [4] for its low kickback noise, good power efficiency, and simple dynamic structure. To reduce mismatch effects on loop stability, the authors kept the comparator gain at low values, leading to a considerable decrease in the operation speed. As for immunity to comparator noise, the authors applied a noise-shaping successive approximation register quantizer to all stages in a pipelined ADC. Although the proposed technique has advantages other than the noise immunity of the comparator, it remains complex and specific to the designed ADC. In [5], a charge distribution FDDC was used to implement a level-crossing ADC. It was constructed with two separate comparators to compare the differential input voltage to a differential reference voltage. The two separate comparators were more sensitive than an all-in-one FDDC to process and mismatch variations and noise imbalance. The suppression of the sensed common noise was less efficient. In addition, the comparison was performed over two clock cycles, which affected the operation speed. Moreover, a static second stage was added to the comparator to increase the gain, making the comparators even slower. Another FDDC was used in [7] to implement a SAR-assisted noise-shaping pipeline ADC. The proposed structure included self-calibrated current sources to compensate for mismatches and operated with two synchronized clocks. The circuit design achieved good performance. However, the proposed comparator was specific to the designed ADC and the operation speed was very low.

As for offset compensation, mismatches are usually calibrated off-chip to reduce the design complexity [4], [5]. In [7], a background calibration for interstage offset was proposed to compensate for comparator mismatches. Even if the operation speed was not altered, there were “dead zones” in the calibration scheme that reduced its efficiency. Moreover, the proposed scheme mainly relies on the overall system architecture and can hardly be reproduced with a different circuit design.

The comparator gain is also an important feature for implementing high-resolution ADCs. It is usually increased by using preamplification stages or multistage comparators. In [5], a three-stage comparator was used, but only the first stage was dynamic. Thus, the comparator gain was high, whereas the operation speed was low. Likewise, a two-stage dynamic comparator, in which only the first stage is dynamic, was also presented in [10]. In [11], a three-stage, fully dynamic comparator was proposed. However, the presented structure was not fully differential, and the three stages operated over the same clock period.

The current paper presents a new two-stage fully dynamic fully differential comparator. The decision is made over the entire clock period. Also, additional circuitry is added to generate synchronized clocks, to reduce kickback noise, and to compensate for mismatches. The whole system is fully dynamic without a considerable increase in design complexity. It achieves a fully differential comparison, optimal operation speed, good immunity to kickback noise, and self-calibrated offset voltage. The proposed design is process independent and can be used in different applications.

Section 2 presents the proposed system architecture of the FDDC, including clock generation, kickback noise immunity, and offset calibration. The new two-stage FDDC is presented and discussed in section 3. Its operation is also detailed and compared with the one-stage-like circuit. Section 4 describes the proposed circuit and how it ensures immunity to kickback noise while also detailing the clock generator design. Section 5 presents the proposed design technique for a digital offset self-calibration scheme using full custom dynamic circuits. Section 6 presents the simulation results and circuit characterization. A comparison to state-of-the-art performances is also addressed.

2 Proposed system architecture
The comparator is typically a one-bit ADC. When the difference between the compared voltages is about a few hundred millivolts or more, the decision process is usually accurate and fast. However, as the input voltage decreases to a few millivolts and less, the decision process becomes much slower and more sensitive to the input signal quality, as well as to circuit nonidealities such as offset and switching noises. Indeed, analog signals usually present noise. Noise is random and common to comparator inputs. On the other hand, a dynamic comparator is usually designed with small MOS devices, which makes the circuit more sensitive to process and mismatch variations, especially when designed in nanometer-scale technologies. Moreover, because of the dynamic operation of the comparator, there are large voltage variations in the internal nodes between devices. These variations are coupled through parasitic capacitors to the comparator inputs as a voltage signal creating a disturbance that is usually called kickback noise. This switching noise is added to the analog input signal and affects the comparison results. Kickback noise cannot be removed, but there are a few techniques to reduce its effects on the decision process [12], [13].

Fig. 1(a) shows a four-terminal comparator, which is
known as the strong-arm latch comparator and has
been widely used in ADC design [14]. It presents two
inputs and two outputs. One input is generated from
an external voltage source, while the other comes from
a resistive ladder. This affects the two inputs with different noise levels, making the comparison process only
effective when the sensed voltage is greater than the
difference between the two input noise signals. In contrast, a six-terminal comparator is shown in Fig. 1(b);
this is called the differential pair comparator [8], [15] or
FDDC [16], [17], and has been widely used in pipeline
and SAR ADCs [18]. This comparator presents four inputs and two outputs.
The inputs are a differential analog input signal and a differential reference voltage. The two outputs are complementary: a positive output OP and a negative output
OM. The positive output OP goes high when the differential analog input voltage VIN+ - VIN- is greater than the
reference voltage difference VREF+ - VREF-:

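\[
\begin{aligned}
&\text{if } (V_{IN+} - V_{IN-}) - (V_{REF+} - V_{REF-}) > 0 \\
&\quad \text{then } OP = \text{'1'} \text{ and } OM = \text{'0'} \\
&\quad \text{else } OP = \text{'0'} \text{ and } OM = \text{'1'}
\end{aligned}
\tag{1}
\]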

Thus, the common noise in each differential input is
cancelled separately, which considerably improves the
comparator precision. This section describes the top-level architecture of the proposed FDDC, including
immunity to kickback noise and self-calibration of the
offset voltage.
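
The decision rule in (1) and the common-noise argument can be illustrated with an idealized behavioural model. This is a sketch only: the function `fddc_decide` and the example voltages are hypothetical, and the model ignores offset, kickback noise, and timing.

```python
def fddc_decide(vin_p, vin_m, vref_p, vref_m):
    """Ideal six-terminal comparison following (1): returns (OP, OM) as logic levels."""
    op = 1 if (vin_p - vin_m) - (vref_p - vref_m) > 0 else 0
    return op, 1 - op


# Differential input of 30 mV compared against a 20 mV differential reference.
print(fddc_decide(0.615, 0.585, 0.610, 0.590))

# The same noise added to both lines of a pair leaves the decision unchanged.
noise_in, noise_ref = 0.050, -0.020
print(fddc_decide(0.615 + noise_in, 0.585 + noise_in,
                  0.610 + noise_ref, 0.590 + noise_ref))
```

Because only the two differences enter the comparison, any disturbance that is common to VIN+ and VIN- (or to VREF+ and VREF-) drops out of the ideal decision.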
Figure 1: Strong-arm latch comparator (a) dynamic
comparator (b) fully differential dynamic comparator.

Fig. 2 describes the proposed system. The symbol shown in Fig. 2(a) presents the input and output terminals of the system. Fig. 2(b) illustrates the clock diagram of the external and internal clock signals, while Fig. 2(c) depicts the top-level architecture.

Figure 2: Proposed system (a) symbol view (b) clock
diagram (c) architecture.

The proposed comparator is a new two-stage FDDC. The clock generator produces two synchronized clock signals clk and clks to ensure the operation of the first and second stages, respectively. To reduce the input noise, an RC circuit can be added at the comparator inputs as a first-order filter, but at the price of a reduced operation speed. In the proposed solution, the resistance of a CMOS switch and parasitic capacitor Cp at the comparator inputs together form an RC filter. These two components are too small to affect the comparator speed but also too small to ensure the cancellation of the kickback noise. Therefore, in the proposed scheme, the switches, together with the input parasitic capacitors, are used as track-and-hold circuit blocks, which are controlled to reduce the effect of noise on the decision process. Two clock signals clkn and clkn’ are used to control the switches to operate only during the comparator reset time (when clk = ‘0’) before beginning a new cycle. Thus, kickback noise appears at the comparator inputs for a limited period, during which the decision process cannot be affected.

The clock generator provides four synchronized clock signals: clk, clks, clkn, and clkn’. These clock signals ensure a three-phase operation comparator: track-and-hold, decision, and reset. The circuit is designed so that the track-and-hold, as well as a part of the decision process, are performed during the reset time, which improves the comparator speed compared with the state-of-the-art method. The comparator is described in detail in section 3, while noise suppression and clock generation are presented in section 4.

To compensate for the comparator offset errors, a three-phase operation system is proposed. First, the initial reset phase is controlled using the external signal reset. When this signal is high, the two N-bit outputs d+ and d- of the two counters are initialized to zero. Thus, the initial reset phase allows for initializing the capacitor banks to equal initial charges. This represents the initial state S0 of the two FSMs used in the self-calibration process. At that time, the eight input switches are configured to connect the four comparator inputs IN+, IN-, REF+, and REF- to the differential inputs VIN+, VIN-, VREF+, and VREF-, respectively. Second, a calibration phase is controlled by two complementary external signals calib and calib’. This phase occurs only once after the initial reset phase. When calib and calib’ are set to ‘1’ and ‘0’, respectively, eight switches (in blue in Fig. 2(c)) that are placed at the system inputs disconnect the comparator inputs IN+, IN-, REF+, and REF- from the differential inputs VIN+, VIN-, VREF+, and VREF-, and connect them to the common mode reference voltages VCM, which ensures equal charges at the input parasitic capacitors. VCM is the mean value of the input range. During the calibration phase, the comparator outputs Q+ and Q- are applied to the offset regulator. At each clock cycle, according to Q+ and Q- levels, the clock generator increments one of the two N-bit control signals d+ and d- by 1 to compensate for the mismatches in the comparator as well as in the switches at the comparator inputs. This process continues as long as Q+, Q-, calib and calib’ remain unchanged. The offset regulator design is detailed in section 5.

3 New two-stage fully differential
comparator
The operation speed is a primary constraint in the comparator design. The comparison speed can be defined
as the time required to provide a valid output decision.
A dynamic comparator operates under a clock signal
clk alternating decision and reset phases in each clock
cycle. The two phases of decision and reset usually correspond to the two clock levels ‘1’ (on) and ‘0’ (off). Thus,
denoting the decision and reset times by ton and toff, respectively, the total comparison time tclk is equal to:

\[ t_{clk} = t_{on} + t_{off} \tag{2} \]
A track-and-latch circuit, basically a Set Reset (SR) latch,
is usually added to the comparator outputs to retrieve
static output signals. Thus, the decision time ton is typically the sum of two times: the comparison time tc
needed by the dynamic comparator to produce a valid
output, and the SR latch time tSR required by the SR
latch to change state according to the comparator outputs. The decision time ton is then equal to:


L. Khanfir et al.; Informacije Midem, Vol. 53, No. 2(2023), 87 – 102

not be complementary, and the comparison decision
will not be valid. This happens when resolving small input values and when the PMOS threshold voltage |VTHP|
is larger than VDD/2, which is usually the case in scaled
technologies like 65 nm and below.

$t_{on} = t_c + t_{SR}$ (3)
Once the SR latch state has changed, the comparator
outputs can be reset to the initial value without affecting the SR latch state until the next decision process
begins. Inserting (3) into (2), the total comparison time
tclk is then defined in terms of the comparison time tc,
the SR latch response time tSR and the reset time toff:

In the present work, a new two-stage FDDC is proposed, in which the
comparison speed is optimized with no restriction on
the technology used. Indeed, as shown in Fig.
3, each stage includes a positive feedback loop, which
reduces the comparison time tc compared with [3].
In addition, the positive feedback loop in the second
stage provides complementary outputs, regardless of
the technology parameters used. Moreover, the two
stages operate under two different clock signals as in
[3], which reduces the total comparison time tclk to the
sum of the comparison time tc and the reset time toff, as defined in (5).

$t_{clk} = t_c + t_{SR} + t_{off}$ (4)
The comparison time tc depends on the internal capacitor sizes, the internal feedback loops, and the value of the
resolved input voltage. For input voltages of a few hundred millivolts,
tc can be as short as a few nanoseconds or
picoseconds, depending on the comparator structure.
However, when resolving input values near 0 V, the
comparator outputs evolve slowly and tc
tends to infinity. Therefore, to sense microvolt- and nanovolt-level input values in a reduced time, it is necessary to
minimize the comparator internal capacitances by using
small devices, and to improve the comparator structure with positive feedback loops, immunity to
switching noise, and compensation for process and
mismatch variations.
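
The timing relations (2) to (4) can be checked numerically. The following Python sketch is only illustrative: the function names and the example durations are assumptions chosen for readability and are not values reported in this work. Setting the SR latch term to zero models a comparator in which the latch delay no longer adds to the clock period.

```python
def total_comparison_time(t_c, t_off, t_sr=0.0):
    """Return t_clk = t_c + t_SR + t_off, following (2)-(4).

    With t_sr = 0 the SR latch delay does not add to the clock period,
    which gives t_clk = t_c + t_off.
    """
    t_on = t_c + t_sr      # decision time, as in (3)
    return t_on + t_off    # total comparison time, as in (2)


def max_comparison_rate_ghz(t_clk_ps):
    """Highest comparison rate in GHz for a clock period given in ps."""
    return 1e3 / t_clk_ps


# Hypothetical durations in picoseconds (assumed, not from the paper).
T_C, T_SR, T_OFF = 300.0, 100.0, 400.0

t_clk_single = total_comparison_time(T_C, T_OFF, T_SR)  # latch delay counted
t_clk_dual = total_comparison_time(T_C, T_OFF)          # latch delay hidden

print(t_clk_single, max_comparison_rate_ghz(t_clk_single))
print(t_clk_dual, max_comparison_rate_ghz(t_clk_dual))
```

With these assumed numbers, removing tSR from the period shortens tclk from 800 ps to 700 ps, which is the kind of gain a dual-clock structure targets.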

The circuit operates as follows. In the first stage, a differential analog input voltage ΔVIN = (VIN+ - VIN-) and a differential reference voltage ΔVREF = (VREF+ - VREF-) are applied
to the four input pair transistors (M1-4). The voltages VIN+
and VREF- are applied to transistors (M1,4), which have a
common drain. These transistors generate two currents and feed node X- with a current that is the image of the sum of the two applied voltages (VIN+ + VREF-).
Likewise, considering the circuit symmetry, transistors
(M2,3) feed node X+ with a current that is the image of
the sum of the two applied voltages (VIN- + VREF+). When
the clock signal clk is low, the tail transistors (M5,6) turn
off, while the reset transistors (M11-14) turn on. This initializes the latch nodes X+, X-, O+ and O- to
VDD. Conversely, when clk goes high, the tail transistors
(M5,6) turn on while the reset transistors (M11-14) turn off. At
this time, the four input pair transistors feed the latch
nodes X+ and X- with a differential current ΔIX = IX+ - IX-,
which is the image of the voltage difference between
the sums of the applied voltages. This voltage difference is denoted as ΔVINPUT and is equal to:

A double-tail comparator and a three-stage triple-latch comparator
are designed in a 28 nm CMOS process in [11]. The first
one is a two-stage double-tail comparator that includes
only one positive feedback loop, while the second one
includes three positive feedback loops. The first one
achieves a comparison time tc of 50 ps, against
27 ps for the second comparator, when resolving a 5
mV input value. Nevertheless, in both comparators,
the stages operate during the same clock period, which
makes tc the sum of the response times of all stages connected
in series. Moreover, there is no improvement for tSR
and toff in the total comparison time tclk in (4). In [3], a
two-stage dual-clock latch comparator is proposed.
The comparator includes one feedback loop. However,
the second stage is controlled by a second clock, reducing the on-time ton in (2) to tc only. Thus, the total comparison time tclk defined in (4) becomes:

$t_{clk} = t_c + t_{off}$ (5)

$\Delta V_{INPUT} = (V_{IN+} + V_{REF-}) - (V_{IN-} + V_{REF+})$ (6)

The resulting ΔIX activates the latch transistors (M7-10)
which operate as a strong positive feedback loop to
regenerate the outputs O+ and O- to complementary
logic levels. The generated outputs are then applied to
the input transistors (Ms1,s2) of the second stage. Transistors (Mdi+(i=1..N)) and (Mdi-(i=1..N)) are two capacitor banks,
each one including N binary-weighted charges. These
capacitor banks are controlled by two N-bit inputs, d+
= (di+(i=1..N)) and d- = (di-(i=1..N)), and are used to compensate for process and mismatch variations. This specific

Moreover, the second stage is built with a stack of two
elements only, which reduces the total capacitance seen
at the outputs of the first stage, leading to a minimal
comparison time tc. The comparator is designed with a
180 nm MOS process and achieves a comparison time
of 900 ps when resolving a 25 µV input value. However,
the second stage operates when only one of the first-stage outputs decreases to a threshold value. If both
outputs reach this value, the second-stage outputs will

structure of the charges also reduces the switching
noise and improves the operation speed [3].

(DFFs). The true single-phase clock (TSPC) DFF presented in [19] is used to design the clock generator in
the proposed system in Fig. 2(c). It is a nine-transistor,
three-stage DFF operating with a single clock signal
and including no more than three stacked devices per
stage. This circuit is shown in Fig. 4, where a reset command and an inverter are added to the output.

Considering the second stage, when clks is high, outputs Os+ and Os- are initialized to ‘0’, turning off transistors (Ms3,s4). As clks becomes low, reset transistors (Ms5,s6)
turn off. As shown in Fig. 3(b), this happens at the end of
the reset phase of the first stage, where both outputs
O+ and O- are initialized to VDD. Thus, transistors (Ms1,s2)
are off, like the other four. When one of the first-stage
outputs O+ and O- begins decreasing, transistor
(Ms1) or (Ms2), respectively, begins operating to charge
one of the outputs Os+ and Os-, respectively, to
VDD. When the applied input voltage difference is too
small, both O+ and O- can decrease before regenerating to logic levels. Then, transistors (Ms3,s4) operate
as positive feedback to maintain one of the outputs at
‘0’ while the other one charges to VDD. Without these
transistors, both outputs Os+ and Os- may end up at VDD. In this case, when these signals are applied to
the SR latch, they create an undefined state, resulting
in a wrong decision at outputs Q+ and Q-.
The last stage is a NOR gate SR latch. It maintains its
state when the applied signals Os+ and Os- are initialized to ‘0’ and keeps or changes the state when the outputs are complemented, resulting in static outputs Q+
and Q-.

Figure 4: True Single-Phase Clock DFF (a) symbol (b)
circuit-level design.
This structure is convenient and should provide an
operational speed greater than that of the comparator. A detailed description of the circuit can be found in [19].

4 Clock generator and kickback noise
suppression

Fig. 5 shows the design details of the proposed clock
generator. The circuit generates four signal outputs clk,
clks, clkn and clkn’. Because the last two are complemen-

The generation of synchronized clock signals is
achieved by sequential circuits using data flip-flops

Figure 3: Proposed FDDC (a) circuit (b) clock diagram.

tary, the circuit states can only be defined according to
the three outputs clk, clks, and clkn. These three outputs
are denoted by vector c = (clk clks clkn) = (x x x), where x
is equal to ‘1’ or ‘0’. As described in Fig. 5(a) and (b), the
circuit goes through four states S0, S1, S2, and S3. First,
the clock generator is initialized to state S0 with an external reset = ‘1’. This state corresponds to the sample-and-hold phase by connecting external signals to the
comparator inputs (Fig. 1(c)). This also corresponds to
the reset of the comparator first stage. Vector c is then
equal to (0 0 1). Second, state S1 corresponds to the reset of the two comparator stages, for which c is equal
to (0 1 0). Third, state S2 is the state where the comparator first-stage operation begins, which corresponds to
c equal to (1 0 0). Fourth, state S3 is the continuation of
state S2 with c still equal to (1 0 0). This last state is required because the first-stage operation is slower than
the second one. Therefore, high and low levels of clk
must last longer than those of clks and clkn.
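
The state sequence described above can be sketched as a small behavioural model. The output vectors c = (clk clks clkn) are those listed in the text; the cyclic transition order S0, S1, S2, S3, S0 and all Python names are assumptions made for this illustration, since the transitions themselves are only given in Fig. 5(b).

```python
from itertools import islice

# Output vector c = (clk, clks, clkn) for each state, as listed in the text.
OUTPUTS = {
    "S0": (0, 0, 1),  # sample-and-hold, first-stage reset
    "S1": (0, 1, 0),  # reset of both comparator stages
    "S2": (1, 0, 0),  # first-stage operation begins
    "S3": (1, 0, 0),  # continuation of S2
}

# Assumed transition order; the FSM has no inputs, so it simply cycles.
NEXT_STATE = {"S0": "S1", "S1": "S2", "S2": "S3", "S3": "S0"}


def clock_generator(reset_state="S0"):
    """Yield (state, clk, clks, clkn, clkn') once per input clock cycle."""
    state = reset_state
    while True:
        clk, clks, clkn = OUTPUTS[state]
        yield state, clk, clks, clkn, 1 - clkn  # clkn' is the complement
        state = NEXT_STATE[state]


trace = list(islice(clock_generator(), 8))
for row in trace:
    print(row)
```

In this model clk stays high for two input cycles and low for two, whereas clks and clkn are each high for a single cycle, which reflects the requirement that the levels of clk last longer than those of clks and clkn.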

can significantly reduce the effect of kickback noise on
the decision process. However, the clock generation
in [13] used delay circuits, and outputs were not synchronized. Hence, the design was specific to the chosen clock timing, as well as to the technology used. In
contrast, the proposed design generates synchronized
outputs and can be reproduced without considering
the technology used or transistor size.

5 Proposed offset self-calibration
technique
In Fig. 2(c), the proposed offset regulator receives the
comparator static outputs Q+ and Q- and generates
two N-bit outputs d+ and d-. These outputs are then
used to control the 2N binary-weighted transistors
(Mdi+) and (Mdi-) shown in Fig. 3(a). The least significant bit (LSB) transistor is set to minimal dimensions,
while, for the other weighted transistors, the channel
width is doubled at each bit up to the most significant bit
(MSB) transistor. The main idea is to create a progressive charge imbalance to compensate for the comparator
offset as in [3], [20]. However, in [3], a high-level design
methodology for the self-calibration scheme is proposed. As a result, the circuit is slow and large because
of the large number of chained gates. In [20],
the offset regulation is off chip and too complex for a
circuit-level design. In the present work, the proposed

The finite state machine (FSM) is depicted in Fig. 5(b).
It has no inputs and generates the three outputs: clk,
clks, and clkn. The gate-level and circuit-level synthesis
are given in Fig. 5(c) and (d), respectively. An inverter
is added to generate the complement of clkn. State S0
is the sample-and-hold state, while state S2 is the decision phase. Inserting states S1 and S3 in between S0 and
S2 allows for reduction of kickback noise effects on the
decision process. Indeed, as discussed in [13], isolating
the decision process from the sample-and-hold phase

Figure 5: Proposed clock generator design (a) clock diagram (b) Moore finite state machine (c) gate-level design (d)
circuit-level design.

offset regulator is minimalist and could be easily designed at the circuit level.

pacitor banks, two cases will not be used to avoid a significant variation in the capacitive compensation load:
“all transistors are on” and “all transistors are off”, which
correspond to d+ and d- equal to 0 and $2^N-1$, respectively. Therefore, the two N-bit control signals d+ and
d- should be initialized to 1, for which all the binary-weighted transistors are on, except for the LSB transistor. Then, according to Q+ and Q- levels, d+ and d- incrementation will either be stopped by setting E+ and
E- to ‘0’ or pursued by setting E+ and E- to ‘1’. The incrementation should stop before reaching $2^N-1$, for which
all transistors are blocked. The case d+ and d- equal to
$2^N-2$ turns off all the calibrating transistors, except the
LSB one. The parasitic capacitors of the blocked transistors can be neglected compared with those of the on
transistors.
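
The mapping between a control word and the conducting transistors of one bank can be illustrated with a short Python sketch. It assumes that a bit equal to ‘0’ keeps its transistor on (so that the all-zero word turns every transistor on, as stated above) and that the load contributed by transistor i is proportional to its binary weight 2^i. The function names and the unit of load are hypothetical.

```python
def conducting_transistors(d, n_bits):
    """Indices (0 = LSB) of the bank transistors that are on for word d.

    Assumption: a bit equal to 0 keeps its transistor on, so d = 0 turns
    every transistor on and d = 2**n_bits - 1 turns every transistor off.
    """
    if not 0 <= d < 2 ** n_bits:
        raise ValueError("control word out of range")
    return [i for i in range(n_bits) if not (d >> i) & 1]


def relative_load(d, n_bits):
    """Sum of the binary weights of the conducting transistors (LSB units)."""
    return sum(2 ** i for i in conducting_transistors(d, n_bits))


N = 6  # six-bit banks
for word in (0, 1, 2, 2 ** N - 2, 2 ** N - 1):
    print(word, conducting_transistors(word, N), relative_load(word, N))
```

Under these assumptions the relative load equals (2^N - 1) - d, so each increment of the control word removes one LSB unit of load, and the usable range from 1 to 2^N - 2 spans loads from 2^N - 2 down to 1.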

Figure 6: Block diagram of the proposed offset regulator.
The proposed offset regulator block diagram is presented in Fig. 6. The circuit input stage is an FSM, which
receives the comparator static outputs Q+ and Q- and
generates two control digits e+ and e-. Each of these digits
is combined with the calibration control signal calib
using an AND gate to generate two digital signals: E+
and E-. These signals are then used as two enable input
signals of two N-bit counters. The counters generate
two N-bit control words to calibrate the two capacitor
banks in the comparator shown in Fig. 3(a). In these ca-

Each conducting transistor is then equivalent to a capacitor. As a result, when d+ and d- are equal to 1, the
N-1 largest capacitors are in parallel. This sets the calibrating capacitive load at its maximum usable value on both
sides of the comparator. When applying a 0 V input
voltage, the comparator output Q+ is either high or
low. When Q+ is high, the comparator is considered
as exhibiting a positive offset voltage. To compensate
for this offset, d- is incremented by 1 (d- = d-(initial) + 1 =
2). This corresponds to a first step decrease of the capacitive load on the right side of the comparator with

Figure 7: Proposed FSM design to control the two counters, (a) Moore FSM, (b) proposed circuit-level design.

Figure 8: Proposed N-bit counter design (a) Moore FSM
(b) module-level design (c) proposed circuit-level design to start the counter from 1.

6 Simulation and comparison

respect to the left side. Hence, in the next comparison
cycle, the positive offset voltage either decreases toward 0 V or becomes negative. If the offset voltage is
still positive in the next cycle, that is Q+ is still high, d- is
incremented again. This continues until the offset voltage becomes negative, that is Q+ becomes low, or until
d- reaches $2^N-2$. The generation of e+ and e- according
to Q+ and Q- levels is described by the FSM shown in
Fig. 7(a). A reset command sets the system to state S0
where both digital outputs e+ and e- are set to ‘0’. When
Q+ and Q- are equal to ‘1’ and ‘0’, respectively, the system enters state S1 where the outputs e+ and e- are set
to ‘0’ and ‘1’, respectively. The system remains in that
state until Q+ and Q- change to opposite logic levels.
When this happens, the system enters state S2 where
outputs e+ and e- are set to ‘1’ and ‘0’, respectively. This
state allows for rebalancing the system once the offset
voltage changes signs. Simulations have shown better
results when the system is rebalanced twice by adding
state S3.

To validate the proposed design methodology and
evaluate the performance of the proposed circuit, the proposed two-stage FDDC shown in Fig. 3 has been designed in the TSMC 65 nm CMOS process using standard-threshold MOS devices. The offset calibrating
capacitor banks are set to six bits. The basic comparator
shown in Fig. 1(b), followed by a NAND-based SR latch,
is also designed using the same standard-threshold devices and is used as a baseline for comparison to show
the advantages of the proposed structure.
In the first simulation set, both FDDCs are simulated at
room temperature under nominal operating conditions.
They are powered by 1.2 V supply voltage and operate
at a 1.25 GHz clock frequency. A first DC voltage source
is set to -300 µV and connected to the differential input
voltage, while a second DC voltage source is set to VCM

Then, the system enters a final state S7 where both outputs e+ and e- are set back to ‘0’ again. Because the circuit is symmetrical, considering Q+ and Q- equal to ‘0’
and ‘1’, respectively, leads to states S4, S5 and S6 which
are symmetrical to states S1, S2 and S3, respectively.
Fig. 7(b) shows the proposed FSM circuit synthesis. It
uses dynamic circuits and the DFF shown in Fig. 4. To
generate static outputs e+ and e- with maximal operation speed, switched circuits with positive feedback are
used.
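
The behaviour of the FSM in Fig. 7(a) can be sketched in Python from the description above. The state outputs (e+, e-) follow the text; the unconditional transitions S2 to S3 to S7 and S5 to S6 to S7, the holding behaviour of S0 for other input combinations, and all identifiers are assumptions of this illustration. The AND with calib that produces E+ and E- is included for convenience.

```python
# Moore outputs (e_plus, e_minus) per state, following the text.
OUTPUTS = {
    "S0": (0, 0),
    "S1": (0, 1), "S2": (1, 0), "S3": (1, 0),   # positive-offset branch
    "S4": (1, 0), "S5": (0, 1), "S6": (0, 1),   # symmetrical branch
    "S7": (0, 0),                               # final state
}

# Assumed unconditional transitions (two rebalancing cycles, then stop).
UNCONDITIONAL = {"S2": "S3", "S3": "S7", "S5": "S6", "S6": "S7", "S7": "S7"}


def next_state(state, q_plus, q_minus):
    """Next state for the comparator outputs (Q+, Q-)."""
    q = (q_plus, q_minus)
    if state == "S0":
        return {(1, 0): "S1", (0, 1): "S4"}.get(q, "S0")
    if state == "S1":
        return "S2" if q == (0, 1) else "S1"
    if state == "S4":
        return "S5" if q == (1, 0) else "S4"
    return UNCONDITIONAL[state]


def run(q_sequence, calib=1):
    """Return a list of (state, E+, E-) with E = e AND calib."""
    state, trace = "S0", []
    for q_plus, q_minus in q_sequence:
        state = next_state(state, q_plus, q_minus)
        e_plus, e_minus = OUTPUTS[state]
        trace.append((state, e_plus & calib, e_minus & calib))
    return trace


# Hypothetical stimulus: Q+ high for three cycles, then the outputs flip.
trace = run([(1, 0)] * 3 + [(0, 1)] * 4)
print(trace)
```

For this stimulus the model enables the d- counter for three cycles (state S1), then the d+ counter for two cycles (states S2 and S3), and finally disables both in S7.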
The generated outputs are used to control two N-bit
counters. Fig. 8(a) shows the FSM of an N-bit counter.
The module-level design of the counter is shown in Fig.
8(b), while the proposed circuit-level design is shown
in Fig. 8(c). In the proposed circuit-level design, the
first DFF is reset to ‘1’ instead of ‘0’ to initialize the N-bit counter to 1 instead of 0. Fig. 9 shows the modified
DFF. In addition, to prevent the counter from reaching $2^N-1$, the
on-time of the external signal calib is set to exactly $2^N-2$
cycles.
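
A behavioural model of one counter helps to visualize the role of the enable input and of the reset-to-1 initialization. The class below is a hypothetical Python sketch, not the circuit of Fig. 8(c): it ignores the latency of the preceding FSM, so the number of enabled cycles used in the example is an assumption rather than a figure derived from the calib timing.

```python
class EnableCounter:
    """N-bit up-counter with an enable input, reset to 1 instead of 0."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.value = 1            # first DFF is reset to '1'

    def reset(self):
        self.value = 1

    def clock(self, enable):
        """Advance one clock cycle; count only when enable is 1."""
        if enable:
            self.value = (self.value + 1) % (2 ** self.n_bits)
        return self.value


counter = EnableCounter(n_bits=6)
for _ in range(36):               # assumed number of enabled cycles
    counter.clock(enable=1)
counter.clock(enable=0)           # a disabled cycle holds the value
print(counter.value)

upper = EnableCounter(n_bits=6)
for _ in range(2 ** 6 - 3):       # largest count that stays below 2**N - 1
    upper.clock(enable=1)
print(upper.value)
```

In this model a six-bit counter that starts at 1 needs 2^6 - 3 enabled cycles to reach the last usable word, 2^6 - 2 = 62; one more enabled cycle would give 63, the all-off case that the calibration window is meant to exclude.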

Figure 9: First DFF of the N-bit counter (a) symbol (b)
circuit-level design.

Figure 10: Transient analysis of the fully differential dynamic comparator (a) basic comparator (b) proposed
two-stage comparator.

= 950 mV and is connected to both reference inputs to
set VREF = (VREF+ - VREF-) to 0 V. Fig. 10 shows the transient
analysis results for both comparators. This figure is used
to determine the decision time ton for both structures.
In Fig. 10(a), the decision time ton of the basic comparator, as defined in (3), is equal to the difference between
when clk goes high and when the negative output Q- of
the SR latch crosses the mid supply voltage value (VDD/2
= 600 mV). In this first case, the output Q+ and Q- transition must happen during the clk on-time. Otherwise, the
decision could not be made, and the comparator output
would be invalid. In Fig. 10(a), Q+ transition happens
slightly before clk transition.

In the third simulation set, immunity to kickback noise
is simulated using the circuit of Fig. 12(a). In that circuit, the stage preceding the comparator represents
the Thevenin equivalent, with a Thevenin resistor RTH
equal to 2 × 5 kΩ. The comparator is the proposed FDDC,
including switches and the clock generator, as detailed
in Fig. 3. To assess the proposed design, simulations are
performed with and without noise reduction. The transient evolution of the comparator input signals with and
without noise reduction is shown in Fig. 12(b). In both
cases, the kickback noise is about a few tens of millivolts. However, compared with the input signals without
noise reduction (in red in Fig. 12(b)), input signals with
noise reduction (in green) exhibit noise during the reset time only, whereas without noise reduction, noise is
present during the entire decision cycle. Although the
noise maximum level is not reduced, the circuit remains
immune to kickback noise during the decision phase,
which is essential to ensure high accuracy.

In the proposed circuit, the decision time ton is equal to
the comparison time tc, as discussed in section 3. The
comparison time tc in Fig. 10(b), corresponds to the difference between when clk goes high and when the negative output Os- of the second stage crosses 600 mV. In
this second case, Os- transition must happen during the
clk on-time. However, since Os- logic level is maintained
during the reset, Q+ and Q- transition could happen at
any time of the clock cycle, even after clk transition.

Figure 11: Transient evolution of the generated clocks.
Thus, the decision time ton is equal to 400 ps and 360 ps
in the basic and proposed comparators, respectively.
The speed improvement of 40 ps in the proposed comparator is then about 10%, as in [3]. However, in [3],
the two-stage comparator could not operate properly
when powered by voltages equal to 1.2 V and below.
The proposed design operation is independent of the
technology used, as discussed in section 3.

Figure 12: Kickback noise simulation (a) simulation circuit (b) kickback noise at the comparator inputs.
In the fourth simulation set, the offset correction is simulated using the circuit shown in Fig. 13(a). The differential analog input is connected to a triangular voltage
source VINPUT = VIN+ - VIN- with a slope equal to 1 mV/10 ns.
The differential reference inputs VREF+ and VREF- are connected to a common mode voltage source VCM = VREF+ =
VREF- = 950 mV.

In the second simulation set, the clock generator shown in
Fig. 5(d) is simulated under 1.2 V with a 20 GHz input clock
signal clkIN. The results are shown in Fig. 11. In this figure, four
synchronized outputs clk, clks, clkn, and clkn’ are generated in
accordance with the clock diagram of Fig. 5(a). The simulations show that the signal frequency can exceed 2.5 GHz.

Figure 14: Offset voltage compensation by capacitor
banks.

Figure 15: Simulation circuit of the FDDC, including an
offset voltage VOS.

Figure 13: Offset self-calibration simulation (a) simulation circuit (b) ideal transfer characteristic (c) real transfer characteristic.
The differential reference voltage VREF = VREF+ - VREF- is then
equal to 0 V. Thus, considering the two voltage differences, VINPUT and VREF, the ideal transfer characteristic of
the dynamic comparator would be similar to the one
presented in Fig. 13(b). Here, both hysteresis and offset
are null. However, in real conditions, the comparator always exhibits hysteresis and offset [21]. Fig. 13(c) shows
the realistic transfer characteristic. The hysteresis window is centered on VM and delimited by trip points VTR+
and VTR-. The offset voltage VOS is defined as the difference between VM and VREF:

$V_{OS} = V_M - V_{REF}$ (7)
Figure 16: FSM input output signals of the offset regulator when VOS = 50 mV.

This offset definition is used to evaluate offset voltage
compensation using the comparator capacitor banks.
Considering the circuit symmetry, simulations can be
performed by holding d+ or d- at 1 while incrementing
the other from 1 to $2^N-2$. In Fig. 14, the offset voltage is
determined by holding d- at 1 while incrementing d+
from 1 to $2^N-2$ over $2^N-2$ clock cycles.

Fig. 16 shows the applied input signals reset, calib, and
calib’. The reset action initializes both FSM outputs e+
and e- to ‘0’, which corresponds to state S0 of the FSM
shown in Fig. 7(a). Then, with Q+ and Q- equal to ‘1’ and
‘0’, respectively, the FSM outputs e+ and e- become ‘0’
and ‘1’, respectively, which corresponds to state S1. After
35 clock cycles, Q+ and Q- change to the opposite logic
levels, leading e+ and e- to change to ‘1’ and ‘0’, respectively. This change lasts two clock cycles, which corresponds to state S2 followed by state S3 in the FSM. After

In the fifth simulation set, the operation of the offset regulator FSM shown in Fig. 7(a) is evaluated using the circuit
shown in Fig. 15. In this circuit, an offset voltage equal to
50 mV is added in series with a positive comparator input.


these two cycles, the outputs e+ and e- are set back to
‘0’ which corresponds to the FSM last state S7.
Two signals E+ and E-, which are identical to e+ and e-,
are also generated. Indeed, because calib = 1, an AND
logic operation between calib and e+ and e-, as shown in
Fig. 6, results in the two signals E+ and E-. These signals
are applied to the enable inputs of two 6-bit counters,
leading to two offset calibration control signals d+ and
d-, respectively. Fig. 17 shows the generated control signals, where d+ and d- are equal to 3 and 37, respectively.

Figure 17: Six-bit offset control signals d+ and d- when
VOS = 50 mV.

Figure 18: Offset evaluation with VOS = 50 mV (a) trip
point VTR+ (b) trip point VTR-.

In Fig. 18, the trip points VTR+ and VTR- are determined as
741 µV and -77 µV, respectively. The offset voltage VOS is
determined using (7) and is equal to 332 µV. Thus, the proposed self-calibration method has effectively reduced the
offset voltage from 50 mV to a few hundred microvolts.
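
The offset value quoted above follows from (7) once VM is taken as the centre of the hysteresis window, that is, the midpoint of the two trip points. The short Python check below uses the trip points read from Fig. 18 and VREF = 0 V; the function name is hypothetical, and the hysteresis width it also returns is a derived quantity that is not quoted in the text.

```python
def offset_from_trip_points(v_tr_plus, v_tr_minus, v_ref=0.0):
    """Return (V_OS, hysteresis width), all in the unit of the inputs.

    V_M is the midpoint of the trip points and V_OS = V_M - V_REF, as in (7).
    """
    v_m = (v_tr_plus + v_tr_minus) / 2.0
    return v_m - v_ref, v_tr_plus - v_tr_minus


# Trip points after calibration with an injected 50 mV offset, in microvolts.
v_os_uv, hysteresis_uv = offset_from_trip_points(741.0, -77.0)
print(v_os_uv, hysteresis_uv)
```

The midpoint of 741 µV and -77 µV is 332 µV, in agreement with the residual offset reported after calibration.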

system could achieve. Fig. 19 shows the resulting offset
regulator FSM outputs. The system goes through states
S0, S1, S2, and S3. However, the control signal calib is set
to ‘0’ before the FSM reaches state S2, that is, before Q+
and Q- change to the opposite logic levels. Indeed, the
control signal calib is used to disable the counter incrementation when it reaches 2N-2, as discussed in section

In the sixth simulation set, the circuit in Fig. 15 is used
again with an offset voltage equal to 150 mV to evaluate the maximum offset correction that the designed

Table 1: Summary and comparison of the characteristics of fully differential dynamic comparators.
Parameter
Process
Topology
Meas/Sim
Offset regulation
VOS range/max (mV)
Kickback noise during decision (mV)
Comparison rate (GHz)
Power (µW)
Energy/Comp. (fJ/comp)

2002’ [8] 2014’ [23]
CMOS 350 CMOS
nm
90 nm
Diff. pair 4-inputs
Measured Simulated
No
No
80
33
1
0.1
1
580
51
5800
51

2016’ [22] 2017’ [24]
CMOS
CMOS 180
nm
40 nm
Diff. pair Diff. pair
Measured Simulated
No
Analog
±1
0.45
3.33
0.5
2100
373
630.63
746

2018’ [25] This work
CMOS 180 CMOS
nm
65 nm
Diff. pair Diff. pair
Simulated Simulated
No
Digital
±5
±9
≈0
1.3
1.25
265
213
203.85
170.4


Figure 19: FSM input output signals of the offset regulator when VOS = 150 mV.

Figure 21: Offset evaluation with VOS = 150 mV (a) trip
point VTR+ (b) trip point VTR-.

Figure 20: Offset control signals d+ and d- when VOS =
150 mV.
5. Therefore, the enable signal E- is no longer identical
to e-. Fig. 20 shows the resulting counters outputs d+
and d-, which are equal to 1 and 62, respectively.
Fig. 21 is used to determine the maximal offset correction. The obtained offset voltage after correction is 4.33
mV. Thus, the system can achieve a maximal offset correction of 145.67 mV.
In the seventh simulation set, the offset voltage is determined while considering the process and mismatch
variations. Fig. 22 shows the offset variation of the designed FDDC under mismatch variation with and without offset calibration with a 100-run Monte Carlo simulation. Without offset calibration, the offset voltage VOS
has a maximum variation of ±160 mV. This offset is reduced to ±9 mV after calibration.

Figure 22: Monte Carlo simulation of the offset voltage
(a) without calibration (b) with calibration.

The proposed design achieves an effective self-calibration
of the offset voltage. The standard deviation is reduced
from 41.3 mV to 2.23 mV after calibration, resulting in a
decrease of more than 18 times. The offset correction can
be improved by increasing the channel length of the calibration transistors, as discussed in [3], or by increasing the
number of charges in the capacitor banks.


The performance of the proposed design is summarized
in Table 1. This table also presents the performance
achieved in recent related works on FDDCs. The proposed design is the only one that includes both offset calibration and noise cancellation in FDDCs. It achieves the
second-best energy efficiency after a 40 nm CMOS design [22]. However, in [22], no offset regulation is proposed; adding one would increase the consumed power and
decrease the operation speed.
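
The energy-per-comparison row of Table 1 is the ratio of the power to the comparison rate. The Python sketch below reproduces two of the table entries from their power and rate values; the function name is hypothetical, and the unit handling relies on 1 µW divided by 1 GHz being 1 fJ.

```python
def energy_per_comparison_fj(power_uw, rate_ghz):
    """Energy per comparison in fJ from power in uW and rate in GHz.

    1 uW / 1 GHz = 1e-6 W / 1e9 Hz = 1e-15 J = 1 fJ.
    """
    return power_uw / rate_ghz


this_work = energy_per_comparison_fj(213.0, 1.25)   # values from Table 1
ref_25 = energy_per_comparison_fj(265.0, 1.3)       # values from Table 1
print(round(this_work, 2), round(ref_25, 2))
```

Both results agree with the tabulated 170.4 fJ and 203.85 fJ after rounding to two decimals.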


7 Conclusions
The current paper presented a new and effective design methodology for FDDCs, including kickback noise
immunity and offset self-calibration. In the proposed
design, the kickback noise is almost null during the
decision phase and less than 40 mV during the reset
phase. Moreover, the proposed FDDC achieves an effective digital offset self-calibration, in which the offset
voltage is reduced more than 18 times. The proposed
circuit is designed with minimalist building blocks and
consumes no more than 213 µW at a 1.25 GHz comparison rate. It achieves high performance compared with
the current state-of-the-art achievements in terms of
offset calibration, noise cancellation, operation speed,
power consumption, and design simplicity. Moreover,
the proposed design methodology is generic and independent of the technology used.


8 Acknowledgments


The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education
in Saudi Arabia for funding this research work through
the project number (IF-PSAU-2021/01/5237).


9 References
1.

M. I. Dewan and D. H. Kim, “NP-Separate: A New
VLSI Design Methodology for Area, Power, and
Performance Optimization,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and
Systems, vol. 39, no. 12, pp. 5111–5122, Dec. 2020,
https://doi.org/10.1109/TCAD.2020.2966551.

11.

100

J. Atkinson, A. Bailey, and A. Tajalli, “Systematic
Design of Loop Circuit Topologies Using C/IDS
Methodology,” IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 30, no. 10, pp.
1538–1542, Oct. 2022,
https://doi.org/10.1109/TVLSI.2022.3181969.
L. Khanfir and J. Mouïne, “Design optimisation
procedure for digital mismatch compensation in
latch comparators,” IET Circuits, Devices & Systems,
vol. 12, no. 6, pp. 726–734, 2018,
https://doi.org/10.1049/iet-cds.2018.5153.
S. Oh et al., “An 85 dB DR 4 MHz BW Pipelined
Noise-Shaping SAR ADC With 1–2 MASH Structure,” IEEE Journal of Solid-State Circuits, vol. 56, no.
11, pp. 3424–3433, Nov. 2021,
https://doi.org/10.1109/JSSC.2021.3086853.
B. Yazdani and S. Jafarabadi Ashtiani, “A Low Power Fully Differential Level-Crossing ADC With Low
Power Charge Redistribution Input for Biomedical
Applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 3, pp. 864–868,
Mar. 2022,
https://doi.org/10.1109/TCSII.2021.3127279.
C.-C. Lu and D.-K. Huang, “A 10-Bits 50-MS/s SAR
ADC Based on Area-Efficient and Low-Energy
Switching Scheme,” IEEE Access, vol. 8, pp. 28257–
28266, 2020,
https://doi.org/10.1109/ACCESS.2020.2971665.
Y. Song, Y. Zhu, C.-H. Chan, and R. P. Martins, “A 40MHz Bandwidth 75-dB SNDR Partial-Interleaving
SAR-Assisted Noise-Shaping Pipeline ADC,” IEEE
Journal of Solid-State Circuits, vol. 56, no. 6, pp.
1772–1783, Jun. 2021,
https://doi.org/10.1109/JSSC.2020.3033931.
L. Sumanen, M. Waltari, V. Hakkarainen, and K. Halonen, “CMOS dynamic comparators for pipeline
A/D converters,” in 2002 IEEE International Symposium on Circuits and Systems. Proceedings, May
2002, vol. 5, p. V–V.
https://doi.org/10.1109/ISCAS.2002.1010664.
Y. Liu, S. Fang, and Y. Wang, “A Novel Time-Multiplexed Fully Differential Interface ASIC With
Strong Nonlinear Suppression for MEMS Accelerometers,” IEEE Transactions on Instrumentation
and Measurement, vol. 71, pp. 1–13, 2022,
https://doi.org/10.1109/TIM.2022.3207795.
K.-J. Moon, D.-R. Oh, M. Choi, and S.-T. Ryu, “A 28nm CMOS 12-Bit 250-MS/s Voltage-Current-Time
Domain 3-Stage Pipelined ADC,” IEEE Transactions
on Circuits and Systems II: Express Briefs, vol. 67, no.
12, pp. 2843–2847, Dec. 2020,
https://doi.org/10.1109/TCSII.2020.2990910.
BA. T. Ramkaj, M. J. M. Pelgrom, M. S. J. Steyaert,
and F. Tavernier, “A 28 nm CMOS Triple-Latch
Feed-Forward Dynamic Comparator With <27 ps
/ 1 V and <70 ps / 0.6 V Delay at 5 mV-Sensitivity,”

L. Khanfir et al.; Informacije Midem, Vol. 53, No. 2(2023), 87 – 102

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

IEEE Transactions on Circuits and Systems I: Regular
Papers, vol. 69, no. 11, pp. 4404–4414, Nov. 2022,
https://doi.org/10.1109/TCSI.2022.3199438.
P. M. Figueiredo and J. C. Vital, “Kickback noise reduction techniques for CMOS latched comparators,” IEEE Transactions on Circuits and Systems II:
Express Briefs, vol. 53, no. 7, pp. 541–545, Jul. 2006,
https://doi.org/10.1109/TCSII.2006.875308.
L. Khanfir and J. Mouine, “Clock delay-based design for hysteresis programming and noise reduction in dynamic comparators,” Analog Integr Circ
Sig Process, vol. 106, no. 2, pp. 409–419, Feb. 2021,
https://doi.org/10.1007/s10470-020-01656-3.
R. K. Siddharth, Y. Jaya Satyanarayana, Y. B. Nithin
Kumar, M. H. Vasantha, and E. Bonizzoni, “A 1-V,
3-GHz Strong-Arm Latch Voltage Comparator for
High Speed Applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 12,
pp. 2918–2922, Dec. 2020,
https://doi.org/10.1109/TCSII.2020.2993064.
L. Sumanen, M. Waltari, and K. Halonen, “A mismatch insensitive CMOS dynamic comparator for
pipeline A/D converters,” in ICECS 2000. 7th IEEE
International Conference on Electronics, Circuits
and Systems, Dec. 2000, vol. 1, pp. 32–35 vol.1.
https://doi.org/10.1109/ICECS.2000.911478.
V. Katyal, R. L. Geiger, and D. J. Chen, “A New High
Precision Low Offset Dynamic Comparator for
High Resolution High Speed ADCs,” in APCCAS
2006 - 2006 IEEE Asia Pacific Conference on Circuits
and Systems, Dec. 2006, pp. 5–8.
https://doi.org/10.1109/APCCAS.2006.342249.
P. P. Gandhi and N. M. Devashrayee, “A novel low
offset low power CMOS dynamic comparator,”
Analog Integr Circ Sig Process, vol. 96, no. 1, pp.
147–158, Jul. 2018,
https://doi.org/10.1007/s10470-018-1166-9.
K. Ohhata et al., “A 900-MHz, 3.5-mW, 8-bit Pipelined Subranging ADC Combining Flash ADC and
TDC,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 9, pp. 1777–1787,
Sep. 2018,
https://doi.org/10.1109/TVLSI.2018.2827943.
J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE Journal of Solid-State Circuits,
vol. 24, no. 1, pp. 62–70, Feb. 1989,
https://doi.org/10.1109/4.16303.
P. Nuzzo, F. D. Bernardinis, P. Terreni, and G. V. der
Plas, “Noise Analysis of Regenerative Comparators for Reconfigurable ADC Architectures,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 6, pp. 1441–1454, Jul. 2008,
https://doi.org/10.1109/TCSI.2008.917991.
L. Khanfir and J. Mouïne, “Systematic Hysteresis
Analysis for Dynamic Comparators,” Journal of
Circuits, Systems, and Computers, vol. 28, no. 06, p.
1950100, Jun. 2019,
https://doi.org/10.1142/S0218126619501007.
V. Milovanović and H. Zimmermann, “A two-differential-input/differential-output fully complementary self-biased open-loop analog voltage
comparator in 40 nm LP CMOS,” in 2014 29th
International Conference on Microelectronics Proceedings - MIEL 2014, May 2014, pp. 355–358.
https://doi.org/10.1109/MIEL.2014.6842163.
M. Hassanpourghadi, M. Zamani, and M. Sharifkhani, “A low-power low-offset dynamic comparator for analog to digital converters,” Microelectronics Journal, vol. 45, no. 2, pp. 256–262, Feb.
2014,
https://doi.org/10.1016/j.mejo.2013.11.012.
S. Naghavi et al., “A 500 MHz low offset fully differential latched comparator,” Analog Integr Circ Sig
Process, vol. 92, no. 2, pp. 233–245, Aug. 2017,
https://doi.org/10.1007/s10470-017-0998-z.
P. P. Gandhi and N. M. Devashrayee, “A novel low
offset low power CMOS dynamic comparator,”
Analog Integr Circ Sig Process, vol. 96, no. 1, pp.
147–158, Jul. 2018,
https://doi.org/10.1007/s10470-018-1166-9.

Copyright © 2023 by the Authors.
This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted
use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Arrived: 05. 04. 2023
Accepted: 23. 08. 2023


Original scientific paper
https://doi.org/10.33180/InfMIDEM2023.205

Journal of Microelectronics,
Electronic Components and Materials
Vol. 53, No. 2(2023), 103 – 117

Towards Smaller Single-point Failure-resilient
Analog Circuits by Use of a Genetic Algorithm
Žiga Rojec
Department EDA, Faculty of Electrical Engineering, University of Ljubljana, Slovenia
Abstract: Failure-resilient analog circuits are difficult to design, but artificial intelligence can help search the topology solution space.
Using evolutionary computation-based topology synthesis, we evolve analog arcus tangent computational circuits that are resilient to the
high-impedance failure or removal of any single rectifying diode or resistor. We encode analog circuit topologies as individuals with an
upper-triangular incident matrix. Circuits are evolved using a combined technique utilizing parts of NSGA-II and PSADE, based on a
special three-dimensional robustness function. We show that the topology of a failure-resilient circuit can be several times smaller than
that of hand-made, component-redundancy-based solutions. Our best failure-resilient topology comprises six diodes, three resistors, and a
voltage offset source.
Keywords: analog circuits, analog circuit synthesis, circuit optimization, failure-resilience, circuit robustness

Manjšanje analognih vezij odpornih na odpoved
poljubne komponente z uporabo genetskega
algoritma
Izvleček: Analogna vezja, ki so odporna na napake, je težko načrtovati. Pri prečesavanju prostora možnih topologij lahko pomaga
umetna inteligenca. Z sintezo topologij, temelječi na evolucijskem algoritmu, smo razvili analogno računsko vezje za inverzni tangens,
ki je odporno na visokoimpedančno okvaro posamezne komponente (diode ali upora) ali njene odstranitve. Topologija analognega
vezja je v algoritmu zapisana v obliki zgornje-trikotne vpadne matrike. Vezja razvijemo z uporabo kombinirane metode z uporabo
večkriterijskega optimizacijskega algoritma NSGA-II in PSADE, kjer je za usmerjanje sinteze razvita posebna tri-kriterijska funkcija
robustnosti. V članku prikazujemo kako zmanjšati velikost topologije, odporne na odpoved komponente, na razrede manjšo velikost
od ročno izdelanih robustnih topologij, ki temeljijo na redundanci posameznih komponent. Naš najboljši rezultat je analogno
računsko vezje za inverzni tangens, ki je sestavljeno iz šestih diod, treh uporov in odmičnega napetostnega vira.
Ključne besede: analogna vezja, sinteza analognih vezij, optimizacija vezij, odpornost na napake, robustnost vezij
* Corresponding Author’s e-mail: ziga.rojec@fe.uni-lj.si

1 Introduction

Design of an analog circuit is a challenging task, especially when the product has to meet high standards
and fulfill tough requirements. Designers often use various simulation tools to predict
temperature, humidity, and electromagnetic behavior during circuit operation. Furthermore, to predict the
manufacturability of the design and maximize the production yield, they also use statistical methods, such as
Monte Carlo analysis [1].

However, customer requirements might get even harder. When a device is targeted for use in harsh conditions
(e.g., space exploration, aeronautical missions, automotive, robotics), we expect the product to be robust
against extreme temperature swings, high ionizing and electromagnetic radiation levels, high working
currents, and more. That kind of stress can lead to component faults and premature device failure. Furthermore,
failed components in remote and unmanned missions cannot be replaced easily.

How to cite:
Ž. Rojec, “Towards Smaller Single-point Failure-resilient Analog Circuits by Use of a Genetic Algorithm,” Inf. Midem-J. Microelectron. Electron. Compon. Mater., Vol. 53, No. 2(2023), pp. 103–117

Ž. Rojec; Informacije Midem, Vol. 53, No. 2(2023), 103 – 117

Researchers have already focused on hardening electronic devices against failures per se [2]. The classical
ways of doing that include component redundancy, overdesign, shielding and insulation, thermal management,
and so on. Most of the time such solutions significantly increase the size, weight, and finally, the cost of
the device. These methods usually aim to protect every circuit component as if it were the main breaking
point of the system.

Researchers have also proposed systems resilient to failures that occur in vivo, meaning that the circuit
remains functional when one or more components fail during circuit operation [3]–[6]. Such
systems usually utilize duplicated circuit modules to form redundant sub-systems which are controlled by
various voting mechanisms [3], [7]. However, the demultiplexer then becomes the weak part of the system.

Somewhat special are the works of Zebulum, Keymeulen, et al., who presented an evolutionary algorithm
that runs on the controlling unit of the circuit under failure, in vivo [4], [12].

This paper shows an alternative method of evolving failure-resilient analog circuits. Using an intensive
evolutionary search, we can find novel analog circuit topologies that exhibit robustness to the
high-impedance failure or removal of any electronic component (semiconductor diode or resistor), without a
dedicated active demultiplexing system.

We show in this work that, by using an evolutionary topology synthesis tool, we can greatly reduce the size
and the number of components needed to achieve failure-resilience of an analog circuit, compared to a
canonical hand-made design.

To the best of our knowledge, this is one of the few published works on the automated synthesis of a priori
robust, failure-resilient nonlinear computational analog circuits [3], [4], [8]–[15], and also one of the first
attempts at redundancy reduction by using evolutionary search.

The paper is organized as follows. We summarize previous work on robust topology synthesis in Section 1.1
and describe our motivation in Section 1.2. We describe the applied topology synthesis technique in Section 2.
Results are given in Section 3 and summarized in Section 3.8, and conclusions are drawn in Section 4.

1.1 Previous work
The discovery of novel circuit topologies has been done by hand for over a century. This is changing with
the availability of novel tools relying on artificial intelligence [16]. Since the beginning of this research area
[17]–[19], computer-aided circuit synthesis has become human-competitive and trustworthy for fabrication
[16], [20]. We believe that, rather than replacing a human expert in the industry, AI might help in the rapid
exploration of undiscovered topology space, thereby helping and speeding up the design process.

Reviews of existing analog circuit synthesis techniques can be found in the literature [21], [22]. Below,
we give a brief overview of existing topology synthesis efforts for extremely robust and failure-resilient
analog circuits.

1.1.1 Synthesis method
Analog topology synthesis is an extremely non-linear and complex task, which is why most existing
approaches in this field search for topologies with a method based on the Darwinian selection of the fittest,
i.e., an evolutionary or genetic algorithm.

Evolutionary methods demonstrate a capacity to tackle unconventional challenges. One compelling reason
that supports the continued relevance of evolutionary computation, even when compared to neural networks
like GNNs, is that they do not always require prior training to align with the defined cost function.

However, emerging tools rooted in GNNs, like CktGNN, showcase impressive capabilities in generating robust
circuit topologies [23].

1.1.2 Synthesis goals and degrees of robustness
Passive filters are usually the entry point for showing the performance of analog circuit synthesis tools. Most
of the works on failure-resilience also experimented with the synthesis of robust passive analog filter
circuits, dealing with various degrees of component faults. Resistor/capacitor/inductor removal was
considered in [9], [15], while [3], [7] additionally studied the complexity of partial and full short-circuit and
high-impedance faults. Studies [24]–[27] only considered R/L/C parameter perturbation without full
component failure.

Other authors reported syntheses of
- compensator circuit [8] and inverter, amplifier, and oscillator [13] resilient to bipolar transistor removal,
- PID controller with R/L/C removal resilience [10],
- transistor-fault resilient amplifier [11],
- half-wave rectifier, NOR gate, and voltage-controlled oscillator for extreme temperature swings
  (in situ evolution) [12],
- XNOR gate, analog multiplier, and inverter resilient to arbitrary faults in the controlling unit FPTA
  (Field Programmable Transistor Array) [4],
- the natural logarithm and square-root analog computational circuits resilient to semiconductor
  diode short-circuit or high-impedance malfunction [28].

1.2 Motivation
1.2.1 Failure-resilience
For this work, let us define failure-resilience as an analog circuit topology property, where any of the basic
components (diode or resistor) can be removed or replaced with a high-impedance failure, with the circuit
showing minimal-to-zero deformation of its nominal signal processing abilities. The voltage source and the
10 kΩ input pull-up resistor are excluded from the definition.

The methodology incorporates various failure scenarios using specialized “failure-defining” Spice models,
as demonstrated in our prior work [28], where we successfully synthesized analog circuits resilient to both
high-impedance and short-circuit failures in semiconductor diodes. In this paper, we primarily concentrate
on minimizing topologies that are fully resilient to high-impedance failures. Due to high computational
costs, we do not address short-circuit failures for all component types in this paper; this topic is left
for future research.

1.2.2 Size of failure-resilient circuits
Failure-robustness comes at a cost. It is generally paid with an (often significantly) higher total number of
components needed for the same nominal task that a non-robust circuit would perform. For a system to
survive a change as rigorous as the removal or failure of any one component, redundant elements and
connections must be available in the system.

Let us consider an example of a non-linear, computational analog circuit from Figure 1. The circuit outputs
an inverse tangent of an input voltage signal between 0 and 10 V. It is a hand-designed linear voltage divider,
with diodes used to switch between five linear segments, which closely interpolate the mathematical
function [29]. Due to its simplicity, the topology is often used instead of the amplifier-chain summing circuit. If
any of the components in the dotted square (except for the voltage source) fails (or is removed), the circuit’s
transfer function changes severely, as seen in Figure 2 with the absolute error range plot and in Figure 3
with the relative error plots.

Figure 1: Canonical hand-designed piece-wise linear arctan computational circuit topology.

The most common and straightforward approach to achieving the failure-resilience property is to introduce
redundancy on a single-component level. In the case of an arctan circuit, every diode has to be paired in
parallel and every resistor has to be (at least) tripled in parallel. Two diodes in parallel give a sub-circuit where,
theoretically, any of the two diodes might enter high-impedance failure without changing the transfer
function. A single resistor with resistance Rn has to be replaced with three parallel resistors of resistance
3 Rn each to keep the relative error of the sub-circuit at 33% in case one resistor enters high-impedance failure.

Figure 4 shows a hand-designed topology that fulfills the failure-resilience criteria. A fair nominal response
and a narrow error range in failure cases are presented in Figure 5 and Figure 6. Evidently, the size of the
topology, and hence the number of needed components, grows considerably. While the nominal non-robust
topology includes 10 resistors and 5 diodes (excluding the input resistor, see 1.2.1), the hand-made robust
version comprises 30 resistors and 10 diodes. In CMOS technology, for example, resistors occupy large chip
areas [30]. In addition, those resistances are multiples of the nominal values, which further multiplies the
area needed for fabrication. The total cost of the circuit would be far higher than that of the nominal
non-robust version.

However, recent studies of analog topology synthesis imply that the number of components needed for
failure-resilience might be lower than expected in hand-made designs [3], [7]. The possible reason for that
phenomenon is that open-ended topology synthesis allows component-level redundancy to be replaced
with system-level redundancy.

1.2.3 Topology size as a synthesis constraint
In this study, we explored the lower limits of topology size for a failure-resilient computational analog circuit.
We show that, for the arcus-tangent circuit, the topology could be reduced from 40 critical components in
the hand-made design down to 8 components by evolutionary-based synthesis. This is even fewer components
than in the hand-made non-robust design (15).
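The component counts quoted above follow from the single-component redundancy rule of Section 1.2.2. As a worked check (the symbols $R_{eq}$, $G$, $N_R$ and $N_D$ are introduced here for illustration only and do not appear in the original text), three parallel resistors of $3R_n$ reproduce the nominal resistance, and losing one of them changes the sub-circuit as follows:

```latex
\begin{aligned}
R_{eq}^{nom}  &= \left(\frac{1}{3R_n}+\frac{1}{3R_n}+\frac{1}{3R_n}\right)^{-1} = R_n, \\
R_{eq}^{fail} &= \left(\frac{1}{3R_n}+\frac{1}{3R_n}\right)^{-1} = \frac{3}{2}\,R_n, \\
\frac{G^{nom}-G^{fail}}{G^{nom}} &= \frac{1/R_n - 2/(3R_n)}{1/R_n} = \frac{1}{3} \approx 33\,\%, \\
N_R^{robust} + N_D^{robust} &= 3\cdot 10 + 2\cdot 5 = 30 + 10 = 40 .
\end{aligned}
```

Read this way, the 33% figure corresponds to the relative loss of conductance of the tripled resistor; expressed as resistance, the same failure is a 50% increase. Which of the two measures the text intends is an assumption of this check.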

Our study provides step-by-step size-reducing results for further investigation and a better understanding of
underlying mechanisms.

The primary contribution of this paper lies in the demonstration of a novel application of evolutionary
methods, resulting in the attainment of system robustness that has not been observed in any existing
systems or circuits within the literature.

Figure 2: Hand-designed non-robust arctan circuit: nominal response (black) completely covers the arctan
function. The range of various failure responses is given in blue.

Figure 3: Relative error curves of nominal (solid) and component failures (dotted and dashed).

Figure 4: Hand-designed piece-wise linear arctan computational circuit, robust to any single component
high-impedance failure or removal.

Figure 5: Hand-designed failure-resilient arctan circuit: nominal response (black) covers the arctan function.
The range of various failure responses is given in blue.

Figure 6: Hand-designed failure-resilient arctan circuit: relative error curves of nominal (solid) and component
failures (dotted and dashed).

2 Methods
In this section, we provide details of the methods used in this circuit synthesis. The applied approach is mostly
based on [28].

2.1 Analog Circuit Representation
The upper-triangular incident matrix is a well-proven method of encoding an analog circuit topology [22],
[28], [31]. It is based on a fixed set of available component terminals. Each building block can comprise one
or more input/output terminals (see Figure 7). Usually, the building-block terminals are located on the left
side of the fixed set, and outer connections are located on the right side of the set. The set is then mirrored in
two dimensions, forming a connection matrix, where the logical one represents an existing zero-impedance
connection between the terminals on both axes. The matrix is filled with logical ones on the diagonal so that,
by definition, every terminal is connected to itself. Only the upper matrix triangle is used, which excludes the
redundant mirror connections of the bottom triangle and reduces the effective matrix size without
sacrificing any topology search space [31], [32]. Additionally, in the inner-connections sector of the matrix,
we allow every possible connection, while in the outer-connections sector only one positive logical value is
allowed per line, filtering out any connections between outer terminals.

Figure 7: An example of an upper-triangular matrix, representing a simple T-shaped analog circuit topology [31].
(Figure labels: components R1, R2, C1; outer terminals Vin, Vout; inner-connections sector; outer-connections
sector; forbidden sector without connections.)

Components with adjustable parameters (i.e., resistances, capacitances, transistor widths and lengths, etc.)
have their values organized in a separate array, called the value vector. While the topology matrix is purely
binary, the value vector is a numeric entity.

2.2 Genetic Reproduction and Sizing
For evolutionary computation and mimicking natural genetic reproduction, we use the topology-matrix
crossover technique described in [31]. Every terminal is connected to other terminals via the logical values that
reside on the column and the row intersecting at the diagonal element that represents the terminal's connection
to itself. By exchanging these two lines of the matrix with those of another topology matrix, the information on
how the terminal connects to the rest of the circuit is transferred. Figure 8 shows two examples of newly created
offspring, with the information of one terminal (N=1) and of three terminals (N=3) being exchanged. Note that
in the applied algorithm, the number of exchanged terminal connections N is a randomly chosen number from
the set {1,2,3}.

Figure 8: Topology crossover examples. For better illustration, parent no. 2 is a full upper-triangular matrix [31].
(Figure panels: parents; offspring 1, N=1; offspring 2, N=3.)

The value vector is optimized using two different methods. The first one is a reproduction mechanism
inspired by the well-known intermediate crossover [33]. The choice between topology-matrix and value-vector
crossover is made by the evolutionary algorithm. In one case the offspring will inherit a modified topology and
in the other a modified parameter.

Another parameter tuning technique in this work is the established PSADE (Parallel Simulated Annealing and
Differential Evolution) [34]. Because it is effective but computationally expensive, it is triggered only
every 10th generation on the one to three best individuals.

2.3 Fitness function
The fitness function should encompass the desired properties of the circuit. Additionally, it should filter
out individuals with unwanted properties and help to guide the search algorithm through the valley of
local minima. We briefly review the applied fitness function below; the full justification of the chosen
criteria is given in [28].

In the case of open-ended topology synthesis, the fitness function definition is rather complex and comprises
several stages. The first is an evaluation of the circuit’s transfer function, i.e., its signal processing quality,
using a DC analysis in a SPICE simulator. In the case of arctan circuit design (let us denote the mathematical
function as g) we calculate the root mean square error (RMSE) between Vout(Vin) and g(Vin). We call the result
fitness and denote it as f.
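The RMSE fitness described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name `rmse_fitness`, the sweep values, and the choice of `math.atan` applied directly to the input voltage (without any scaling) as the target g are assumptions; in the actual flow the output samples would come from a SPICE DC analysis.

```python
import math

def rmse_fitness(v_in, v_out, target=math.atan):
    """Root mean square error between simulated outputs and target(v_in)."""
    if len(v_in) != len(v_out) or not v_in:
        raise ValueError("v_in and v_out must be non-empty and of equal length")
    squared = [(vo - target(vi)) ** 2 for vi, vo in zip(v_in, v_out)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical DC sweep from 0 V to 10 V in 0.5 V steps (stand-in for SPICE data).
sweep = [0.5 * k for k in range(21)]
ideal = [math.atan(v) for v in sweep]        # a circuit that matches g exactly
shifted = [y + 0.1 for y in ideal]           # the same curve with a 0.1 V offset

print(rmse_fitness(sweep, ideal))            # exact match gives zero error
print(rmse_fitness(sweep, shifted))          # a constant offset gives RMSE close to 0.1
```

A lower value of f means a better approximation of g; a constant offset of 0.1 V across the sweep yields an RMSE of about 0.1.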

Calculation of failure-resilient circuit fitness needs to be carried out for every predicted failure scenario. In
our work, failure-resilience is defined with respect to the high-impedance failure of any resistor or
semiconductor diode (see 1.2.1). In the case of 30 resistors and 10 diodes, the total number of RMSE
calculations is 41: one for the nominal (no failure) scenario fnom, and 40 for the failure of each critical
device, multiplied by the number of failure types considered (only one failure type in this case). Vector f
comprises all RMSE results:

$$\mathbf{f} = \left[ f_{nom},\ f_{1,1},\ \ldots,\ f_{1,F},\ \ldots,\ f_{N,F} \right] \qquad (1)$$

where N is the total number of critical components and F is the total number of failure types [28].

Failure-resilient circuit evaluation is carried out in multiple dimensions and forms a three-dimensional
robustness vector r:

$$\mathbf{r} = \begin{bmatrix} f_{nom} \\ f_{max} \\ \sigma_f \end{bmatrix} \qquad (2)$$

where fnom is the RMSE result of the no-failure, nominal circuit topology, fmax is the maximum of vector f
and σf is the standard deviation of the same vector [28]. Vector r gives insight into a single
failure-resilient candidate:
- nominal performance,
- performance in case of the worst single-point failure and
- statistical failure scattering.

This separation gives the NSGA-II algorithm a chance to sort the individuals by non-dominance into
Pareto fronts and thereby maintain genetic diversity, thus avoiding premature convergence.

In the specific case of failure-resilient circuit synthesis, a practitioner might encounter a false-robustness
phenomenon, which we explain below.

Let us consider an example of a simple diode half-wave rectifier (Figure 9, left). If D0 fails or is removed, the
rectifier no longer works, and statistically, one critical component (diode) makes a 100% chance of circuit
failure. Imagine a topology modification that would harden the circuit against D0 removal or
high-impedance failure. Let us have four additional diodes to fulfill that requirement (one would be enough,
but we assume the search algorithm does not know that). The search algorithm can encounter a topology
with four diodes that have no effect on the nominal transfer function (example in Figure 9, right). Still, if D0
fails, the circuit does, too. However, if any of D1-4 fails, the circuit still delivers the transfer function. It
appears as if only 20% of critical components (diodes) cause a fatal scenario for the circuit. The latter circuit
might get promoted because of its better “robustness” value. Obviously, it is not more robust, because D1-4
are not electrically connected and do not play any role in signal processing. That kind of circuit has to be
ranked out since it does not contribute to real circuit robustness.

Figure 9: False-robustness problem [28].

Inclusiveness [28] successfully resolves the false-robustness problem. Using modified diode models and SPICE
simulator commands we determine which of the components are electrically connected (included) and have
an effect on signal processing. Inclusiveness (denoted by I) is calculated as the ratio between the number of all
critical components and the number of included components. This gives an updated robustness definition:

$$\mathbf{r} = \begin{bmatrix} f_{nom} \\ f_{max} \\ \sigma_f \end{bmatrix} \cdot I \qquad (3)$$

Circuits with greater inclusiveness are promoted over circuits with floating or faultily connected
components. However, this can lead the synthesis to build larger circuits with excessive redundancy, so
component number limits must be set elsewhere in the algorithm. In our case, the maximum number of
available devices is set in the pre-defined component set, which also defines the topology-matrix size. Note
that only the inclusiveness of diodes was considered in our work.

2.4 Synthesis algorithm
The search and sorting algorithm utilizes major ideas from NSGA-II [35].
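The individuals that this algorithm sorts are scored with the robustness vector of Section 2.3. A minimal sketch of that scoring follows, under stated assumptions: the function names are hypothetical, the standard deviation is taken as the population standard deviation, and inclusiveness is read as I = (all critical components) / (included components), so that I = 1 is the best case and every entry of r is to be minimized. The RMSE values are invented for illustration.

```python
import statistics

def evaluation_count(n_critical, n_failure_types):
    """Number of RMSE evaluations: one nominal run plus N * F failure runs."""
    return 1 + n_critical * n_failure_types

def robustness_vector(f_nom, f_failures, n_critical, n_included):
    """Return r = I * (f_nom, f_max, sigma_f); smaller is better in every entry."""
    if n_included == 0:
        # No component takes part in signal processing: worst possible score.
        return (float("inf"),) * 3
    f = [f_nom] + list(f_failures)            # vector f of Eq. (1)
    f_max = max(f)
    sigma_f = statistics.pstdev(f)            # population std. dev. (assumed)
    inclusiveness = n_critical / n_included   # assumed reading: I >= 1, 1 is best
    return tuple(inclusiveness * x for x in (f_nom, f_max, sigma_f))

# Hypothetical RMSE values: a five-diode circuit whose diodes all take part,
# and a Figure 9 style circuit with one working diode and four floating ones.
honest = robustness_vector(0.30, [0.35] * 5, n_critical=5, n_included=5)
dummy = robustness_vector(0.30, [3.0, 0.30, 0.30, 0.30, 0.30],
                          n_critical=5, n_included=1)

print(evaluation_count(40, 1))   # 41, as in the 30-resistor, 10-diode example
print(honest)
print(dummy)
```

With these numbers the falsely robust circuit is worse in all three entries: its single fatal failure raises f_max and the scatter, and the low share of included diodes scales the whole vector up.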

The evolutionary algorithm is initiated by a randomly generated population. Then every individual is
evaluated according to the fitness/robustness from Section 2.3. Sorting is performed in three steps,
following NSGA-II. In the first step, individuals that do not dominate each other (are not beaten in any
combination of objectives) are assigned to a front (i.e., a Pareto front). The remaining individuals are put in
a second, third, etc., front, with the same non-dominance criteria. The assembly of a new generation is the
second step. We aggregate the new generation starting with individuals from the first front, and continue
with available individuals from further fronts. Because a union of parents and offspring is usually larger than
the available space in the new generation, there is a front of individuals that does not fit as a whole into the
new generation. A selection between non-dominated individuals needs to be undertaken. This is done in the
third step, the crowding distance calculation. The crowding distance is the distance between two neighboring
points (i.e., individuals) along each of the objective axes. Favoring individuals with a higher crowding distance
leads to a more even distribution of individuals within a front.
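The first and third sorting steps can be sketched as follows. This is a simplified illustration rather than the author's code: it uses a plain quadratic non-dominated sort instead of the bookkeeping of the original fast NSGA-II procedure, all function names are hypothetical, all objectives are assumed to be minimized, and the objective vectors are invented.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objs):
    """Split the indices of objs into Pareto fronts (the first front is the best)."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Crowding distance per index of one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all individuals share this objective value
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist

# Hypothetical robustness vectors (f_nom, f_max, sigma_f) of four individuals.
population = [(0.3, 0.4, 0.02), (0.2, 0.5, 0.03), (0.4, 0.6, 0.05), (0.25, 0.45, 0.025)]
fronts = non_dominated_fronts(population)
print(fronts)                                   # [[0, 1, 3], [2]]
print(crowding_distance(population, fronts[0]))
```

Individual 2 is beaten by individual 0 in every objective and therefore falls into the second front; within the first front, the two boundary individuals receive an infinite crowding distance and are kept preferentially.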

After the assembly of the new generation, a parent selection process takes place. For the tournament, some
randomly selected individuals are chosen from the generation. The selected individuals compete based
on their front number (lower is better) and crowding distance (higher is better). Two tournaments take place
to choose two future parents.

Once two parents are selected, their genetic material is reproduced. This can be done by mating their
genetic material as in Section 2.2 or by mutating it. Control over mating/mutation is a statistical
probability, set at the beginning of the algorithm. Similarly, a probability parameter controls whether the
topological or parametric part of the gene will be mated/mutated.

We repeat the synthesis algorithm until at least one of the stopping criteria (i.e., design requirements, max.
number of generations, timeout) is met. Every ten generations, we run a PSADE [34] parameter
optimization on three of the best circuits from the population and thus fine-tune the most promising
individuals. Figure 10 summarizes the main synthesis algorithm steps.

2.5 Finding minimal topology
Our objective was to evolve circuits with consistent performance even if devices are removed. Initially,
we aimed to incorporate as many “redundant” components as possible. However, circuit size does not always
reflect actual functional contributions, leading to “dummy” or electrically connected but non-functional
components.

To address this, we introduced “Inclusiveness” to prevent circuits dominated by dangling sub-circuits,
enhancing evolutionary outcomes. Individuals with a greater inclusiveness measure propagate more
effectively. Our experimentation revealed a paradox when maximizing redundancy while minimizing circuit size
simultaneously. Hence, we perform separate stages for minimizing and maximizing circuit schematics. We
list two more reasons why the size of circuit schematics is not another objective of the NSGA-II search.

Our topology representation method using an upper-triangular incident matrix limits arbitrary extensions
during evolution runs. Varying matrix sizes in the evolutionary pool cause inconsistent crossovers and mating patterns.
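The fixed-size requirement becomes visible in a minimal sketch of the encoding of Section 2.1 and the terminal-line crossover of Section 2.2. The sketch is an assumption-laden illustration: all function names are hypothetical, the matrix is a plain list of lists over one shared terminal set, and the additional rule for the outer-connections sector (one logical one per line) is not enforced here.

```python
import random

def empty_topology(n_terminals):
    """Upper-triangular connection matrix with ones on the diagonal only."""
    return [[1 if r == c else 0 for c in range(n_terminals)]
            for r in range(n_terminals)]

def connect(matrix, a, b):
    """Mark a zero-impedance connection, stored only in the upper triangle."""
    r, c = min(a, b), max(a, b)
    matrix[r][c] = 1

def terminal_line(matrix, t):
    """Upper-triangle cells of terminal t: column above and row right of the diagonal."""
    return [(r, t) for r in range(t)] + [(t, c) for c in range(t + 1, len(matrix))]

def crossover(parent_a, parent_b, n_exchanged, rng=random):
    """Copy parent_a and overwrite the lines of n_exchanged random terminals with parent_b's."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must share one fixed terminal set")
    child = [row[:] for row in parent_a]
    for t in rng.sample(range(len(child)), n_exchanged):
        for r, c in terminal_line(child, t):
            child[r][c] = parent_b[r][c]
    return child

# Two hypothetical four-terminal parents with one connection each.
a = empty_topology(4)
connect(a, 0, 3)
b = empty_topology(4)
connect(b, 2, 1)

rng = random.Random(1)
n = rng.choice([1, 2, 3])          # N is drawn from {1, 2, 3} as described in Section 2.2
child = crossover(a, b, n, rng)
for row in child:
    print(row)
```

Because the exchanged cells are addressed by row and column index, both parents must describe the same terminal set; matrices of different sizes would make those indices refer to different terminals, which is why the sketch rejects them.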
The third concern relates to the computational complexity of NSGA-II and evaluating circuits under different failure scenarios. A variable maximum component
number during evolution would increase computational effort, impacting NSGA-II’s performance and circuit robustness evaluation. As a result, we chose not to
experiment with variable component numbers to minimize computational burden.

Figure 10: The applied evolutionary algorithm flowchart [31]. (Flowchart blocks: initial population;
evaluation (calculate fitness/robustness); sorting (according to rank and crowding-distance); tournament
(parent selection); reproduction (offspring creation); offspring evaluation; decision “Criteria met?” (True: END);
decision “10th generation?” (True: PSADE parameter opt. on 3 of best individuals).)

3 Results

Our experiment comprised eight independent topology searches. For each synthesis we predefined the
set of available components, that is Nd diodes and Nr resistors that are subject to possible high-impedance
failure. Voff and a Rin input resistor (the latter was non-optional) were also available with each synthesis but
were excluded from failure consideration.


The main part of the experiment was to explore whether topologies can be found that have fewer
components than the hand-designed examples (e.g., from Figure 4) while still performing the arcus tangent
analog calculation and exhibiting the failure-resilience property (1.2.1).

(roughly 15 hours). The outcome is presented in Figure
11. The final topology comprises all 12 available diodes.
Some resistors were excluded from the final topology
since they do not have any signal-processing effect
(such as short-connected resistors, or resistors connected to simulator-helper nodes). The voltage source
was also not included in the final design. We excluded
some of the components already from topology schematics in Figure 11.

The genetic algorithm parameters were fixed through
the experiment and are summarized in Table 1.
Table 1: Genetic algorithm properties.
Parameter
Population
Tournament
Mating prob.
Topology reproduction prob.

We summarize the circuit performance in three parameters: nominal topology RMSE is 0.312, the worst failure
RMSE is 0.370 and the standard distribution of all cases
(nominal and failures) is 0.026. One can visualize those
results in Figure 12 and Figure 13.

Value
1000
3
0.6
0.8

Resistance values were limited to the range between
10 and 100 kW, and voltage source with DC range of 0
to 6 V. Every synthesis was conducted on an i9 HP desktop, utilizing 16 computational threads on 8 processor
cores.

3.1 Synthesis with a max of 12 diodes, 12 resistors
With the ambition to cut the number of needed components for the circuit, we gave the first upper limit of
Ndmax = 12 and Nrmax=12. This is already a significant cut
of the total number of components (Nd + Nr) in comparison to hand designed example from Figure 4 which
comprises 40 components. The algorithm can, however, synthesize a topology with fewer elements.

Figure 12: Synthesized arctan computational circuit
(Ndmax = 12, Nrmax = 12): nominal response (black),
arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Starting with a random population, without any prior
knowledge available in the population itself, we let the
combined NSGA-II algorithm run for 306 generations

Figure 13: Synthesized arctan computational circuit
(Ndmax = 12, Nrmax = 12): relative error curves of nominal
(solid) and component failures (dotted and dashed).
Figure 11: Synthesized arctan computational circuit
(Ndmax = 12, Nrmax = 12), robust to any single component
high-impedance failure or removal.
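The three performance figures quoted for each synthesis (nominal RMSE, worst failure RMSE, and the spread over all cases) can be computed along the following lines. This is a minimal sketch under stated assumptions, not the evaluation code of the synthesis tool: the function names are hypothetical, the responses are plain lists of sampled output values, and the spread is taken as the population standard deviation of the per-case RMSE values, which the text does not specify.

```python
import math
import statistics


def rmse(response, target):
    """Root-mean-square error between a sampled response and the target."""
    return math.sqrt(
        sum((r - t) ** 2 for r, t in zip(response, target)) / len(target)
    )


def summarize(nominal, failures, target):
    """Return nominal RMSE, worst failure RMSE and spread over all cases.

    `failures` holds one sampled response per single-component failure.
    Assumption: the spread is the population standard deviation of the
    RMSE values of the nominal case and all failure cases.
    """
    f_nom = rmse(nominal, target)
    f_fail = [rmse(resp, target) for resp in failures]
    return {
        "f_nom": f_nom,
        "f_max": max(f_fail),
        "s_f": statistics.pstdev([f_nom] + f_fail),
    }


# Toy data: the target is arctan, each response is the target plus an offset.
x = [0.5 * i for i in range(13)]
target = [math.atan(v) for v in x]
nominal = [t + 0.1 for t in target]
failures = [[t + 0.2 for t in target], [t + 0.3 for t in target]]
summary = summarize(nominal, failures, target)
print(summary)
```

A small spread together with a worst failure RMSE close to the nominal one indicates that no single failure changes the response much.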

Together with the voltage source, six of the available resistors were not used in the final
circuit. We therefore repeated the experiment with tighter component limits.

3.2 Synthesis with a max of 10 diodes, 10 resistors
The next synthesis was limited to Ndmax = 10 and Nrmax = 10. We stopped the algorithm after
822 generations (33 h).
The outcome is presented in Figure 14. The final topology comprises all 10 available diodes. Two
resistors were not included in the final topology.

Figure 14: Synthesized arctan computational circuit (Ndmax = 10, Nrmax = 10), robust to any single
component high-impedance failure or removal.

Circuit performance: the nominal topology RMSE is 0.158, the worst failure RMSE is 0.270, and the
standard deviation over all cases (nominal and failures) is 0.032. The failure ranges are
visualized in Figure 15 and Figure 16. According to the three observables, this circuit performs
better than the one from the previous synthesis. It also comprises two fewer diodes and two more
resistors.

Figure 15: Synthesized arctan computational circuit (Ndmax = 10, Nrmax = 10): nominal response
(black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 16: Synthesized arctan computational circuit (Ndmax = 10, Nrmax = 10): relative error curves
of nominal (solid) and component failures (dotted and dashed).

3.3 Synthesis with a max of 8 diodes, 8 resistors
We proceeded with Ndmax = 8 and Nrmax = 8 and stopped the algorithm after 432 generations (11 h).
The outcome is presented in Figure 17. The final topology comprises 6 diodes and 6 resistors that
can fail during circuit operation. Two resistors and two diodes were not included in the final
topology.

Figure 17: Synthesized arctan computational circuit (Ndmax = 8, Nrmax = 8), robust to any single
component high-impedance failure or removal.

Circuit performance: the nominal topology RMSE is 0.149, the worst failure RMSE is 0.152, and the
standard deviation over all cases (nominal and failures) is 0.017. The failure ranges are
visualized in Figure 18 and Figure 19.

Figure 18: Synthesized arctan computational circuit (Ndmax = 8, Nrmax = 8): nominal response
(black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 19: Synthesized arctan computational circuit (Ndmax = 8, Nrmax = 8): relative error curves
of nominal (solid) and component failures (dotted and dashed).

Because the algorithm kept solving the problem using fewer components than the maximum available,
we further tightened the Ndmax and Nrmax limits.

3.4 Synthesis with a max of 6 diodes, 6 resistors
We stopped the Ndmax = 6 and Nrmax = 6 synthesis after 2340 generations (48 h).

Figure 20 shows the outcome. The final topology uses all available diodes and three out of six
available resistors.

Figure 20: Synthesized arctan computational circuit (Ndmax = 6, Nrmax = 6), robust to any single
component high-impedance failure or removal.

Circuit performance: the nominal topology RMSE is 0.106, the worst failure RMSE is 0.110, and the
standard deviation over all cases (nominal and failures) is 0.008. The failure ranges are
visualized in Figure 21 and Figure 22.

Figure 21: Synthesized arctan computational circuit (Ndmax = 6, Nrmax = 6): nominal response
(black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 22: Synthesized arctan computational circuit (Ndmax = 6, Nrmax = 6): relative error curves
of nominal (solid) and component failures (dotted and dashed).

3.5 Synthesis with a max of 5 diodes, 5 resistors
The Ndmax = 5 and Nrmax = 5 synthesis was stopped after 2582 generations (36 h).

As shown in Figure 23, the final topology comprises all available components.

Figure 23: Synthesized arctan computational circuit (Ndmax = 5, Nrmax = 5), robust to any single
component high-impedance failure or removal.

Although the synthesized circuit comprises only ten critical components (plus the voltage source
and the input resistor), its performance has not yet diminished. The nominal topology RMSE is
0.108, the worst failure RMSE is 0.165, and the standard deviation over all cases is 0.022. See the
failure ranges in Figure 24 and Figure 25.

Figure 24: Synthesized arctan computational circuit (Ndmax = 5, Nrmax = 5): nominal response
(black), arctan function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 25: Synthesized arctan computational circuit (Ndmax = 5, Nrmax = 5): relative error curves
of nominal (solid) and component failures (dotted and dashed).

3.6 Synthesis with a max of 4 diodes, 4 resistors
Searching for the lower limit, we conducted the Ndmax = 4 and Nrmax = 4 synthesis and finished it
after 1077 generations (12 h).
The final topology comprises 4 resistors and 4 diodes (Figure 26).

Figure 26: Synthesized arctan computational circuit (Ndmax = 4, Nrmax = 4), robust to any single
component high-impedance failure or removal.

The nominal topology RMSE is 0.173, the worst failure RMSE is 0.217, and the standard deviation
over all cases is 0.028. See the failure ranges in Figure 27 and Figure 28.
We found that this synthesis is a probable lower limit in our experiment. To illustrate how poorly
a smaller design fits the requirement, we show one more synthesis.

3.7 Synthesis with a max of 3 diodes, 3 resistors
Using the limits Ndmax = 3 and Nrmax = 3, we finished the search after 3188 generations (11 h).
See Figure 29 for the topology. The nominal topology RMSE is 0.497, the worst failure RMSE is
0.507, and the standard deviation is 0.010. Failure ranges are shown in Figure 30 and Figure 31.
The circuit produces only a two-piece approximation of the arctan function, which yields a high
RMSE.
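Why a two-piece approximation costs so much accuracy can be illustrated independently of any circuit. The sketch below is an assumption-laden illustration, not a model of the synthesized topology: it interpolates arctan linearly between uniformly spaced breakpoints over an assumed input range of 0 to 6 and reports the RMSE for different numbers of segments. The resulting numbers are therefore not comparable to the RMSE values reported in this section.

```python
import math


def pwl_rmse(n_segments, x_min=0.0, x_max=6.0, n_samples=601):
    """RMSE between arctan and its piecewise-linear interpolant.

    Assumptions: uniform breakpoints and an input range of [x_min, x_max];
    neither is taken from the paper.
    """
    step = (x_max - x_min) / n_segments
    knots = [x_min + i * step for i in range(n_segments + 1)]
    total = 0.0
    for k in range(n_samples):
        x = x_min + (x_max - x_min) * k / (n_samples - 1)
        # Index of the segment containing x (the last sample maps to the last segment).
        i = min(int((x - x_min) / step), n_segments - 1)
        x0, x1 = knots[i], knots[i + 1]
        y0, y1 = math.atan(x0), math.atan(x1)
        y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        total += (y - math.atan(x)) ** 2
    return math.sqrt(total / n_samples)


for n in (2, 4, 8):
    print(n, pwl_rmse(n))
```

Because arctan is concave on this range, every added breakpoint moves the interpolant closer to the curve, so the error falls steadily as segments are added.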

Figure 27: Synthesized arctan computational circuit
(Ndmax = 4, Nrmax = 4): nominal response (black), arctan
function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 30: Synthesized arctan computational circuit
(Ndmax = 3, Nrmax = 3): nominal response (black), arctan
function (red, dashed-dotted). The range of various failure responses is given in blue.

Figure 28: Synthesized arctan computational circuit
(Ndmax = 4, Nrmax = 4): relative error curves of nominal
(solid) and component failures (dotted and dashed).

Figure 31: Synthesized arctan computational circuit
(Ndmax = 3, Nrmax = 3): relative error curves of nominal
(solid) and component failures (dotted and dashed).

Figure 29: Synthesized arctan computational circuit
(Ndmax = 3, Nrmax = 3), robust to any single component
high-impedance failure or removal.
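The per-synthesis figures reported in Sections 3.1 to 3.7 can be compared with the hand-made design in a few lines. The sketch below copies the values listed in Table 2; the row labels and variable names are hypothetical, and "better" is taken here to mean strictly lower on all three observables, which is an assumption of this example.

```python
# (label, Nd, Nr, f_nom, f_max, s_f), values copied from Table 2
ROWS = [
    ("hand-made", 10, 30, 0.116, 0.262, 0.047),
    ("12/12", 12, 6, 0.312, 0.370, 0.026),
    ("10/10", 10, 8, 0.158, 0.270, 0.032),
    ("8/8", 6, 6, 0.149, 0.152, 0.017),
    ("6/6", 6, 3, 0.106, 0.110, 0.008),
    ("5/5", 5, 5, 0.108, 0.165, 0.022),
    ("4/4", 4, 4, 0.173, 0.217, 0.028),
    ("3/3", 2, 3, 0.497, 0.507, 0.010),
]

hand_made = ROWS[0]
synthesized = ROWS[1:]

# Synthesis with the lowest worst-failure RMSE.
best = min(synthesized, key=lambda row: row[4])

# Syntheses that are strictly lower than the hand-made design on all three observables.
better = [
    row[0]
    for row in synthesized
    if all(value < ref for value, ref in zip(row[3:], hand_made[3:]))
]

print("best:", best[0], "critical components:", best[1] + best[2])
print("hand-made critical components:", hand_made[1] + hand_made[2])
print("better than hand-made on all observables:", better)
```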

3.8 Result Summary
Table 2 summarizes the experiment results. Surprisingly, tightening the number of available diodes
and resistors led to improved circuit performance in both nominal functionality and robustness,
with the best result at Nd = 6, Nr = 3. Although the initial syntheses searched over the Ndmax > 6,
Nrmax > 3 topology space, the best Nd = 6, Nr = 3 solution was not discovered in them.

Table 2: Results of the conducted experiment. Every row is an independent topology synthesis with
different limits on the number of components. The first row is the hand-made robust design.
Ndmax   Nrmax   Nd   Nr   fnom    fmax    sf
N/A     N/A     10   30   0.116   0.262   0.047
12      12      12   6    0.312   0.370   0.026
10      10      10   8    0.158   0.270   0.032
8       8       6    6    0.149   0.152   0.017
6       6       6    3    0.106   0.110   0.008
5       5       5    5    0.108   0.165   0.022
4       4       4    4    0.173   0.217   0.028
3       3       2    3    0.497   0.507   0.010

There might be several reasons for this phenomenon. The first and most obvious one is the enormous
search space of the topology search. Within one synthesis run we cannot sample every possible
circuit, but rather crawl the space using the evolutionary search. This is why two evolutionary
syntheses with the same goal but different initial settings might not produce the same outcome.

The second reason relates specifically to the robustness definition in our experiment. As noted,
our problem definition does not reward circuits with fewer components, but rather the opposite.
Inclusiveness (see 2.3) rewards circuits that electrically include all available components, in
order to push redundancy into the circuit and avoid false robustness. During the synthesis, even
when the objectives already meet the requirements, the inclusiveness criterion might draw the
search toward including more components, which makes the search too wide and too long. We conclude
that, for a search problem defined in this way, hard limits on the topology size and the number of
available components are key to an efficient search for small failure-resilient topologies.

4 Conclusions
Using topology synthesis tools, we can find topologies that exhibit novel properties, such as
failure tolerance. We showed that failure resilience in analog circuits can be achieved with
smaller-than-expected topologies by introducing system-level redundancy instead of much more
expensive component-level redundancy. Using an evolutionary topology synthesis tool, we introduced
novel topologies of an analog arctangent circuit. The most compact one comprises six diodes, three
resistors, a voltage source, and an input resistor. Each of the diodes and the three resistors can
fail or be removed with almost no computational error.

Based on this research, we can conclude that the integration of system redundancy for single-point
failures was achieved by imposing a strict limit on the maximum number of available components. We
showed that a surprisingly low number of electrical components is needed to achieve such resilience.

In the realm of CMOS design, reducing the number of components does not necessarily translate to
cost savings on its own. However, we conducted a brief analysis of the total resistance of both
robust circuits, the hand-crafted and the synthesized design. Total resistance can provide a rough
estimate of circuit area in certain CMOS processes. For instance, the total resistance of the
hand-designed circuit (shown in Fig. 4) amounts to approximately 219 kΩ, whereas the resistance of
the best synthesized circuit totals around 20 kΩ (a difference of a decade).

Furthermore, reducing the number of components can have a direct impact on cost savings in the
realm of discrete electronics, such as PCBs. In the domain of discrete resistors, the resistance
value itself does not significantly affect the cost of the device, assuming factors like
manufacturer, package, power rating, and tolerance remain the same. With this in mind, the
minimization of robust topologies emerges as a pivotal factor in achieving cost-effective and
highly reliable circuits.

In comparison to previous experiments, this study considers not only diodes but also resistors as
possible points of failure. We also experimented with an evolutionary search for circuits that are
robust to both short-circuit and open-circuit failures at all possible failure points (components),
including some experiments with transistors. However, we acknowledge that further investigation and
modified approaches are required to address this specific problem effectively.
We believe our work will inspire further practitioners in the field of analog circuit topology
synthesis.

5 Supplementary material
The source code of the synthesis tool is available online
at https://github.com/zigarojec/MatrixCircEvolutions.


6 Acknowledgments
I would like to thank my colleagues from the EDA department of the Faculty of Electrical Engineering, the
University of Ljubljana for all the support in my work.

7 Conflict of Interest
We declare no conflict of interest in this work.

8 References
1. Á. Bűrmen and H. Habal, "Computing Worst-Case Performance and Yield of Analog Integrated Circuits by Means of Mesh Adaptive Direct Search", Inf. MIDEM, vol. 45, pp. 160–170, Jun. 2015.
2. Y. Deval, H. Lapuyade, and F. Rivet, "Design of CMOS integrated circuits for radiation hardening and its application to space electronics", in 2019 IEEE 13th International Conference on ASIC (ASICON), F. Ye and T. A. Tang, Eds., New York: IEEE, 2019. Accessed: Aug. 10, 2022. [Online]. Available: https://www.webofscience.com/wos/woscc/full-record/WOS:000541465700105
3. M. Liu and J. He, "An Evolutionary Negative-Correlation Framework for Robust Analog-Circuit Design Under Uncertain Faults", IEEE Trans. Evol. Comput., vol. 17, no. 5, pp. 640–665, Oct. 2013, https://doi.org/10.1109/TEVC.2012.2228208.
4. D. Keymeulen, R. S. Zebulum, Y. Jin, and A. Stoica, "Fault-tolerant evolvable hardware using field-programmable transistor arrays", IEEE Trans. Reliab., vol. 49, no. 3, pp. 305–316, Sep. 2000, https://doi.org/10.1109/24.914547.
5. M. Xue and J. He, "Evolutionary topology programming for analog circuit fault tolerant design", in 2013 25th Chinese Control and Decision Conference (CCDC), May 2013, pp. 3391–3396. https://doi.org/10.1109/CCDC.2013.6561534.
6. S. Askari, M. Nourani, and A. Namazi, "Fault-tolerant A/D converter using analogue voting", IET Circuits Devices & Syst., vol. 5, no. 6, pp. 462–470, Nov. 2011, https://doi.org/10.1049/iet-cds.2011.0042.
7. K.-J. Kim, A. Wong, and H. Lipson, "Automated synthesis of resilient and tamper-evident analog circuits without a single point of failure", Genet. Program. Evolvable Mach., vol. 11, no. 1, pp. 35–59, Mar. 2010, https://doi.org/10.1007/s10710-009-9085-2.
8. R. S. Zebulum, M. Vellasco, M. A. Pacheco, and H. T. Sinohara, "Evolvable hardware: On the automatic synthesis of analog control systems", in 2000 IEEE Aerospace Conference. Proceedings (Cat. No.00TH8484), Mar. 2000, pp. 451–463 vol. 5. https://doi.org/10.1109/AERO.2000.878521.
9. K.-J. Kim and S.-B. Cho, "Combining Multiple Evolved Analog Circuits for Robust Evolvable Hardware", in Intelligent Data Engineering and Automated Learning - IDEAL 2009, E. Corchado and H. Yin, Eds., in Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2009, pp. 359–367. https://doi.org/10.1007/978-3-642-04394-9_44.
10. G. A. Hollinger and D. A. Gwaltney, "Evolutionary design of fault-tolerant analog control for a piezoelectric pipe-crawling robot", in Proceedings of the 8th annual conference on Genetic and evolutionary computation, in GECCO '06. New York, NY, USA: Association for Computing Machinery, Jul. 2006, pp. 761–768. https://doi.org/10.1145/1143997.1144133.
11. Q. Ji, Y. Wang, M. Xie, and J. Cui, "Research on Fault-Tolerance of Analog Circuits Based on Evolvable Hardware", in Evolvable Systems: From Biology to Hardware, L. Kang, Y. Liu, and S. Zeng, Eds., in Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2007, pp. 100–108. https://doi.org/10.1007/978-3-540-74626-3_10.
12. R. S. Zebulum, A. Stoica, D. Keymeulen, L. Sekanina, R. Ramesham, and X. Guo, "Evolvable hardware system at extreme low temperatures", in Evolvable Systems: From Biology to Hardware, J. M. Moreno, J. Madrenas, and J. Cosp, Eds., Berlin: Springer-Verlag Berlin, 2005, pp. 37–45. Accessed: Aug. 17, 2021. [Online]. Available: https://www.webofscience.com/wos/woscc/summary/3e9863eb-c395-495e-94b03f857c12151a-048a98b8/date-descending/1
13. P. Layzell and A. Thompson, "Understanding Inherent Qualities of Evolved Circuits: Evolutionary History as a Predictor of Fault Tolerance", in Evolvable Systems: From Biology to Hardware, J. Miller, A. Thompson, P. Thomson, and T. C. Fogarty, Eds., in Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2000, pp. 133–144. https://doi.org/10.1007/3-540-46406-9_14.
14. S. Ando and H. Iba, "Analog Circuit Design with Variable Length Chromosomes", p. 8.
15. K.-J. Kim and S.-B. Cho, "Automated synthesis of multiple analog circuits using evolutionary computation for redundancy-based fault-tolerance", Appl. Soft Comput., vol. 12, no. 4, pp. 1309–1321, Apr. 2012, https://doi.org/10.1016/j.asoc.2011.12.002.
16. A. Mirhoseini et al., "A graph placement methodology for fast chip design", Nature, vol. 594, no. 7862, pp. 207-+, Jun. 2021, https://doi.org/10.1038/s41586-021-03544-w.
17. W. Kruiskamp and D. Leenaerts, "Darwin: Analogue circuit synthesis based on genetic algorithms", Int. J. Circuit Theory Appl., vol. 23, no. 4, pp. 285–296, 1995, https://doi.org/10.1002/cta.4490230404.
18. J. R. Koza, F. H. Bennett III, D. Andre, M. A. Keane, and F. Dunlap, "Automated Synthesis of Analog Electrical Circuits by Means of Genetic Programming", Trans Evol Comp, vol. 1, no. 2, pp. 109–128, Jul. 1997, https://doi.org/10.1109/4235.687879.
19. H. Y. Koh, C. H. Sequin, and P. R. Gray, "OPASYN: a compiler for CMOS operational amplifiers", IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 9, no. 2, pp. 113–125, Feb. 1990, https://doi.org/10.1109/43.46777.
20. T. McConaghy, P. Palmers, M. Steyaert, and G. G. E. Gielen, "Trustworthy Genetic Programming-Based Synthesis of Analog Circuit Topologies Using Hierarchical Domain-Specific Building Blocks", IEEE Trans Evol. Comput., vol. 15, pp. 557–570, 2011.
21. S. E. Sorkhabi and L. Zhang, "Automated topology synthesis of analog and RF integrated circuits: A survey", Integration, vol. 56, pp. 128–138, 2017.
22. Ž. Rojec, Á. Bűrmen, and I. Fajfar, "Analog circuit topology synthesis by means of evolutionary computation", Eng. Appl. Artif. Intell., vol. 80, pp. 48–65, Apr. 2019, https://doi.org/10.1016/j.engappai.2019.01.012.
23. Z. Dong, W. Cao, M. Zhang, D. Tao, Y. Chen, and X. Zhang, "CktGNN: Circuit Graph Neural Network for Electronic Design Automation", presented at The Eleventh International Conference on Learning Representations, Sep. 2022. Accessed: Aug. 8, 2023. [Online]. Available: https://openreview.net/forum?id=NE2911Kq1sp
24. J. He, K. Zou, and M. Liu, "Section-representation scheme for evolutionary analog filter synthesis and fault tolerance design", in Third International Workshop on Advanced Computational Intelligence, Aug. 2010, pp. 265–270. https://doi.org/10.1109/IWACI.2010.5585181.
25. S. Li, W. Zou, and J. Hu, "A Novel Evolutionary Algorithm for Designing Robust Analog Filters", Algorithms, vol. 11, no. 3, p. 26, Mar. 2018, https://doi.org/10.3390/a11030026.
26. J. Hu, X. Zhong, and E. D. Goodman, "Open-ended robust design of analog filters using genetic programming", in Proceedings of the 7th annual conference on Genetic and evolutionary computation, in GECCO '05. New York, NY, USA: Association for Computing Machinery, Jun. 2005, pp. 1619–1626. https://doi.org/10.1145/1068009.1068283.
27. S. Ando and H. Iba, "Analog circuit design with a variable length chromosome", in Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512), Jul. 2000, pp. 994–1001 vol. 2. https://doi.org/10.1109/CEC.2000.870754.
28. Ž. Rojec, I. Fajfar, and Á. Burmen, "Evolutionary Synthesis of Failure-Resilient Analog Circuits", Mathematics, vol. 10, no. 1, Art. no. 1, Jan. 2022, https://doi.org/10.3390/math10010156.
29. A. K. Kenneth, "Piecewise Linear Circuits", Mar. 2004.
30. F. Maloberti, "Design of CMOS Analog Integrated Circuits".
31. Ž. Rojec, J. Olenšek, and I. Fajfar, "Analog Circuit Topology Representation for Automated Synthesis and Optimization", Inf. Midem-J. Microelectron. Electron. Compon. Mater., vol. 48, no. 1, pp. 29–40, Mar. 2018.
32. G. Györök, "Crossbar network for automatic analog circuit synthesis", in 2014 IEEE 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Jan. 2014, pp. 263–267. https://doi.org/10.1109/SAMI.2014.6822419.
33. D. G. Tomasz, Genetic Algorithms Reference. Tomasz Gwiazda, 2006.
34. J. Olenšek, T. Tuma, J. Puhan, and Á. Bűrmen, "A new asynchronous parallel global optimization method based on simulated annealing and differential evolution", Appl. Soft Comput., vol. 11, no. 1, pp. 1481–1489, 2011, https://doi.org/10.1016/j.asoc.2010.04.019.
35. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II", Trans Evol Comp, vol. 6, no. 2, pp. 182–197, Apr. 2002, https://doi.org/10.1109/4235.996017.

Copyright © 2023 by the Authors.
This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted
use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Arrived: 17. 02. 2023
Accepted: 06. 10. 2023


Boards of MIDEM Society |
Organi društva MIDEM
MIDEM Executive Board | Izvršilni odbor MIDEM
President of the MIDEM Society | Predsednik društva MIDEM
Prof. Dr. Barbara Malič, Jožef Stefan Institute, Ljubljana, Slovenia
Vice-presidents | Podpredsednika
Prof. Dr. Janez Krč, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Dr. Iztok Šorli, Mikroiks d.o.o., Ljubljana, Slovenia
Secretary | Tajnik
Olga Zakrajšek, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
MIDEM Executive Board Members | Člani izvršilnega odbora MIDEM
Prof. Dr. Slavko Bernik, Jožef Stefan Institute, Slovenia
Assoc. Prof. Dr. Miha Čekada, Jožef Stefan Institute, Ljubljana, Slovenia
Prof. DDr. Denis Đonlagić, UM, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Prof. Dr. Leszek J. Golonka, Technical University, Wroclaw, Poland
Prof. Dr. Vera Gradišnik, Tehnički fakultet Sveučilišta u Rijeci, Rijeka, Croatia
Mag. Leopold Knez, Iskra TELA, d.d., Ljubljana, Slovenia
Mag. Mitja Koprivšek, ETI Elektroelementi, Izlake, Slovenia
Asst. Prof. Dr. Gregor Primc, Jožef Stefan Institute, Ljubljana, Slovenia
Prof. Dr. Janez Trontelj, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Asst. Prof. Dr. Hana Uršič Nemevšek, Jožef Stefan Institute, Ljubljana, Slovenia
Dr. Danilo Vrtačnik, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

Supervisory Board | Nadzorni odbor
Prof. Dr. Franc Smole, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Prof. Dr. Drago Strle, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Igor Pompe, retired

Court of honour | Častno razsodišče
Darko Belavič, Jožef Stefan Institute, Ljubljana, Slovenia
Dr. Miloš Komac, retired
Dr. Hana Uršič Nemevšek, Jožef Stefan Institute, Ljubljana, Slovenia

Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
ISSN 0352-9045

Publisher / Založnik:
MIDEM Society / Društvo MIDEM
Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia
Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale, Ljubljana, Slovenija
www.midem-drustvo.si