https://doi.org/10.31449/inf.v47i3.4736
Informatica 47 (2023) 303–314

Lightweight Multi-Objective and Many-Objective Problem Formulations for Evolutionary Neural Architecture Search with the Training-Free Performance Metric Synaptic Flow

An Vo, Tan Ngoc Pham, Van Bich Nguyen and Ngoc Hoang Luong
University of Information Technology, Ho Chi Minh City, Vietnam
Vietnam National University, Ho Chi Minh City, Vietnam
E-mail: 19520007@gm.uit.edu.vn, 19520925@gm.uit.edu.vn, vannb@uit.edu.vn, hoangln@uit.edu.vn

Keywords: neural architecture search, evolutionary algorithms, multi-objective optimization

Received: March 11, 2023

Neural architecture search (NAS) with naïve problem formulations and applications of conventional search algorithms often incurs prohibitive search costs due to the evaluations of many candidate architectures. For each architecture, its accuracy can only be properly evaluated after hundreds (or thousands) of computationally expensive training epochs are performed to obtain proper network weights. A so-called zero-cost metric, Synaptic Flow, computed from random network weight values at initialization, has been found to exhibit certain correlations with neural network test accuracy and can thus be used as an efficient proxy performance metric during the search. Besides, NAS in practice often involves not only optimizing for network accuracy but also for network complexity, such as model size, number of floating-point operations, or latency. In this article, we study various NAS problem formulations in which multiple aspects of deep neural networks are treated as multiple optimization objectives. We employ a widely-used multi-objective evolutionary algorithm, the non-dominated sorting genetic algorithm II (NSGA-II), to approximate the Pareto-optimal fronts for these NAS problem formulations. Experimental results on the NAS benchmark NATS-Bench show the advantages and disadvantages of each formulation.

Povzetek: The NSGA-II algorithm is applied to analyze NAS problems, i.e., to search for a suitable neural architecture.

1 Introduction

The goal of Neural Architecture Search (NAS) is to accelerate the design process of high-performing deep neural network architectures by exploring the vast space of possible network configurations and selecting the most promising ones. This process typically involves searching over a large number of potential architectures, evaluating their performance, and iteratively refining the algorithm to converge on the best-performing architectures [12]. However, many state-of-the-art NAS methods require substantial computational resources. For example, Zoph et al. [30] employed 800 GPUs over 28 days to solve NAS using a reinforcement learning algorithm, whereas Real et al. [27] proposed an evolution-based technique (AmoebaNet-A) that took 7 days to execute on 450 K40 GPUs. To reduce such heavy computation costs, current NAS efficiency research proposes the adoption of training-free performance metrics [1] as a performance objective rather than network accuracy. These metrics can be computed using network weights at initialization and do not require any training epochs, thus primarily reflecting the network design itself. Several such training-free metrics have been shown to be correlated with actual network accuracy to some extent [1]. Hence, optimizing these metrics potentially leads to promising architectures.
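To make this zero-cost proxy concrete, the following is a minimal PyTorch sketch of the standard Synaptic Flow computation [28, 1], not the exact implementation used in our experiments: every weight is temporarily replaced by its absolute value, an all-ones input is passed through the untrained network, and the score sums |θ ⊙ ∂R/∂θ| over all parameters, where R is the sum of the network outputs. The toy network at the end is only an illustrative stand-in for a candidate architecture.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def linearize(model):
    """Replace every tensor in the state dict by its absolute value; remember the signs."""
    signs = {}
    for name, param in model.state_dict().items():
        signs[name] = torch.sign(param)
        param.abs_()
    return signs


@torch.no_grad()
def restore(model, signs):
    """Undo linearize() by multiplying the stored signs back in."""
    for name, param in model.state_dict().items():
        param.mul_(signs[name])


def synflow_score(model, input_shape=(1, 3, 32, 32)):
    """Data-agnostic Synaptic Flow score of an untrained model (illustrative sketch)."""
    model.zero_grad()
    signs = linearize(model)
    ones = torch.ones(input_shape)          # all-ones "image": no real data is needed
    torch.sum(model(ones)).backward()       # R = sum of the network outputs
    score = sum((p.grad * p).abs().sum().item()
                for p in model.parameters() if p.grad is not None)
    restore(model, signs)
    return score


# Toy untrained network standing in for a candidate architecture.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
print(synflow_score(net))
```

In a NAS loop, this score replaces validation accuracy as the performance objective, so no training epochs are spent on any candidate.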
While most studies focus on optimizing network architectures for a single objective, such as network accuracy, real-world neural network deployments frequently necessitate the consideration of other important factors, such as FLOPs, number of parameters, and latency. NAS architectures that are optimized just for accuracy may be too cumbersome for resource-constrained embedded devices. Moreover, by solving multi-objective problems, a trade-off front between performance and complexity can be obtained, which provides decision-makers with the necessary information to select an appropriate network. Several studies have presented multi-objective NAS (MONAS) formulations that take such important aspects into consideration. For example, Lu et al. [20] presented NSGA-Net, which used the non-dominated sorting genetic algorithm II (NSGA-II) [6] to solve a MONAS problem with two conflicting objectives, i.e., the classification error and the number of floating-point operations (FLOPs). In another work [19], NSGA-II was also used to solve a many-objective problem formulation with five optimization objectives, including ImageNet accuracy, number of parameters, number of multiply-add operations, and CPU and GPU latency.

Lu et al. [19] also developed a surrogate model to forecast the accuracy of candidate architectures, and the predictor was refined during the search process to enhance the performance of NSGA-II in solving MONAS. To build the predictor, a limited number of architectures were sampled from the search space at first. Following that, NSGA-II was used to search for architectures, treating the accuracy predictor as an objective alongside other complexity objectives. Although they employed a surrogate model as an objective for NSGA-II to discover architectures, they still trained these architectures and used them as training samples to refine the accuracy predictor. Using complexity metrics and the training-free performance metric Synaptic Flow (synflow) simultaneously, Phan et al. [25] randomly sample a wide variety of architectures and evaluate their complexity and performance. Non-dominated architectures with high performance and low complexity are then utilized to initialize the population for a bi-objective evolutionary NAS process where network accuracy is used as the primary performance metric. The training-free synflow metric is only employed during the warm-up phase; during the search phase, candidate architectures still need to be trained and evaluated for their performance. It is also possible to use the synflow metric to enhance the performance of NSGA-II in solving multi-objective NAS problems, as in [26], by developing a training-free multi-objective local search: in each generation, a subset of potential architectures undergoes a local search process that uses synflow for improvement checks, eliminating the need for training epochs. In contrast to these works, our approach does not rely on any training process. Instead, we use the training-free performance metric synflow to evaluate all candidate architectures during the search. This eliminates the need for training and allows us to search for high-quality architectures more efficiently.
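As a rough illustration of such a bi-objective formulation (a sketch, not the code of the works above or of our experiments), the snippet below poses an error proxy versus FLOPs as a two-objective minimization problem and hands it to NSGA-II through the pymoo library (assuming pymoo ≥ 0.6). The encoding assumes the NATS-Bench topology search space, where a cell has six edges and each edge takes one of five operations; the helpers `proxy_error` and `flops` are hypothetical stand-ins for benchmark lookups.

```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

N_EDGES, N_OPS = 6, 5   # NATS-Bench topology cells: 6 edges, 5 candidate operations


def proxy_error(arch):  # hypothetical stand-in for an error-proxy lookup
    return float(np.sum(arch)) / (N_EDGES * (N_OPS - 1))


def flops(arch):        # hypothetical stand-in for a complexity lookup
    return float(np.sum(arch == 1) + 2 * np.sum(arch == 4))


class BiObjectiveNAS(ElementwiseProblem):
    def __init__(self):
        # Encode an architecture as 6 integers in [0, 4]; treat them as reals and
        # round inside _evaluate to stay independent of pymoo's integer handling.
        super().__init__(n_var=N_EDGES, n_obj=2, xl=0.0, xu=float(N_OPS - 1))

    def _evaluate(self, x, out, *args, **kwargs):
        arch = np.clip(np.round(x), 0, N_OPS - 1).astype(int)
        out["F"] = [proxy_error(arch), flops(arch)]   # both objectives are minimized


res = minimize(BiObjectiveNAS(), NSGA2(pop_size=20), ("n_gen", 30), seed=1)
print(res.F)   # objective vectors of the final non-dominated set
```

After the run, `res.X` holds the encodings of the non-dominated architectures and `res.F` their objective vectors, i.e., the trade-off front presented to the decision-maker.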
Do et al. [7] also propose a completely training-free multi-objective evolutionary NAS framework that employs the number of linear regions R_N and the condition number of the neural tangent kernel κ_N to evaluate candidate architectures; these are data-dependent metrics computed using mini-batches from a training dataset. In our work, we use the data-agnostic metric synflow as our performance objective. The resulting architectures are thus potentially applicable to a wider range of tasks and datasets.

This article extends our SoICT 2022 conference paper on training-free multi-objective and many-objective evolutionary NAS [29]. In [29], we discussed several multi-objective and many-objective NAS problem formulations and employed the well-known multi-objective evolutionary algorithm NSGA-II to solve them. Moreover, we exclusively used the data-agnostic training-free metric synflow to evaluate candidate architecture performance without any training. In this article, we extend the analysis of our preliminary work by adding hypervolume performance indicator results instead of reporting only the Inverted Generational Distance (IGD). While IGD exhibits the convergence behavior of a multi-objective algorithm, it cannot be used in real-world situations due to its requirement of the Pareto-optimal front (see Sections 2.1 and 4.2.1). The hypervolume, which requires only a reference nadir point, is a more practical performance indicator for evaluating and comparing multi-objective NAS approaches (see Section 4.2.2). Employing both IGD and hypervolume thus yields more detailed investigations into the effectiveness of different NAS problem formulations. We present the IGD and hypervolume results in terms of GPU hours rather than the number of generations, which allows us to better assess the efficiency of our approaches. Our experimental results demonstrate that Training-Free Many-Objective Evolutionary NAS (TF-MaOENAS) provides several advantages, achieving competitive results while taking only 3 GPU hours.

2 Backgrounds

2.1 Multi-objective neural architecture search

Multi-Objective NAS (MONAS) [20, 26] can be formulated as searching for high-quality architectures in a search space Ω where m different aspects (e.g., error rate, model size, or latency) are optimized simultaneously. Each aspect is modeled as a separate objective f_i(x), i ∈ {1, ..., m}, and each candidate architecture x ∈ Ω thus has a corresponding vector of objective values f(x) = (f_1(x), ..., f_m(x)). All objectives, without loss of generality, are assumed to be minimized. An architecture x dominates another architecture y if and only if x strictly outperforms y in at least one aspect and x is never outperformed by y in any aspect:

x ≺ y ⟺ ∀i, f_i(x) ≤ f_i(y) ∧ ∃i, f_i(x) < f_i(y)

For two approximation fronts S_1 and S_2, if Hypervolume(S_1) > Hypervolume(S_2), then S_1 is a better approximation front than S_2.
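To make the dominance relation and the comparison of approximation fronts concrete, here is a small self-contained Python sketch (illustrative only, not the implementation used in our experiments) of the dominance check above and of maintaining an elitist archive that keeps only non-dominated objective vectors.

```python
import numpy as np


def dominates(fx, fy):
    """True if objective vector fx Pareto-dominates fy (all objectives minimized)."""
    fx, fy = np.asarray(fx, dtype=float), np.asarray(fy, dtype=float)
    return bool(np.all(fx <= fy) and np.any(fx < fy))


def update_archive(archive, f_new):
    """Insert f_new into an elitist archive, keeping only non-dominated vectors."""
    if any(dominates(f_old, f_new) or np.array_equal(f_old, f_new) for f_old in archive):
        return archive                                  # f_new is dominated or a duplicate
    return [f_old for f_old in archive if not dominates(f_new, f_old)] + [f_new]


# Toy bi-objective example: (error rate, FLOPs in millions), both minimized.
archive = []
for f in [(0.06, 120.0), (0.05, 150.0), (0.06, 150.0), (0.07, 110.0)]:
    archive = update_archive(archive, np.asarray(f, dtype=float))
print(archive)   # (0.06, 150.0) is dominated by (0.06, 120.0) and is never kept
```

In the full algorithms, NSGA-II's non-dominated sorting and crowding distance play this role at the population level; the archive here only illustrates how approximation fronts such as S_1 and S_2 are formed before being compared.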
4.3 Result analysis

        IGD             Hypervolume     Test accuracy   Search cost (hours)
Space: Test accuracy - FLOPs
(1)     0.0198±0.0171   1.0332±0.0013   94.28±0.17      53.7
(2)     0.0250±0.0133   1.0223±0.0027   94.29±0.17       0.7
(3)     0.0308±0.0177   1.0334±0.0011   94.27±0.13      54.8
(4)     0.0096±0.0021   1.0298±0.0022   94.37±0.00       2.7
Space: Test accuracy - Latency
(1)     0.0228±0.0019   1.0050±0.0006   94.30±0.09      54.1
(2)     0.0532±0.0056   0.9431±0.0200   94.29±0.14       1.1
(3)     0.0277±0.0060   0.9967±0.0168   94.27±0.13      54.8
(4)     0.0412±0.0060   0.9581±0.0098   94.37±0.00       2.7
Space: Test accuracy - #Parameters
(1)     0.0180±0.0138   1.0332±0.0014   94.27±0.18      53.8
(2)     0.0314±0.0170   1.0233±0.0027   94.24±0.22       0.8
(3)     0.0309±0.0176   1.0334±0.0011   94.27±0.13      54.8
(4)     0.0098±0.0022   1.0296±0.0023   94.37±0.00       2.7
Space: Test accuracy - #MACs
(1)     0.0195±0.0131   1.0331±0.0017   94.24±0.22      53.8
(2)     0.0189±0.0069   1.0280±0.0034   94.35±0.03       0.8
(3)     0.0266±0.0150   1.0333±0.0011   94.27±0.13      54.8
(4)     0.0104±0.0023   1.0292±0.0025   94.37±0.00       2.7

Table 1: Results of search and evaluation directly on CIFAR-10: (1) MOENAS, (2) TF-MOENAS, (3) MaOENAS, (4) TF-MaOENAS. Results that are underlined indicate the best method and results that are bolded denote the best method with statistical significance (p-value < 0.01).

        IGD             Hypervolume     Test accuracy   Search cost (hours)
Space: Test accuracy - FLOPs
(1)     0.0384±0.0086   0.7958±0.0015   72.39±0.21      53.8
(2)     0.0493±0.0176   0.7851±0.0036   72.56±0.44       0.8
(3)     0.0334±0.0128   0.7964±0.0019   72.40±0.30      54.8
(4)     0.0122±0.0045   0.7993±0.0019   73.49±0.07       2.7
Space: Test accuracy - Latency
(1)     0.0318±0.0070   0.7960±0.0013   72.68±0.68      54.0
(2)     0.1182±0.0139   0.7460±0.0149   73.51±0.00       1.0
(3)     0.0352±0.0084   0.7701±0.0057   72.40±0.30      54.8
(4)     0.0446±0.0057   0.7539±0.0076   73.49±0.07       2.7
Space: Test accuracy - #Parameters
(1)     0.0369±0.0029   0.7960±0.0013   72.47±0.23      53.8
(2)     0.0189±0.0038   0.7883±0.0025   73.47±0.11       0.8
(3)     0.0335±0.0127   0.7963±0.0019   72.40±0.30      54.8
(4)     0.0123±0.0045   0.7990±0.0020   73.49±0.07       2.7
Space: Test accuracy - #MACs
(1)     0.0313±0.0094   0.7956±0.0021   72.39±0.36      53.8
(2)     0.0156±0.0025   0.7941±0.0041   73.51±0.00       0.8
(3)     0.0270±0.0096   0.7961±0.0018   72.40±0.30      54.8
(4)     0.0126±0.0041   0.7985±0.0021   73.49±0.07       2.7

Table 2: Results of search and evaluation directly on CIFAR-100: (1) MOENAS, (2) TF-MOENAS, (3) MaOENAS, (4) TF-MaOENAS. Results that are underlined indicate the best method and results that are bolded denote the best method with statistical significance (p-value < 0.01).

Figure 2 demonstrates that TF-MaOENAS achieves superior IGD convergence results compared to the other approaches while taking just 3 GPU hours in most cases, with the exception of the test accuracy versus latency space. However, in terms of hypervolume, MaOENAS and MOENAS alternately surpass the other approaches on CIFAR-10 and ImageNet16-120. Table 1, Table 2, and Table 3 show comprehensive results on CIFAR-10, CIFAR-100, and ImageNet16-120. It is noted that the hypervolume of TF-MaOENAS still outperforms the other methods in the majority of cases on CIFAR-100, and its hypervolume is only slightly lower than that of the other training-based methods on CIFAR-10 and ImageNet16-120. Furthermore, because it is a training-free approach, it only requires 3 GPU hours, as opposed to dozens to hundreds of GPU hours for training-based methods like MOENAS and MaOENAS. Regarding test accuracy, TF-MaOENAS discovers top-performing architectures on NATS-Bench and outperforms the other methods in the majority of situations.

The experimental results also show that TF-MaOENAS and TF-MOENAS (using synflow) perform better than MaOENAS and MOENAS (using validation accuracy after 12 training epochs), respectively. This indicates that using synflow is more effective for optimizing multiple objectives simultaneously than using the validation accuracy after 12 training epochs. This might reflect that the training-free synflow metric is more capable of measuring and balancing between optimizing for accuracy and the other complexity objectives. Moreover, since synflow is a training-free metric that takes only a few seconds to compute, it results in a much lower computing cost than a training-based metric.
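For reference, the two quality indicators reported in Tables 1–3 can be computed as sketched below; this is a simplified, generic illustration rather than the exact evaluation code behind the tables. IGD averages the distance from each Pareto-optimal reference point to its nearest point on the approximation front, while the hypervolume of a bi-objective minimization front measures the objective-space area it dominates with respect to a reference nadir point.

```python
import numpy as np


def igd(approx_front, reference_front):
    """Inverted Generational Distance: mean Euclidean distance from each reference
    (Pareto-optimal) point to its closest point on the approximation front."""
    A = np.asarray(approx_front, dtype=float)
    R = np.asarray(reference_front, dtype=float)
    dists = np.linalg.norm(R[:, None, :] - A[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())


def hypervolume_2d(front, nadir):
    """Hypervolume dominated by a bi-objective minimization front w.r.t. a nadir point."""
    pts = sorted(map(tuple, front))          # sort by the first objective
    hv, prev_f2 = 0.0, float(nadir[1])
    for f1, f2 in pts:
        if f2 < prev_f2:                     # skip points dominated within the front
            hv += (nadir[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv


# Toy bi-objective fronts: (error rate, FLOPs), both minimized.
pareto_front = [(0.04, 140.0), (0.05, 100.0), (0.07, 60.0)]
approx_front = [(0.05, 150.0), (0.06, 110.0), (0.07, 70.0)]
print(igd(approx_front, pareto_front))
print(hypervolume_2d(approx_front, nadir=(0.10, 200.0)))
```

IGD requires the Pareto-optimal front, which is available for an exhaustively evaluated benchmark such as NATS-Bench but not in real-world settings, whereas the hypervolume only needs the nadir point; this is why we report both indicators.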
On the other hand, TF-MaOENAS, which employs five objectives concurrently, outperforms TF-MOENAS, which employs only two objectives. This is due to the addition of MACs, the number of parameters, and latency as complexity objectives alongside synflow and FLOPs. Most of the time, optimizing more objectives is favorable and does not incur considerably more computing expense, while providing a fuller picture of the complexity of the achieved architectures and enabling a more precise evaluation of the trade-offs between performance and complexity. Additionally, the penta-objective approximation fronts obtained by TF-MaOENAS can be projected into different bi-objective spaces (i.e., test accuracy versus one complexity metric) and still achieve better results than the corresponding TF-MOENAS variants. This means that running TF-MaOENAS once in the penta-objective space can obtain good approximation fronts in different bi-objective spaces simultaneously, rather than having to run TF-MOENAS separately many times, once for each bi-objective space.

Figure 2: IGD and hypervolume comparisons in terms of GPU hours (log scale) on four different bi-objective spaces (plot title) across CIFAR-10 (top two rows), CIFAR-100 (middle two rows), and ImageNet16-120 (bottom two rows). The figures depict the mean values with lines and the standard deviation with shaded areas over 21 runs.

        IGD             Hypervolume     Test accuracy   Search cost (hours)
Space: Test accuracy - FLOPs
(1)     0.0217±0.0087   0.5165±0.0026   46.34±0.35      161.8
(2)     0.0296±0.0089   0.5062±0.0062   46.25±0.15        0.6
(3)     0.0192±0.0165   0.5193±0.0032   46.41±0.43      163.7
(4)     0.0151±0.0019   0.5189±0.0017   46.57±0.05        2.3
Space: Test accuracy - Latency
(1)     0.0281±0.0047   0.5171±0.0019   46.62±0.52      162.2
(2)     0.0543±0.0112   0.4852±0.0118   46.52±0.17        1.1
(3)     0.0192±0.0165   0.5012±0.0059   46.41±0.43      163.7
(4)     0.04428±0.0062  0.4922±0.0086   46.57±0.05        2.3
Space: Test accuracy - #Parameters
(1)     0.0194±0.0067   0.5171±0.0019   46.46±0.24      161.8
(2)     0.0264±0.0114   0.5092±0.0047   46.40±0.18        0.8
(3)     0.0194±0.0165   0.5191±0.0032   46.41±0.43      163.7
(4)     0.0153±0.0019   0.5186±0.0017   46.57±0.05        2.3
Space: Test accuracy - #MACs
(1)     0.0198±0.0073   0.5153±0.0039   46.17±0.46      161.8
(2)     0.0188±0.0020   0.5161±0.0020   46.57±0.02        0.6
(3)     0.0156±0.0107   0.5188±0.0032   46.41±0.43      163.7
(4)     0.0148±0.0022   0.5181±0.0017   46.57±0.05        2.3

Table 3: Results of search and evaluation directly on ImageNet16-120: (1) MOENAS, (2) TF-MOENAS, (3) MaOENAS, (4) TF-MaOENAS. Results that are underlined indicate the best method and results that are bolded denote the best method with statistical significance (p-value < 0.01).
Alg.    CIFAR-10 (direct)   CIFAR-100 (transfer)   ImageNet16-120 (transfer)   Search cost (hours)
Space: Test accuracy - FLOPs
(1)     0.0198±0.0171       0.0465±0.0183          0.0316±0.0147               53.7
(2)     0.0250±0.0133       0.0322±0.0103          0.0400±0.0145                0.7
(3)     0.0308±0.0177       0.0299±0.0106          0.0230±0.0091               54.8
(4)     0.0096±0.0021       0.0125±0.0017          0.0161±0.0016                2.7
Space: Test accuracy - Latency
(1)     0.0228±0.0019       0.0419±0.0056          0.0416±0.0103               54.1
(2)     0.0532±0.0056       0.0932±0.0100          0.0841±0.0175                1.1
(3)     0.0277±0.0060       0.0390±0.0093          0.0369±0.0049               54.8
(4)     0.0412±0.0060       0.0612±0.0091          0.0577±0.0097                2.7
Space: Test accuracy - #Parameters
(1)     0.0180±0.0138       0.0413±0.0171          0.0342±0.0139               53.8
(2)     0.0314±0.0170       0.0502±0.0144          0.0306±0.0092                0.8
(3)     0.0309±0.0176       0.0300±0.0106          0.0231±0.0090               54.8
(4)     0.0098±0.0022       0.0124±0.0017          0.0164±0.0017                2.7
Space: Test accuracy - #MACs
(1)     0.0195±0.0131       0.0348±0.0129          0.0250±0.0069               53.8
(2)     0.0189±0.0069       0.0322±0.0085          0.0197±0.0036                0.8
(3)     0.0266±0.0150       0.0247±0.0083          0.0188±0.0060               54.8
(4)     0.0104±0.0023       0.0137±0.0024          0.0163±0.0022                2.7

Table 4: IGD on the transfer learning task: (1) MOENAS, (2) TF-MOENAS, (3) MaOENAS, (4) TF-MaOENAS. Results that are underlined indicate the best method and results that are bolded denote the best method with statistical significance (p-value < 0.01).

Alg.    CIFAR-10 (direct)   CIFAR-100 (transfer)   ImageNet16-120 (transfer)   Search cost (hours)
Space: Test accuracy - FLOPs
(1)     1.0332±0.0013       0.7962±0.0043          0.5167±0.0051               53.7
(2)     1.0223±0.0027       0.7830±0.0054          0.5061±0.0057                0.7
(3)     1.0334±0.0011       0.7996±0.0023          0.5191±0.0028               54.8
(4)     1.0298±0.0022       0.7958±0.0039          0.5169±0.0015                2.7
Space: Test accuracy - Latency
(1)     1.0050±0.0006       0.7589±0.0028          0.4861±0.0056               54.1
(2)     0.9431±0.0200       0.6425±0.0284          0.4164±0.0179                1.1
(3)     0.9967±0.0168       0.7545±0.0159          0.4897±0.0067               54.8
(4)     0.9581±0.0098       0.7234±0.0140          0.4710±0.0055                2.7
Space: Test accuracy - #Parameters
(1)     1.0332±0.0014       0.7963±0.0044          0.5155±0.0055               53.8
(2)     1.0233±0.0027       0.7824±0.0054          0.5056±0.0057                0.8
(3)     1.0334±0.0011       0.7995±0.0023          0.5189±0.0028               54.8
(4)     1.0296±0.0023       0.7954±0.0040          0.5166±0.0016                2.7
Space: Test accuracy - #MACs
(1)     1.0331±0.0017       0.7964±0.0048          0.5165±0.0055               53.8
(2)     1.0280±0.0034       0.7803±0.0058          0.5042±0.0060                0.8
(3)     1.0333±0.0011       0.7992±0.0023          0.5186±0.0028               54.8
(4)     1.0292±0.0025       0.7947±0.0042          0.5160±0.0016                2.7

Table 5: Hypervolume on the transfer learning task: (1) MOENAS, (2) TF-MOENAS, (3) MaOENAS, (4) TF-MaOENAS. Results that are underlined indicate the best method and results that are bolded denote the best method with statistical significance (p-value < 0.01).

We note that the variation in the obtained results across the datasets (see Tables 1, 2, 3) can be attributed to the following reasons. First, the performance metrics (i.e., accuracy or synflow) and some complexity metrics (e.g., latency or FLOPs) of each candidate architecture vary across the datasets (e.g., the accuracy of an architecture on CIFAR-10 is different from its accuracy on ImageNet16-120). Therefore, the IGD and hypervolume results of each NAS method differ from one dataset to another. Second, we assess the effectiveness of NAS methods using the test accuracy after 200 training epochs, but during the search process of each NAS algorithm, the validation accuracy after 12 training epochs (for training-based approaches) or synflow (for training-free approaches) is employed as the performance objective (see experimental details in Section 4.1). The correlation of the 12-epoch validation accuracy or synflow with the final test accuracy (after 200 epochs) varies per dataset [1] (e.g., the correlation coefficients of synflow for CIFAR-10, CIFAR-100, and ImageNet16-120 are 0.74, 0.76, and 0.75, respectively). Therefore, the rankings of the considered NAS methods might differ across the datasets.
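Such correlation coefficients are computed over a set of architectures between the proxy values and the final test accuracies. A minimal sketch (with made-up numbers, assuming Spearman's rank correlation, as is commonly reported for zero-cost proxies [1]) is shown below.

```python
import numpy as np
from scipy.stats import spearmanr

# Made-up example: synflow scores and final test accuracies (after full training)
# of the same sampled architectures on one dataset.
synflow_scores = np.array([3.1e7, 5.4e6, 8.9e7, 1.2e7, 4.4e7, 2.0e6])
test_accuracy = np.array([93.7, 88.5, 94.1, 90.3, 93.2, 86.9])

rho, p_value = spearmanr(synflow_scores, test_accuracy)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```

A higher coefficient on a dataset means the proxy ranks architectures more consistently with their final accuracy, which helps explain why the relative performance of the training-free methods varies across CIFAR-10, CIFAR-100, and ImageNet16-120.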
4.4 Transferability

This section explores the potential of transfer learning in NAS by evaluating the transferability of architectures discovered through multi-objective and many-objective NAS problem formulations. The final approximation front (i.e., the elitist archive) of architectures on CIFAR-10 is re-evaluated on CIFAR-100 and ImageNet16-120 for performance and complexity. Transfer learning in NAS offers several benefits, including reduced computational cost and the potential for faster deployment of deep learning models in real-world applications, by identifying architectures that are highly transferable across datasets.

Table 4 and Table 5 show that TF-MaOENAS yields better IGD compared to the other methods, whereas MaOENAS outperforms the other methods in hypervolume in most cases. In terms of test accuracy, TF-MaOENAS also surpasses most of the approaches in Table 6, with better accuracy and lower search costs. This indicates that TF-MaOENAS, using the training-free performance metric synflow, is more effective at transferring knowledge from one dataset to another. Besides, both penta-objective approaches, TF-MaOENAS and MaOENAS, give better IGD and hypervolume, respectively, than the bi-objective approaches. Although the four TF-MOENAS approaches have lower computing time, the optimization result of TF-MaOENAS is a penta-objective approximation front that contains much more insightful information, which can be obtained in one run and easily projected into any lower-dimensional objective spaces for intuitive Pareto front investigations.

                                      CIFAR-10 (direct)  CIFAR-100 (transfer)  ImageNet16-120 (transfer)  Search cost (hours)
Manually designed
ResNet [14]                           93.97              70.86                 43.63                      -
Weight sharing
RSPS [16]                             87.66±1.69         58.33±4.34            31.14±3.88                 2.1
DARTS [17]                            54.30±0.00         15.61±0.00            16.32±0.00                 3.0
GDAS [10]                             93.51±0.13         70.61±0.26            41.84±0.90                 8.0
SETN [9]                              86.19±4.63         56.87±7.77            31.90±4.07                 8.6
ENAS [24]                             54.30±0.00         15.61±0.00            16.32±0.00                 3.6
Non-weight sharing
RS [2]                                93.70±0.36         71.04±1.07            44.57±1.25                 3.3
BOHB [13]                             93.61±0.52         70.85±1.28            44.42±1.49                 3.3
NASWOT* [22]                          93.84±0.23         71.56±0.78            45.67±0.64                 -
Evolution
REA [27]                              93.92±0.30         71.84±0.99            45.54±1.03                 3.3
TF-MOENAS*† [7]                       94.16±0.22         72.75±0.63            46.61±0.46                 2.87
MOENAS (valacc - FLOPs)†              94.28±0.17         72.68±0.71            46.50±0.68                 53.7
TF-MOENAS (synflow - FLOPs)*†         94.29±0.17         73.22±0.71            46.31±0.40                 0.7
MOENAS (valacc - Latency)†            94.30±0.09         73.00±0.32            46.35±0.43                 54.1
TF-MOENAS (synflow - Latency)*†       94.29±0.14         73.17±0.25            46.28±0.31                 1.1
MOENAS (valacc - #Parameters)†        94.27±0.18         72.72±0.69            46.31±0.68                 53.8
TF-MOENAS (synflow - #Parameters)*†   94.24±0.22         72.81±0.76            46.31±0.32                 0.8
MOENAS (valacc - #MACs)†              94.24±0.22         72.60±0.77            46.37±0.74                 53.8
TF-MOENAS (synflow - #MACs)*†         94.35±0.03         73.15±0.07            46.47±0.00                 0.8
MaOENAS†                              94.27±0.13         72.94±0.33            46.53±0.27                 54.8
TF-MaOENAS*†                          94.37±0.00         73.51±0.00            46.51±0.04                 2.7
Optimal                               94.37              73.51                 47.31                      -
* Training-Free   † Multi-Objective/Many-Objective

Table 6: Accuracy on the transfer learning task. Previous studies' results are adopted from [11, 22]. Results that are underlined indicate the best method.
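To illustrate the projection step mentioned above (a sketch with made-up numbers, not our evaluation code), the penta-objective archive can be sliced to any pair of objectives and then re-filtered for dominance; a point that is non-dominated in five objectives may become dominated once the remaining objectives are ignored.

```python
import numpy as np

# Made-up penta-objective archive: columns are
# (-test accuracy, FLOPs (M), #parameters (M), #MACs (M), latency (ms)), all minimized.
archive = np.array([
    [-94.3, 120.0, 1.1, 60.0, 12.0],
    [-93.8,  80.0, 0.7, 40.0,  9.0],
    [-94.4, 150.0, 1.3, 75.0, 15.0],
    [-93.6,  90.0, 0.6, 45.0,  7.0],   # non-dominated in 5-D thanks to its latency
    [-93.5,  60.0, 0.5, 30.0,  8.0],
])


def non_dominated(F):
    """Indices of the non-dominated rows of an objective matrix (minimization)."""
    keep = []
    for i, fi in enumerate(F):
        dominated = any(np.all(fj <= fi) and np.any(fj < fi)
                        for j, fj in enumerate(F) if j != i)
        if not dominated:
            keep.append(i)
    return keep


# Project onto the (accuracy, FLOPs) subspace and drop points that become dominated.
proj = archive[:, [0, 1]]
print(proj[non_dominated(proj)])   # the fourth row is now dominated by the second
```

This is the sense in which a single penta-objective run yields approximation fronts for all of the bi-objective spaces at once.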
5 Conclusions

This paper described different multi-objective and many-objective problem formulations for NAS, i.e., MONAS and MaONAS, which can be solved by multi-objective evolutionary algorithms such as NSGA-II. We showed that the training-free metric synflow can be used as a proxy metric for network accuracy during NAS, without requiring any training epochs. Experimental results demonstrated the benefits of using training-free approaches, especially the many-objective TF-MaOENAS, including computational efficiency, search effectiveness, and insightful decision-making capabilities. These benefits stem from the ability to obtain top-performing architectures on both direct and transfer learning tasks, and from the resulting penta-objective fronts of non-dominated architectures, which provide beneficial trade-off information among the concerned objectives.

References

[1] Mohamed S. Abdelfattah, Abhinav Mehrotra, Lukasz Dudziak, and Nicholas Donald Lane. 2021. Zero-Cost Proxies for Lightweight NAS. In ICLR 2021. OpenReview.net. https://openreview.net/forum?id=0cmMMy8J5q

[2] James Bergstra and Yoshua Bengio. 2012. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 13 (2012), 281–305. https://doi.org/10.5555/2503308.2188395

[3] Peter A. N. Bosman and Dirk Thierens. 2003. The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Trans. Evol. Comput. 7, 2 (2003), 174–188. https://doi.org/10.1109/TEVC.2003.810761

[4] Wuyang Chen, Xinyu Gong, and Zhangyang Wang. 2021. Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective. In ICLR 2021. OpenReview.net. https://openreview.net/forum?id=Cnon5ezMHtu

[5] Kalyanmoy Deb. 2001. Multi-objective optimization using evolutionary algorithms. Wiley.

[6] Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 2 (2002), 182–197. https://doi.org/10.1109/4235.996017

[7] Tu Do and Ngoc Hoang Luong. 2021. Training-Free Multi-objective Evolutionary Neural Architecture Search via Neural Tangent Kernel and Number of Linear Regions. In ICONIP 2021 (Lecture Notes in Computer Science, Vol. 13109), Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, and Achmad Nizar Hidayanto (Eds.). Springer, 335–347. https://doi.org/10.1007/978-3-030-92270-2_29

[8] Xuanyi Dong, Lu Liu, Katarzyna Musial, and Bogdan Gabrys. 2022. NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7 (2022), 3634–3646. https://doi.org/10.1109/TPAMI.2021.3054824

[9] Xuanyi Dong and Yi Yang. 2019. One-Shot Neural Architecture Search via Self-Evaluated Template Network. In ICCV 2019. IEEE, 3680–3689. https://doi.org/10.1109/ICCV.2019.00378

[10] Xuanyi Dong and Yi Yang. 2019. Searching for a Robust Neural Architecture in Four GPU Hours. In CVPR 2019. Computer Vision Foundation / IEEE, 1761–1770. https://doi.org/10.1109/CVPR.2019.00186

[11] Xuanyi Dong and Yi Yang. 2020.
NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search. In ICLR 2020. OpenReview.net. https://openreview.net/forum?id=HJxyZkBKDr

[12] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural Architecture Search: A Survey. J. Mach. Learn. Res. 20 (2019), 55:1–55:21. http://jmlr.org/papers/v20/18-598.html

[13] Stefan Falkner, Aaron Klein, and Frank Hutter. 2018. BOHB: Robust and Efficient Hyperparameter Optimization at Scale. In ICML 2018 (Proceedings of Machine Learning Research, Vol. 80), Jennifer G. Dy and Andreas Krause (Eds.). PMLR, 1436–1445. http://proceedings.mlr.press/v80/falkner18a.html

[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR 2016. IEEE Computer Society, 770–778. https://doi.org/10.1109/CVPR.2016.90

[15] Zhilin He. 2023. Improved Genetic Algorithm in Multi-objective Cargo Logistics Loading and Distribution. Informatica 47, 2 (2023). https://doi.org/10.31449/inf.v47i2.3958

[16] Liam Li and Ameet Talwalkar. 2019. Random Search and Reproducibility for Neural Architecture Search. In UAI 2019 (Proceedings of Machine Learning Research, Vol. 115), Amir Globerson and Ricardo Silva (Eds.). AUAI Press, 367–377. http://proceedings.mlr.press/v115/li20c.html

[17] Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2019. DARTS: Differentiable Architecture Search. In ICLR 2019. OpenReview.net. https://openreview.net/forum?id=S1eYHoC5FX

[18] Zhichao Lu, Ran Cheng, Yaochu Jin, Kay Chen Tan, and Kalyanmoy Deb. 2022. Neural Architecture Search as Multiobjective Optimization Benchmarks: Problem Formulation and Performance Assessment. IEEE Transactions on Evolutionary Computation (2022). https://doi.org/10.1109/TEVC.2022.3233364

[19] Zhichao Lu, Kalyanmoy Deb, Erik D. Goodman, Wolfgang Banzhaf, and Vishnu Naresh Boddeti. 2020. NSGANetV2: Evolutionary Multi-objective Surrogate-Assisted Neural Architecture Search. In ECCV 2020 (Lecture Notes in Computer Science, Vol. 12346), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer, 35–51. https://doi.org/10.1007/978-3-030-58452-8_3

[20] Zhichao Lu, Ian Whalen, Yashesh D. Dhebar, Kalyanmoy Deb, Erik D. Goodman, Wolfgang Banzhaf, and Vishnu Naresh Boddeti. 2020. NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm (Extended Abstract). In IJCAI 2020, Christian Bessiere (Ed.). ijcai.org, 4750–4754. https://doi.org/10.24963/ijcai.2020/659

[21] Hoang N. Luong and Peter A. N. Bosman. 2012. Elitist Archiving for Multi-Objective Evolutionary Algorithms: To Adapt or Not to Adapt. In PPSN XII (Lecture Notes in Computer Science, Vol. 7492), Carlos A. Coello Coello, Vincenzo Cutello, Kalyanmoy Deb, Stephanie Forrest, Giuseppe Nicosia, and Mario Pavone (Eds.). Springer, 72–81. https://doi.org/10.1007/978-3-642-32964-7_8

[22] Joseph Charles Mellor, Jack Turner, Amos J. Storkey, and Elliot J. Crowley. 2020. Neural Architecture Search without Training. CoRR abs/2006.04647 (2020). arXiv:2006.04647 https://arxiv.org/abs/2006.04647

[23] Sarat Mishra and Sudhansu Kumar Mishra. 2020. Performance Assessment of a set of Multi-Objective Optimization Algorithms for Solution of Economic Emission Dispatch Problem. Informatica 44, 3 (2020), 349–360. https://doi.org/10.31449/inf.v44i3.1969

[24] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. 2018.
Efficient Neural Architecture Search via Parameter Sharing. In ICML 2018 (Proceedings of Machine Learning Research, Vol. 80), Jennifer G. Dy and Andreas Krause (Eds.). PMLR, 4092–4101. http://proceedings.mlr.press/v80/pham18a.html

[25] Quan Minh Phan and Ngoc Hoang Luong. 2021. Efficiency Enhancement of Evolutionary Neural Architecture Search via Training-Free Initialization. In NICS 2021. 138–143. https://doi.org/10.1109/NICS54270.2021.9701573

[26] Quan Minh Phan and Ngoc Hoang Luong. 2023. Enhancing multi-objective evolutionary neural architecture search with training-free Pareto local search. Appl. Intell. 53, 8 (2023), 8654–8672. https://doi.org/10.1007/s10489-022-04032-y

[27] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. 2019. Regularized Evolution for Image Classifier Architecture Search. In AAAI 2019. AAAI Press, 4780–4789. https://doi.org/10.1609/aaai.v33i01.33014780

[28] Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, and Surya Ganguli. 2020. Pruning neural networks without any data by iteratively conserving synaptic flow. In NeurIPS 2020, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https://proceedings.neurips.cc/paper/2020/hash/46a4378f835dc8040c8057beb6a2da52-Abstract.html

[29] An Vo, Tan Ngoc Pham, Van Bich Nguyen, and Ngoc Hoang Luong. 2022. Training-Free Multi-Objective and Many-Objective Evolutionary Neural Architecture Search with Synaptic Flow. In SoICT 2022. ACM, 1–8. https://doi.org/10.1145/3568562.3568569

[30] Barret Zoph and Quoc V. Le. 2017. Neural Architecture Search with Reinforcement Learning. In ICLR 2017. OpenReview.net. https://openreview.net/forum?id=r1Ue8Hcxg