https://doi.org/10.31449/inf.v45i3.3313
Informatica 45 (2021) 487–488

Heterogeneous Face Recognition from Facial Sketches

Ivan Gruber
UWB, Faculty of Applied Sciences, Department of Cybernetics, Pilsen, Czech Republic
E-mail: grubiv@kky.zcu.cz

Thesis summary

Keywords: generative adversarial networks, image translation, neural networks, machine learning

Received: September 21, 2020

This paper presents a short summary of a dissertation thesis [1]. The thesis presents a novel approach named X-Bridge for image-to-sketch translation for automatic heterogeneous face recognition. X-Bridge is based on a conditional adversarial network with an additional reconstruction path and a shared-latent space assumption between the original and the reconstruction path. With these modifications, X-Bridge outperforms the other tested state-of-the-art methods. Code is available at: https://github.com/YvanG/Cross-modal-Bridge.

Povzetek: This summary of the doctoral dissertation briefly describes face recognition from sketches.

1 Introduction

Generative adversarial networks (GANs) [2] have gained enormous popularity in recent years thanks to their ability to generate photo-realistic results and to capture important image details in image-to-image translation tasks. In this paper, a novel approach called X-Bridge is presented. X-Bridge is designed specifically as a cross-modal bridge for the heterogeneous face recognition task. X-Bridge is a supervised method whose structure is based on a conditional GAN; however, it also assumes a shared-latent space across two different domains. To fully demonstrate the abilities of the X-Bridge approach, we test it on the arguably very challenging task of facial sketch-to-image translation using the CUFSF dataset [3].

Figure 1: Cross-domain translation comparison. Shown are the input images (first column), the corresponding sketches produced by Pix2pix (second column), MUNIT (third column), and X-Bridge (fourth column), and the ground-truth outputs (last column).

Figure 2: Cross-domain translation comparison for a non-frontal view. The order of methods is the same as in Fig. 1.

2 X-Bridge method

X-Bridge contains two main paths: a translation path and a reconstruction path. These paths can be imagined as two separate GANs: each has its own generator and discriminator, while both share one encoder. Each path has its own specific task. The task of the translation path, based on the conditional GAN, is to translate an input image from its domain into the other domain. The task of the reconstruction path, based on a vanilla GAN, is to reconstruct the original input. Through this process, the reconstruction path motivates the shared encoder to preserve important features, to generalize better, and to learn important regularities. To further improve feature propagation through the networks, skip connections in the form of channel concatenation are added between the last four layers of the encoder and the first four layers of the generators. Both paths utilize the traditional adversarial loss, which for the domain translation problem can be expressed as follows:

$\mathcal{L}(EG, D) = \mathbb{E}_{\hat{x}}[\log D(\hat{x})] + \mathbb{E}_{x,z}[\log(1 - D(EG(x, z)))]$,

where $x$ is a real image from the first domain, $\hat{x}$ is a real image from the second domain, the encoder and the generator together are denoted as $EG$, $D$ stands for the discriminator, and $z$ is a vector from the shared-latent space.
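To make the adversarial loss above concrete, the following is a minimal PyTorch sketch of the translation path. The module definitions (enc, gen_trans, disc), their layer counts, and the 64x64 resolution are illustrative assumptions far smaller than a real implementation, and the skip connections and conditional discriminator input described above are omitted; the sketch only shows how the two expectation terms of $\mathcal{L}(EG, D)$ are estimated on a batch.

```python
import torch
import torch.nn as nn

# Shared encoder E: both paths encode inputs through these layers (toy depth).
enc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
)

# Generator of the translation path (the reconstruction path would own
# an analogous generator of its own).
gen_trans = nn.Sequential(
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
)

# Discriminator D of the translation path; sigmoid outputs keep the
# log-terms below well defined.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)

def adversarial_loss(disc, x_hat, x_fake, eps=1e-8):
    """E_xhat[log D(xhat)] + E_{x,z}[log(1 - D(EG(x, z)))], batch-averaged."""
    real_term = torch.log(disc(x_hat) + eps).mean()
    fake_term = torch.log(1.0 - disc(x_fake) + eps).mean()
    return real_term + fake_term

x = torch.randn(4, 3, 64, 64)      # batch from the first domain (photos)
x_hat = torch.randn(4, 3, 64, 64)  # batch from the second domain (sketches)
fake = gen_trans(enc(x))           # EG(x, z); z is folded into the encoding here
loss = adversarial_loss(disc, x_hat, fake)
```

During training, $D$ would ascend this objective while $EG$ descends it; the full X-Bridge objective additionally includes, per path, the $L_1$ reconstruction term defined in the next paragraph.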
Moreover, both paths utilize the $L_1$ distance, defined as follows:

$\mathcal{L}_1(EG) = \mathbb{E}_{x,\hat{x},z}[\|\hat{x} - EG(x, z)\|_1]$.

The final loss is then defined as the sum of both losses over both paths.

3 Experiments

Several deep generative models for image-to-image translation have been proposed in recent years. Most existing approaches are based on supervised learning; however, models based on unsupervised learning have lately become very popular. To benchmark X-Bridge, we decided to use two significant methods, one from each group: the Pix2pix approach [4] and the MUNIT approach [5]. Both of these methods provide state-of-the-art results in image-to-image translation tasks; Pix2pix is a supervised method like X-Bridge, whereas MUNIT is unsupervised. All the methods were trained and tested on the CUFSF dataset containing 1194 facial photo-sketch pairs.

In the first experiment, we test the translation of frontal views, see Fig. 1. All methods provide very realistic and precise results; however, we argue that both supervised models outperform the MUNIT approach, which has problems generating sharp images and therefore also small details. Pix2pix and X-Bridge reach comparable results. In the second experiment, where we test non-frontal cases (see Fig. 2), X-Bridge outperformed the other methods in terms of generalization, precision, and preservation of facial features. All the methods were also tested on the IIIT-D Sketch dataset with comparable results.

4 Conclusion

We argue that the qualitative results provided by X-Bridge surpass those of the other tested methods. In our future research, we would first like to propose a suitable metric for objectively comparing the performance of methods on image-to-image translation tasks, which to this day does not exist. Secondly, we would like to address the ambiguity issue, which is a critical problem in the heterogeneous face recognition task.

Acknowledgement

This work was supported by the Ministry of Education of the Czech Republic, project No. LTARF18017.

References

[1] I. Gruber, "Heterogeneous face recognition from facial sketches," 2019.

[2] I. J. Goodfellow et al., "Generative adversarial nets," in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS'14. Cambridge, MA, USA: MIT Press, 2014, pp. 2672–2680.

[3] W. Zhang, X. Wang, and X. Tang, "Coupled information-theoretic encoding for face photo-sketch recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

[4] P. Isola, J. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," CoRR, vol. abs/1611.07004, 2016.

[5] X. Huang, M. Liu, S. J. Belongie, and J. Kautz, "Multimodal unsupervised image-to-image translation," CoRR, vol. abs/1804.04732, 2018.