https://doi.org/10.31449/inf.v43i1.2716 Informatica 43 (2019) 117 –121 117
  
Facial Expression Recognition Based on Local Features and 
Monogenic Binary Coding 
Zhangbao Chen 
School of Electronic and Electrical Engineering, Bengbu University, Bengbu, Anhui 233030, China 
Corresponding address: No. 1866, Caoshan Road, Bengbu, Anhui 233030, China 
E-mail: zbczbao@126.com 
 
Keywords: monogenic binary coding, facial expression recognition, local features, Japanese female facial expression 
Received: March 11, 2019 
Facial expression recognition is developing rapidly and is one of the important biometric recognition technologies with high application value. In this study, monogenic signal theory and the monogenic binary coding algorithm were analyzed to show that monogenic binary coding matches local features well. A facial expression recognition simulation experiment was then carried out with monogenic binary coding on a classic facial expression database, the Japanese Female Facial Expression (JAFFE) Database, and the results were compared with those of the traditional Local Binary Patterns-Sparse Representation-based Classification (LBP-SRC) residual fusion method. The comparison illustrates the effectiveness of the monogenic binary coding algorithm in facial expression recognition and provides a basis for applying monogenic signal theory to facial expression recognition.
Povzetek (Slovenian summary): A new variant of an algorithm for fast recognition of facial expressions is described.
1 Introduction 
Language is the most common way to convey information in daily life. However, studies show that facial expressions carry more information than language does, and that some of this information is not easily recognized by humans themselves [1]. Using computers to analyze facial expressions and extract this information has therefore become an important research topic in computer science, and the results have brought remarkable progress in applications such as mental illness treatment, lie detection, distance education, and human-computer interaction. The general workflow of facial expression recognition consists of face detection and preprocessing, facial feature extraction, and facial expression classification [2]. Face detection and preprocessing separate the target face from the image; facial feature extraction extracts the facial features that reflect human emotion; and facial expression classification assigns the extracted features to a specific expression.
Many mature techniques exist for facial feature extraction, including methods based on geometric features, holistic statistical features, frequency-domain features, and motion features. Mollahosseini et al. [3] proposed a deep neural network method for facial expression recognition that performed well on blurred images. Dapogny et al. [4] proposed a facial expression recognition method based on pairwise conditional random forests, which improved recognition markedly by learning random forests from different derivative features of image pairs. Lee et al. [5] proposed a facial expression recognition method based on sparse representation that reduced the intra-class variation while emphasizing the facial expression in the query face image. Monogenic binary coding is relatively widely used among facial expression recognition techniques, and many improved methods with high recognition rates have been derived from it over time. Yang et al. [6] proposed a monogenic binary pattern that combined monogenic orientation and monogenic amplitude and applied it to face recognition with good results. Yang et al. [7] proposed a monogenic binary coding algorithm based on local features, which was simpler to compute and further improved recognition accuracy.
In this study, a facial expression recognition algorithm based on local features and monogenic binary coding was analyzed. A facial expression recognition model was built on monogenic signal theory and monogenic binary coding, and its accuracy was evaluated on the classic Japanese Female Facial Expression (JAFFE) Database. By comparing its accuracy with that of a traditional recognition method, the study illustrates the strengths of monogenic binary coding in facial expression recognition and offers some suggestions for facial expression recognition methods.
2 Monogenic signal theory 
The Hilbert transform is a tool for signal analysis and complex envelope description in mathematics and signal processing. It can be roughly described as the convolution of a signal $s(t)$ with $\frac{1}{\pi t}$, which yields the signal $s'(t)$ [8].
The formula of the Hilbert transform is:
$$s'(t) = s(t) * \frac{1}{\pi t} = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{s(\tau)}{t - \tau} \, d\tau. \qquad (1)$$
The Hilbert transform preserves the amplitude of each frequency component of the target signal and shifts its phase by 90°.
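The amplitude-preserving, 90° phase-shifting behaviour of Eq. (1) can be checked numerically. The following is a minimal sketch, assuming a periodic, uniformly sampled signal and a discrete implementation in the frequency domain (multiplying the spectrum by $-j\,\mathrm{sgn}(\omega)$); the function name `hilbert_transform` is illustrative and not part of the original study.

```python
import numpy as np

def hilbert_transform(s):
    """Discrete Hilbert transform of a real 1-D signal via the FFT.

    The spectrum is multiplied by -j*sgn(frequency), which is the
    frequency response of convolution with 1/(pi*t) in Eq. (1).
    Assumption: the signal is treated as periodic.
    """
    s = np.asarray(s, dtype=float)
    spectrum = np.fft.fft(s)
    freqs = np.fft.fftfreq(s.size)
    return np.real(np.fft.ifft(-1j * np.sign(freqs) * spectrum))

n = 256
t = np.arange(n)
s = np.cos(2 * np.pi * 8 * t / n)    # cosine with 8 full cycles
s_h = hilbert_transform(s)           # expected to be the matching sine
```

Under these assumptions the cosine is mapped to a sine of the same amplitude, that is, the same signal delayed by a quarter period.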
The Riesz transform is a two-dimensional extension of the Hilbert transform [9]. The monogenic signal based on the Riesz transform extends the one-dimensional analytic signal to two-dimensional signals. The Riesz transform in the spatial domain is:
$$f_R(z) = \begin{pmatrix} f_x(z) \\ f_y(z) \end{pmatrix} = \begin{pmatrix} \dfrac{x}{2\pi \lVert z \rVert^{3}} * f(z) \\ \dfrac{y}{2\pi \lVert z \rVert^{3}} * f(z) \end{pmatrix}, \qquad (2)$$
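where the convolution kernel of the Riesz transform is $\left( \dfrac{x}{2\pi \lVert z \rVert^{3}}, \dfrac{y}{2\pi \lVert z \rVert^{3}} \right)$ with $z = (x, y)$, and $*$ is the convolution operator. The two-dimensional frequency response of the Riesz filter is $F_x = -\dfrac{jx}{\lVert z \rVert}$, $F_y = -\dfrac{jy}{\lVert z \rVert}$, where $(x, y)$ here denotes the frequency coordinates. The monogenic signal of an image can be defined as the combination of the signal $f(z)$ and its Riesz transform. The combination is: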
$$f_S(z) = f(z) + (i, j)\, f_R(z) = f(z) + i f_x(z) + j f_y(z). \qquad (3)$$
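The Riesz components $f_x$ and $f_y$ that enter Eq. (3) can be sketched in the frequency domain. This is a minimal illustration under stated assumptions: the image is treated as periodic, no band-pass filter is applied beforehand (practical systems often apply one, for example a log-Gabor filter), and the names `riesz_transform`, `u`, and `v` are hypothetical.

```python
import numpy as np

def riesz_transform(f):
    """Return (f_x, f_y), the two Riesz components of a 2-D array f.

    Uses the frequency responses -j*u/|w| and -j*v/|w|, where (u, v)
    are the horizontal and vertical frequencies and |w| is their norm.
    """
    f = np.asarray(f, dtype=float)
    rows, cols = f.shape
    u = np.fft.fftfreq(cols)[np.newaxis, :]   # horizontal frequency
    v = np.fft.fftfreq(rows)[:, np.newaxis]   # vertical frequency
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                        # avoid 0/0 at DC; the response there is 0
    spectrum = np.fft.fft2(f)
    f_x = np.real(np.fft.ifft2(-1j * u / radius * spectrum))
    f_y = np.real(np.fft.ifft2(-1j * v / radius * spectrum))
    return f_x, f_y

# Test image: a cosine grating that varies only along the horizontal axis.
size = 64
x = np.arange(size)
image = np.tile(np.cos(2 * np.pi * 4 * x / size), (size, 1))
f_x, f_y = riesz_transform(image)
```

For this grating the horizontal component is the corresponding sine grating and the vertical component vanishes, which mirrors the one-dimensional Hilbert transform behaviour.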
Analogously to a complex signal with real and imaginary components, the monogenic signal can be orthogonally decomposed into local amplitude $A$, local phase $\varphi$, and local orientation $\theta$. These three components respectively describe the local energy information, the local structure information, and the geometric orientation information of the monogenic signal. They are calculated as:
$$\begin{cases} A = \sqrt{f^{2} + f_x^{2} + f_y^{2}} \\ \varphi = -\operatorname{sgn}(f_x) \arctan\!\left( \dfrac{\sqrt{f_x^{2} + f_y^{2}}}{f} \right) \\ \theta = \arctan\!\left( \dfrac{f_y}{f_x} \right) \end{cases} \qquad (4)$$
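A minimal sketch of the decomposition in Eq. (4) is given below. It assumes the signal $f$ and its Riesz components $f_x$, $f_y$ are already available as arrays. As a labeled deviation, `np.arctan2` is used in place of the plain arctangent to avoid division by zero; it agrees with Eq. (4) when $f > 0$ (for the phase) and $f_x > 0$ (for the orientation). The function name is illustrative.

```python
import numpy as np

def monogenic_components(f, f_x, f_y):
    """Local amplitude, phase and orientation following Eq. (4).

    Assumption: arctan2 replaces arctan(numerator / denominator) so that
    zero denominators do not raise; the two agree for positive denominators.
    """
    f, f_x, f_y = (np.asarray(a, dtype=float) for a in (f, f_x, f_y))
    riesz_norm = np.sqrt(f_x ** 2 + f_y ** 2)
    amplitude = np.sqrt(f ** 2 + riesz_norm ** 2)
    phase = -np.sign(f_x) * np.arctan2(riesz_norm, f)
    orientation = np.arctan2(f_y, f_x)
    return amplitude, phase, orientation

# Single-pixel example: f = 1, f_x = 1, f_y = 0.
A, phi, theta = monogenic_components([[1.0]], [[1.0]], [[0.0]])
```

For the single-pixel example, the amplitude is $\sqrt{2}$, the phase is $-\pi/4$, and the orientation is 0.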
3 Monogenic binary coding 
To describe the detailed information of an image well, it must be analyzed from several perspectives with different measures. According to monogenic signal theory, three kinds of information can be extracted from an image: amplitude $A$, phase $\varphi$, and orientation $\theta$. Together they describe the local features of the image in detail [10].
The decomposed feature images are named the monogenic binary coding amplitude image $JPG\text{-}A$, the monogenic binary coding phase image $JPG\text{-}\varphi$, and the monogenic binary coding orientation image $JPG\text{-}\theta$. Each coded image is composed of two parts: the intensity coding of the monogenic local imaginary part and the monogenic local variable coding. The combined coding formula is:
$$JPG\text{-}N(z_c) = \left[ C'_x(z_c),\; C'_y(z_c),\; C_N(z_c) \right], \quad N = A, \varphi, \theta, \qquad (5)$$
where $z_c$ refers to the central pixel of a selected area.
3.1 Intensity binary coding of monogenic local imaginary part
Because the Riesz transform in the monogenic signal is not symmetric, the features of the real part are suppressed and those of the imaginary part stand out; the decomposition works well and is robust to factors such as illumination changes. In the image, the imaginary part after monogenic binary coding represents the intensity of the local feature information at the central pixel. For a pixel $z_c$, the intensity code of the monogenic local imaginary part is $[C'_x(z_c), C'_y(z_c)]$, and the coding rule is:
$$C'_n(z_c) = \begin{cases} 0, & f_n(z_c) > 0 \\ 1, & f_n(z_c) \le 0 \end{cases}, \quad n = x, y, \qquad (6)$$
where $f_x$ and $f_y$ are the horizontal and vertical outputs of the Riesz transform, which form the imaginary part of the monogenic signal. The resulting quadrant codes are shown in the coordinate system of Figure 1.
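[Figure 1: quadrant diagram in the $(f_x, f_y)$ plane, with codes [0,0] in the upper-right, [1,0] in the upper-left, [1,1] in the lower-left, and [0,1] in the lower-right quadrant]
Figure 1: Quadrant coding of the features of the monogenic imaginary part.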
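The quadrant coding of Figure 1 can be sketched as follows. The sketch assumes that a component is coded 0 when it is strictly positive and 1 otherwise, which reproduces the four quadrant labels of Figure 1; the treatment of values exactly equal to zero is an assumption, and the function name is illustrative.

```python
import numpy as np

def imaginary_part_code(f_x, f_y):
    """Two-bit quadrant code [C'_x, C'_y] of the Riesz components.

    Assumption: a bit is 0 for a strictly positive component and 1 otherwise.
    """
    c_x = (np.asarray(f_x) <= 0).astype(np.uint8)
    c_y = (np.asarray(f_y) <= 0).astype(np.uint8)
    return c_x, c_y

# One sample point per quadrant: upper-right, upper-left, lower-left, lower-right.
f_x = np.array([1.0, -1.0, -1.0, 1.0])
f_y = np.array([1.0, 1.0, -1.0, -1.0])
c_x, c_y = imaginary_part_code(f_x, f_y)
codes = list(zip(c_x.tolist(), c_y.tolist()))
```

With these sample points the codes come out as (0, 0), (1, 0), (1, 1), and (0, 1), matching the quadrant layout of Figure 1.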
3.2 Monogenic local variable binary coding
Local binary pattern (LBP) coding can be applied to the monogenic amplitude, and the local XOR pattern (LXP) can be applied to the monogenic orientation and the monogenic phase. The robustness of the codes can be strengthened by quantizing the angles into intervals when generating the codes.
Local binary coding of the monogenic amplitude usually uses the circular LBP operator. It measures the local energy by comparing the amplitudes of the surrounding pixels with that of the central pixel. Let $z_c$ be the central pixel with $P$ neighboring points around it. The amplitude binary code of $z_c$ is calculated as follows:
$$C_A(z_c) = \sum_{i=0}^{P-1} s\big( A(i) - A(z_c) \big)\, 2^{i}, \qquad (7)$$
where $A(i)$ is the amplitude of the $i$-th neighboring point, $A(z_c)$ is the amplitude of the central pixel, and $2^{i}$ is the weight that converts the binary code to a decimal value. The function $s(x)$ is a sign (step) function defined as:
$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}. \qquad (8)$$
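Eqs. (7) and (8) can be illustrated with a small sketch. For simplicity it assumes $P = 8$ neighbors taken directly from the 3×3 ring around the pixel, without the interpolation a circular operator of larger radius would need; the neighbor ordering and the names `NEIGHBOUR_OFFSETS` and `amplitude_code` are assumptions made for illustration.

```python
import numpy as np

# Assumed ordering of the P = 8 neighbours (row offset, column offset), i = 0..7.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def amplitude_code(amplitude, row, col):
    """LBP code of Eq. (7) for the pixel at (row, col) of an amplitude map."""
    center = amplitude[row, col]
    code = 0
    for i, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # s(x) of Eq. (8): 1 when the neighbour is at least as large as the centre.
        if amplitude[row + dr, col + dc] - center >= 0:
            code += 2 ** i
    return code

patch = np.array([[5.0, 1.0, 5.0],
                  [1.0, 3.0, 1.0],
                  [5.0, 1.0, 5.0]])
code = amplitude_code(patch, 1, 1)   # corners exceed the centre: bits 0, 2, 4, 6
```

In the example only the four corner neighbors are at least as large as the center, so the code is $1 + 4 + 16 + 64 = 85$.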
XOR pattern coding is used for the monogenic orientation and the monogenic phase. Local orientation describes the dominant information of the local structure, and local phase describes the texture information of the image. Both can be represented as angles from 0° to 360°. However, if the angles are divided into too many fine groups, the feature points become too dispersed and unstable for local feature extraction, which means poor robustness. To strengthen the robustness of the coding, the angles are divided into several intervals, which aggregates similar values and highlights the features and stability of each region.
In this study, the range $[0, 360)$ was divided into 6 intervals. Each phase or orientation value was assigned to its interval, and phases and orientations in the same interval were assumed to represent the same local feature. The partition function $D(x)$ is:
$$D(x) = \begin{cases} 1, & 0 \le x < 60 \\ 2, & 60 \le x < 120 \\ 3, & 120 \le x < 180 \\ 4, & 180 \le x < 240 \\ 5, & 240 \le x < 300 \\ 6, & 300 \le x < 360 \end{cases} \qquad (9)$$
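The partition function of Eq. (9) amounts to integer division of the angle by the interval width. A minimal sketch follows, assuming angles are given in degrees; wrapping angles outside $[0, 360)$ with a modulo is an added convenience, not something stated in the original, and the function name is illustrative.

```python
def partition(angle_deg, n_intervals=6):
    """Interval index D(x) in 1..n_intervals for an angle in degrees.

    Assumption: angles outside [0, 360) are wrapped with a modulo first.
    """
    width = 360.0 / n_intervals
    index = int((angle_deg % 360.0) // width) + 1
    return min(index, n_intervals)   # guard against floating-point edge cases

examples = [partition(a) for a in (0.0, 59.9, 60.0, 180.0, 299.9, 300.0, 359.9)]
```

The example angles fall into intervals 1, 1, 2, 4, 5, 6, and 6, respectively.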
The XOR pattern code of the central pixel $z_c$ with $N$ neighboring points is:
$$C_M(z_c) = \sum_{i=0}^{N-1} C_i^{M}\, 2^{i}, \quad M = \varphi, \theta, \qquad (10)$$
where $C_i^{M}$ is defined as follows:
$$C_i^{\varphi}(z_c) = \begin{cases} 0, & D(\varphi(z_c)) = D(\varphi(i)) \\ 1, & D(\varphi(z_c)) \ne D(\varphi(i)) \end{cases}, \qquad (11)$$
$$C_i^{\theta}(z_c) = \begin{cases} 0, & D(\theta(z_c)) = D(\theta(i)) \\ 1, & D(\theta(z_c)) \ne D(\theta(i)) \end{cases}, \qquad (12)$$
where $C_i^{\varphi}(z_c)$ is the binary code of the local phase, $C_i^{\theta}(z_c)$ is the binary code of the local orientation, $\varphi(z_c)$ is the local phase of the central pixel $z_c$, $\varphi(i)$ is the phase of the $i$-th neighboring point, $\theta(z_c)$ is the local orientation of the central pixel $z_c$, and $\theta(i)$ is the orientation of the $i$-th neighboring point.
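The XOR pattern code of Eqs. (10) to (12) can be sketched in the same way for either the phase map or the orientation map. The sketch assumes angles in degrees, $N = 8$ neighbors from the 3×3 ring, and six 60° intervals; the neighbor ordering and all function names are illustrative assumptions.

```python
import numpy as np

# Assumed ordering of the N = 8 neighbours (row offset, column offset), i = 0..7.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def partition(angle_deg, n_intervals=6):
    """Interval index D(x) in 1..n_intervals, as in Eq. (9)."""
    width = 360.0 / n_intervals
    return min(int((angle_deg % 360.0) // width) + 1, n_intervals)

def xor_pattern_code(angles_deg, row, col):
    """LXP code: bit i is 1 when neighbour i lies in a different interval."""
    center_bin = partition(angles_deg[row, col])
    code = 0
    for i, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if partition(angles_deg[row + dr, col + dc]) != center_bin:
            code += 2 ** i
    return code

angles = np.array([[10.0, 70.0, 50.0],
                   [20.0, 30.0, 200.0],
                   [40.0, 5.0, 59.0]])
code = xor_pattern_code(angles, 1, 1)   # only neighbours 1 (70) and 3 (200) differ
```

In the example the center angle of 30° lies in interval 1, and only the neighbors at 70° and 200° fall into other intervals, giving the code $2^1 + 2^3 = 10$.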
4 Facial expression recognition simulation experiment based on local features and monogenic binary coding
4.1 Experimental preparation 
The experiment was run on a desktop computer with a 32-bit Windows 10 system, a quad-core 3.30 GHz processor, and 4 GB of memory.
This experiment was a person-independent facial expression recognition experiment on the JAFFE Database, a classic facial expression database. The database covers 7 expressions posed by 10 Japanese women, with 3 to 4 sample images per woman and expression, for a total of 213 facial expression images. The seven expressions are sadness, joy, anger, nausea (disgust), surprise, fear, and the neutral expression [11].
4.2 Experimental procedure 
The test set and training set were selected first. The experiment was divided into 10 rounds. In each round, the images of 9 of the 10 women (about 190 facial expression images) were used as the training set, and the images of the remaining woman (about 20 facial expression images) were used as the test set. After the 10th round, the average of the recognition rates was taken as the result of the experiment. Example expressions of one woman from the database are shown in Figure 2.
 
Figure 2: Seven expression examples in JAFFE 
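The ten-round protocol can be sketched as a leave-one-subject-out split. The sketch assumes that each of the 10 women is held out exactly once, which the description suggests but does not state outright; the toy subject labels and the function names are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def mean_recognition_rate(rates):
    """Average of the per-round recognition rates."""
    return sum(rates) / len(rates)

# Toy labels only: 10 hypothetical subjects with 21 images each.
subject_ids = [subject for subject in range(10) for _ in range(21)]
folds = list(leave_one_subject_out(subject_ids))
```

Each round then trains on the images of nine subjects and tests on the images of the tenth, and `mean_recognition_rate` averages the ten resulting rates.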
To ensure valid recognition, preprocessing was performed first. A uniform cropping template was used to remove redundant information such as hair and neck, and the cropped images were then equalized and scaled. Examples are shown in Figure 3.
The monogenic binary coding feature maps of each image were obtained with the method introduced in this study: $JPG\text{-}A$, $JPG\text{-}\varphi$, and $JPG\text{-}\theta$ were generated from the amplitude $A$, the phase $\varphi$, and the orientation $\theta$, respectively. In the LBP calculation, eight neighboring points were taken on a circle of radius 2. The LBP and LXP code values both ranged from 0 to 255; the monogenic local intensity coding was non-consecutive, ranging from 0 to 1024.
Histograms of the three features were then built to construct the feature histogram of each local block. With the faces standardized, each image was divided into 15 non-overlapping blocks, and every block was further divided into $3 \times 3$ sub-blocks. For each sub-block, the histograms of the three feature maps were concatenated to obtain the sub-block histogram. The feature histogram of each local block was then obtained by concatenating the histograms of its sub-blocks.
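The block histogram construction can be sketched as follows. The sketch assumes integer code maps with values below 256 and near-equal splitting of a block into 3×3 sub-blocks; how the 15 blocks are laid out on the face is not specified in the text, so only a single block is handled, and the function name and random test data are hypothetical.

```python
import numpy as np

def block_histogram(code_maps, grid=(3, 3), n_bins=256):
    """Concatenated sub-block histograms of one local block.

    code_maps holds the amplitude, phase and orientation code maps of the
    block (non-negative integers below n_bins, by assumption). For every
    sub-block the three histograms are put in series, then all sub-blocks
    are concatenated.
    """
    maps = [np.asarray(m) for m in code_maps]
    parts = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            for m in maps:
                band = np.array_split(m, grid[0], axis=0)[r]
                cell = np.array_split(band, grid[1], axis=1)[c]
                parts.append(np.bincount(cell.ravel(), minlength=n_bins))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
code_maps = [rng.integers(0, 256, size=(30, 30)) for _ in range(3)]
histogram = block_histogram(code_maps)
```

With three 30×30 code maps the result has $9 \times 3 \times 256 = 6912$ bins, and the counts sum to the total number of coded pixels.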
Fisher linear discriminant analysis was used to reduce the dimension of the training set block by block [12]. The original images contained a large amount of information with a large number of dimensions. Recognizing directly on these features could lead to the "curse of dimensionality", so dimension reduction was performed first. The linear discriminant analysis was first used to learn from the feature histogram matrix of each local block and obtain the best projection vector. The histogram was then projected onto this vector to obtain the reduced-dimensional histogram.
The cosine distance was used to calculate the similarity between the test sample and the training samples in each local block, and the block similarities were accumulated for the final recognition [13]. Denoting the fused monogenic binary coding recognition as $JPG\text{-}Com$, the weighted fusion is:
$$S_{JPG\text{-}Com} = 0.27\, S_{JPG\text{-}A} + 0.27\, S_{JPG\text{-}\varphi} + 0.46\, S_{JPG\text{-}\theta}. \qquad (13)$$
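The similarity and fusion steps can be sketched as follows. The sketch assumes non-zero feature vectors, per-block cosine similarities summed into one score per feature type, and the weights 0.27, 0.27, and 0.46 assigned to amplitude, phase, and orientation in that order; the dictionary keys and function names are illustrative.

```python
import numpy as np

# Assumed assignment of the fusion weights of Eq. (13).
WEIGHTS = {"A": 0.27, "phi": 0.27, "theta": 0.46}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def accumulated_similarity(test_blocks, train_blocks):
    """Sum of the per-block cosine similarities of two samples."""
    return sum(cosine_similarity(t, r) for t, r in zip(test_blocks, train_blocks))

def fused_score(scores):
    """Weighted fusion of the three per-feature similarity scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

score = fused_score({"A": 0.5, "phi": 0.6, "theta": 0.8})
```

In such a setup the test sample would be assigned the expression of the training sample with the largest fused score. For the example scores the fused value is $0.27 \cdot 0.5 + 0.27 \cdot 0.6 + 0.46 \cdot 0.8 = 0.665$.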
The LBP-SRC residual fusion algorithm, which has a high recognition rate among traditional face recognition techniques, was selected as the baseline [14]. It was modeled and evaluated with the procedure described in this study, and its results were compared with those of the monogenic binary coding method.
4.3 Experimental results and analysis 
The recognition rates are compared in Table 1.

Type of recognition algorithm    Success rate of recognition /%
LBP-SRC residual fusion          71.15
$JPG\text{-}A$                   66.37
$JPG\text{-}\varphi$             68.92
$JPG\text{-}\theta$              70.38
$JPG\text{-}Com$                 78.71

Table 1: Comparison of the recognition rates of the algorithms.
The traditional LBP-SRC residual fusion method achieved a relatively good recognition rate of 71.15%. The three individual monogenic binary coding features also achieved fairly high facial expression recognition rates (66.37%, 68.92%, and 70.38%), but each was lower than that of the LBP-SRC residual fusion method.
The algorithm based on local features and monogenic binary coding integrated the recognition characteristics of the three features and reduced the volatility and limitations of recognition with any single feature. It therefore obtained the highest facial expression recognition rate, 78.71%, which is 7.56 percentage points above the LBP-SRC residual fusion method.
The facial expression recognition rates for the different expression categories are shown in Figure 4.
 
Figure 4: Comparison of five algorithms in the 
recognition rates of seven categories of expressions. 
The recognition rates of the five methods differed across expression types. Overall, the recognition rates were higher for joy, nausea, and surprise, with surprise the easiest expression to recognize. Sadness, anger, fear, and the neutral expression were more difficult to recognize because they involve smaller facial muscle changes; fear was the most difficult.
Comparing the recognition rates of the different algorithms on the same expressions showed that the recognition rate of $JPG\text{-}Com$ was higher than those of $JPG\text{-}A$, $JPG\text{-}\varphi$, and $JPG\text{-}\theta$ for every expression except the neutral expression, and that $JPG\text{-}Com$ achieved its highest recognition rates for joy, nausea, and surprise. Comparing $JPG\text{-}Com$ with LBP-SRC residual fusion showed that $JPG\text{-}Com$ had higher recognition rates for sadness, joy, anger, nausea, and surprise, but lower recognition rates for fear and the neutral expression.

Figure 3: Examples of preprocessed results.
Based on the above results, the algorithm based on local features and monogenic binary coding was a more effective facial expression recognition method, with a higher recognition rate.
5 Conclusion 
Facial recognition technology is now widely used. In this study, facial expression recognition based on local features and the monogenic binary coding algorithm was analyzed. The principles of monogenic signal theory and monogenic binary coding were presented, a facial expression recognition model was constructed from them, and the model was applied to the JAFFE Database samples. The comparison with the traditional LBP-SRC residual fusion method showed the effectiveness of the monogenic binary coding algorithm in facial expression recognition and provides a reference for related research.
References 
[1] Kim JG, Ko JG, Lee SJ (2016) Demo: MilliCat: Real-Time Autonomous Image Suggestion for Mobile
Messaging. International Conference on Mobile 
Systems. https://doi.org/10.1145/2938559.2938585. 
[2] Kim BK, Roh J, Dong SY, Lee SY (2016)
Hierarchical committee of deep convolutional neural 
networks for robust facial expression recognition. 
Journal on Multimodal User Interfaces, 10(2), pp. 
173-189. 
https://doi.org/10.1007/s12193-015-0209-0. 
[3] Mollahosseini A, Chan D, Mahoor MH (2016) 
Going deeper in facial expression recognition using 
deep neural networks. 2016 IEEE Winter Conference 
on Applications of Computer Vision (WACV), IEEE, 
Lake Placid, NY, USA. 
https://doi.org/10.1109/WACV.2016.7477450. 
[4] Dapogny A, Bailly K, Dubuisson S (2016) Pairwise 
Conditional Random Forests for Facial Expression 
Recognition. IEEE International Conference on 
Computer Vision, IEEE, Santiago, Chile. 
https://doi.org/10.1109/ICCV.2015.431. 
[5] Lee SH, Ro YM (2017) Intra-Class Variation 
Reduction Using Training Expression Images for 
Sparse Representation Based Facial Expression 
Recognition. IEEE Transactions on Affective 
Computing, 5(3), pp. 340-351. 
https://doi.org/10.1109/TAFFC.2014.2346515. 
[6] Yang M, Zhang L, Zhang L, Zhang D (2010) 
Monogenic Binary Pattern (MBP): A Novel Feature 
Extraction and Representation Model for Face 
Recognition. International Conference on Pattern 
Recognition, IEEE, Istanbul, Turkey. 
https://doi.org/10.1109/ICPR.2010.657. 
[7] Yang M, Zhang L, Shiu CK, et al. (2012) Monogenic 
Binary Coding: An Efficient Local Feature 
Extraction Approach to Face Recognition. IEEE 
Transactions on Information Forensics and Security, 
7(6), pp. 1738-1751. 
https://doi.org/10.1109/TIFS.2012.2217332. 
[8] Culiuc A, Kesler R, Lacey MT (2016) Sparse 
Bounds for the Discrete Cubic Hilbert Transform. 
https://doi.org/10.2140/apde.2019.12.1259. 
[9] Lunyov AA, Malamud MM (2016) On the Riesz 
basis property of root vectors system for 2 × 2 Dirac 
type operators. Journal of Mathematical Analysis & 
Applications, 441(1), pp. 57-103. 
https://doi.org/10.1016/j.jmaa.2016.03.085. 
[10] Wang Z, Sun X, Sun L (2016) Face Recognition 
Using Monogenic Signal Feature and Discriminative 
Manifold Preserving Embedding. Journal of 
Computational and Theoretical Nanoscience, 
13(4), pp. 2537-2550.
https://doi.org/10.1166/jctn.2016.4615. 
[11] Saaidia M, Zermi N, Ramdani M (2016) Multiple 
Image Characterization Techniques for Enhanced 
Facial Expression Recognition. Intelligent Systems 
Technologies and Applications. Springer 
International Publishing. 
https://doi.org/10.1007/978-3-319-23036-8_43. 
[12] Mahdianpari M, Salehi B, Mohammadimanesh F, 
Brisco B, Mahdavi S, Amani M, Granger JE (2018)
Fisher Linear Discriminant Analysis of coherency 
matrix for wetland classification using PolSAR 
imagery. Remote Sensing of Environment, 206, pp. 
300-317. https://doi.org/10.1016/j.rse.2017.11.005. 
[13] Wang X, Yang S, Zhao Y, Wang Y (2018) Lithology 
identification using an optimized KNN clustering 
method based on entropy-weighed cosine distance in 
Mesozoic strata of Gaoqing field, Jiyang depression. 
Journal of Petroleum Science and Engineering, 166, 
pp. 157-174. 
https://doi.org/10.1016/j.petrol.2018.03.034. 
[14] Ameur B, Masmoudi S, Derbel AG, Hamida AB 
(2016) Fusing Gabor and LBP feature sets for KNN 
and SRC-based face recognition. International 
Conference on Advanced Technologies for Signal & 
Image Processing, Monastir, Tunisia. 
https://doi.org/10.1109/ATSIP.2016.7523134. 
  