https://doi.org/10.31449/inf.v44i4.3142

Informatica 44 (2020) 459–467

Probabilistic Weighted Induced Multi-Class Support Vector Machines for Face Recognition

Aniruddha Dey
Department of Information Technology, MAKAUT, Salt Lake, Kolkata, India
E-mail: anidey007@gmail.com

Shiladitya Chowdhury
Department of Master of Computer Application, Techno India, Kolkata, India
E-mail: dityashila@yahoo.com

Keywords: face recognition, weighted multi-class SVM, optimal separating hyperplane, probabilistic method

Received: April 29, 2020

Abstract: This paper presents probabilistic weighted multi-class support vector machines (WMSVM) for face recognition. Over the last decade, the support vector machine (SVM) has been applied to many fields, such as pattern recognition. The support vector machine determines the hyperplane that separates the largest fraction of samples of the same class on the same side, while maximizing the distance from either class to the separating hyperplane. In many realistic applications, however, the available training data are frequently contaminated by outliers and noise, and support vector machines are very sensitive to both: a number of points in the training dataset may be displaced from their true positions, or may even lie on the wrong side of the feature space. Weighted support vector machines are designed to overcome this outlier sensitivity problem. The main issue in training a weighted support vector machine is to build a consistent weighting model that reflects the true noise distribution in the training dataset: reliable data points should receive higher weights, and outliers lower weights. The weighted support vector machines are then trained according to the weights of the data points in the training set. In the proposed method, the weights are generated by a probabilistic method. The weighted multi-class support vector machines are constructed from a combination of weighted binary support vector machines and the one-against-all decision strategy. Numerous experiments have been performed on the AR, CMU PIE, and FERET face databases using different experimental strategies. The experimental results show that the proposed method is superior to the multi-class support vector machines in terms of recognition rate.

Povzetek: A support vector machine method for face recognition is described.

1 Introduction

The SVM can be regarded as an approximate implementation of the structural risk minimization principle [1]. Vapnik devised the SVM to address the pattern classification and recognition problem [2]. The objective of the support vector machine is to determine the hyperplane that separates the largest fraction of samples of the same class on the same side, while maximizing the distance from both classes to the separating hyperplane. This separating hyperplane is known as the optimal separating hyperplane (OSH); it minimizes the risk of misclassification. In many realistic applications, some training data points lie far from their accurate positions, or even on the wrong side of the feature space. These data points are called outliers. In general, the training dataset is severely affected by outliers and various kinds of noise, to which the SVM is very sensitive.
In the training phase, an outlier with a large Lagrangian coefficient can therefore become a support vector [3]. Over the past few decades, a wide range of techniques has been introduced to address this bottleneck of the SVM. Zhang [4] proposed the central SVM (CSVM), in which class centres are used to build the support vector machine. The adaptive margin SVM (AMSVM) training algorithm [5] relies on an adaptive margin for each training data point. Song et al. [6], [7] proposed a robust SVM (RSVM) that generates an adaptive margin from the distance between each data point and the centre of its class in the training sample. A drawback of this method is that the penalty parameter is very difficult to tune; moreover, the averaging it relies on remains partly sensitive to outliers and noise. The authors of [8] and [9] proposed the fuzzy SVM (FSVM) to eliminate the outlier sensitivity problem: fuzzy membership values are assigned to the training data to moderate the effect of outliers. Membership function selection is the main drawback of the FSVM. Cao et al. [10] proposed the support vector novelty detector (SVND), which detects outliers more accurately among normal data points and solves the one-class classification problem.

Several further improvements of the support vector machines can be found in the literature. Quan et al. [11] established the weighted least squares support vector machine (WLS-SVM) local region algorithm, which predicts nonlinear time series and performs robust regression estimation from limited observations; it includes a simple and effective model-parameter-selection technique based on the leave-one-out cross-validation strategy. A weighting method on the Lagrangian SVM (LSVM) was proposed by Hwang et al. [12] to deal with the imbalanced data classification problem; a weight parameter is added to the LSVM formulation, so the method improves performance on the minority class with minimal impact on the classification performance of the majority class. Yu [13] proposed an asymmetric weighted least squares support vector machine (LSSVM) ensemble learning methodology, based on evolutionary programming (EP) and used for software repository mining. A nonparallel plane classifier, the weighted twin support vector machine with local information (WLTSVM), was proposed by Ye et al. [14]; it mines the underlying similarity information within the samples as far as possible. Shao et al. [15] proposed the weighted Lagrangian twin support vector machine (WLTSVM) for imbalanced data classification. Xanthopoulos et al. [16] suggested weighted support vector machines for control chart pattern recognition, enabling automated process monitoring and early fault diagnosis. The robust LS-SVM (RLS-SVM), proposed by Yang et al. [17], is based on a truncated least squares loss function for classification and regression with noise. Zhang et al. [26] proposed an emotion recognition system based on facial expression images, in which bi-orthogonal wavelet entropy is used to extract multi-scale features and a fuzzy multi-class support vector machine serves as the classifier. More recently, Wang et al.
offered a new intelligent emotion recognition system in which stationary wavelet entropy is used to extract feature values and a single-hidden-layer feedforward neural network is employed as the classifier [27]. Aburomman and Reaz [28] proposed ensemble classifiers generated both by novel methods and by the weighted majority algorithm (WMA) technique. Some learning-based discriminant analysis techniques have been suggested, such as local structure preserving discriminant analysis [29] and discriminant similarity and variance preserving projection [30], to exploit the label information contained in the data. Shi et al. [31] established a 3D face recognition method based on LBP and SVM. Hu and Cui [32] proposed digital image recognition based on a fractional-order PCA-SVM coupling algorithm. Dagher and Azar [33] improved the SVM gender classification accuracy using clustering and incremental learning. Kar et al. [34] proposed a facial expression recognition system based on the ripplet transform type II and the least squares SVM.

In this study, the probabilistic weighted multi-class support vector machine is devised to address the outlier sensitivity problem. The main issue in training the weighted support vector machines is to develop a reliable weighting model that reflects the true noise distribution in the training data: reliable data points should receive higher weights, and outliers lower weights. Accordingly, different weights are allocated to different data points, and the training algorithm of the weighted SVM determines the decision surface according to the relative importance of the data points in the training set. The probabilistic method is used to generate the weights of the proposed probabilistic weighted multi-class support vector machines; these weights are attached to all data points of the training set. With the help of the weights, the training algorithm maximizes the margin of separation while de-emphasizing unreliable points. In this work, the generalized two-dimensional Fisher's linear discriminant (G-2DFLD) technique is applied for feature extraction [18]. The extracted features are fed to the proposed probabilistic weighted multi-class support vector machines for training, classification, and recognition. The empirical results on the AR, CMU PIE, and FERET face databases illustrate that the proposed probabilistic weighted multi-class support vector machines (WMSVM) perform better than the multi-class SVM in terms of face recognition.

The rest of the paper is organized as follows. The basic idea of the SVM is given in Section 2. The proposed weight generating scheme, based on the probabilistic method, is discussed in Section 3. Section 4 describes the weighted support vector machines. The weighted multi-class support vector machines are defined in Section 5. The simulation results on the AR, CMU PIE, and FERET face databases are described in Section 6. Section 7 contains the concluding remarks.

2 Revisited support vector machines

The support vector machines were developed for the binary pattern classification problem [1-3], on which they provide satisfactory performance. The basic idea of the binary-class SVM [1-3] is to separate two classes by a hyperplane constructed from the available training samples.
The support vector machine finds the hyperplane that separates the largest fraction of samples of the same class on the same side, while maximizing the distance from each class to the separating hyperplane. This separating hyperplane is known as the optimal separating hyperplane (OSH); it reduces the risk of misclassification.

3 Weight generation by the probabilistic method

Although the support vector machines are very powerful for solving classification problems, they have a limitation: all training data points of a given class are treated uniformly. In many real-world application domains, however, not all data points of the training set are equally important for classification and recognition. This limitation can be overcome by designing weighted support vector machines, in which every data point is treated separately according to its weight. The main issue of the training algorithm of the weighted support vector machines is to develop a reliable weighting model that reflects the actual distribution in the training set: reliable data points should receive higher weights, and outliers lower weights. Accordingly, different weights are assigned to different data points, and the decision surface generated by the weighted SVM training algorithm takes the relative significance of the data points in the training set into account.

The weights employed in the proposed probabilistic weighted multi-class support vector machines are generated by the probabilistic method. Let the $c$-th class have $N_c$ training samples out of $N$ in total. To design the weighted SVM for the $c$-th class, we consider the positive samples as belonging to class $y_1$ and the negative samples as belonging to class $y_2$. Let $P(y_j)$, $j = 1, 2$, denote the prior probability that a sample belongs to class $y_j$. The prior probability of a sample belonging to class $y_1$ is

$$P(y_1) = \frac{N_c}{N} \qquad (1)$$

Similarly, the prior probability of a sample belonging to class $y_2$ is

$$P(y_2) = \frac{N - N_c}{N} \qquad (2)$$

For a positive training sample $x_i$, the weight $a_i$ is given by the posterior probability of the positive class:

$$a_i = P(y_1 \mid x_i) \qquad (3)$$

Similarly, for a negative training sample $x_i$, the weight $a_i$ is generated as

$$a_i = P(y_2 \mid x_i) \qquad (4)$$

It is to be noted that $\varepsilon \le a_i \le 1$, where $\varepsilon \, (>0)$ is sufficiently small. The term $P(y_j \mid x_i)$, $j = 1, 2$, is called the posterior probability, i.e., the probability that the class is $y_j$ after we have performed a measurement on the data point $x_i$. Similarly, the term $P(x_i \mid y_j)$, $j = 1, 2$, is called the conditional probability, i.e., the probability that class $y_j$ produces the feature value $x_i$. Equations (3) and (4) ensure that lower weights are assigned to outliers or to points close to outliers. Every measurement must be assigned to one of the two classes $y_1$ or $y_2$; therefore

$$\sum_{j=1}^{2} P(y_j \mid x_i) = 1 \qquad (5)$$

The posterior probability $P(y_j \mid x_i)$, $j = 1, 2$, of the sample $x_i$ is used as the weight for designing the proposed probabilistic weighted multi-class support vector machines.
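The paper does not spell out how the posteriors of Eqs. (3)-(5) are estimated in practice. The following minimal sketch assumes Gaussian class-conditional densities and applies Bayes' rule with the priors of Eqs. (1)-(2); the function name `probabilistic_weights` and the density model are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def probabilistic_weights(X, y, eps=1e-3):
    """Sketch of the probabilistic weighting scheme of Section 3.

    Priors follow Eqs. (1)-(2); posteriors P(y_j | x_i) are obtained by
    Bayes' rule. The class-conditional densities P(x_i | y_j) are modelled
    here as Gaussians -- an assumption, since the paper does not fix a
    density model. Each sample's weight a_i is the posterior probability
    of its own class, clipped to [eps, 1] as required by the text.
    """
    X_pos, X_neg = X[y == +1], X[y == -1]
    N, N_c = len(X), len(X_pos)
    prior_pos = N_c / N          # Eq. (1): P(y1) = Nc / N
    prior_neg = (N - N_c) / N    # Eq. (2): P(y2) = (N - Nc) / N

    # Assumed Gaussian class-conditional densities P(x | y_j),
    # with a small ridge on the covariance for numerical stability.
    ridge = 1e-6 * np.eye(X.shape[1])
    pdf_pos = multivariate_normal(X_pos.mean(axis=0), np.cov(X_pos.T) + ridge)
    pdf_neg = multivariate_normal(X_neg.mean(axis=0), np.cov(X_neg.T) + ridge)

    like_pos, like_neg = pdf_pos.pdf(X), pdf_neg.pdf(X)
    evidence = like_pos * prior_pos + like_neg * prior_neg
    post_pos = like_pos * prior_pos / evidence   # P(y1 | x_i); Eq. (5) holds by construction
    post_neg = 1.0 - post_pos                    # P(y2 | x_i)

    # Eqs. (3)-(4): the weight is the posterior of the sample's own class.
    a = np.where(y == +1, post_pos, post_neg)
    return np.clip(a, eps, 1.0)
```

By construction, a point deep inside the opposite class's region receives a posterior, and hence a weight, close to the floor value, which is exactly the down-weighting of outliers the section describes.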
4 Weighted support vector machines

In many real-world applications, the training dataset is often contaminated by outliers and noise, to which the support vector machines are very sensitive. Some patterns in the training set may be outliers, displaced far from their true positions or even lying on the wrong side of the feature space. During the training process, an outlier with a large Lagrangian coefficient can become a support vector. The optimal hyperplane obtained by the support vector machines depends only on a small part of the data points, namely the support vectors; hence, in the presence of outliers, the decision boundary obtained by the SVM training algorithm can deviate severely from the optimal separating hyperplane. The weighted support vector machines are designed to address this issue. In the weighted support vector machines, the data points of the training set are treated differently according to their weights: the training algorithm puts more effort into correctly classifying the more important data points (those with larger weights) and less effort into the less important ones (those with lower weights, probably outliers).

Let $B$ be a set of labeled training samples associated with weights:

$$B = \{(x_i, a_i, y_i)\}_{i=1}^{N}; \quad x_i \in \mathbb{R}^d; \quad y_i \in \{+1, -1\} \qquad (6)$$

where $x_i$ is the input pattern of the $i$-th training sample, $a_i$ is the weight assigned to $x_i$, and $y_i$ is the class of $x_i$. In the proposed probabilistic weighted multi-class support vector machines, the weight is generated by the technique described in Section 3. To achieve better performance, the weighted SVM training algorithm maximizes the margin of separation. The optimal separating hyperplane of the weighted support vector machines minimizes the following function:

$$\Phi(\omega, \xi, a) = \frac{1}{2}\,\omega^T \omega + C \sum_{i=1}^{N} a_i \xi_i \qquad (7)$$

with the constraints defined in [1, 2]. In this optimization problem, a small value of $a_i$ reduces the effect of the slack variable $\xi_i$; the training algorithm of the weighted SVM therefore considers the corresponding point $(x_i, y_i)$ as less significant for classification. The solution to the optimization problem (7), subject to the constraints defined in [1, 2], is given by the saddle point of the following Lagrange function:

$$L(\omega, b, \xi, \lambda) = \frac{1}{2}\,\omega^T \omega + C \sum_{i=1}^{N} a_i \xi_i - \sum_{i=1}^{N} \lambda_i \left( y_i(\omega^T x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{N} \mu_i \xi_i \qquad (8)$$

Expanding equation (8) term by term yields

$$L(\omega, b, \xi, \lambda) = \frac{1}{2}\,\omega^T \omega + C \sum_{i=1}^{N} a_i \xi_i - \sum_{i=1}^{N} \lambda_i y_i\, \omega^T x_i - b \sum_{i=1}^{N} \lambda_i y_i + \sum_{i=1}^{N} \lambda_i - \sum_{i=1}^{N} \lambda_i \xi_i - \sum_{i=1}^{N} \mu_i \xi_i \qquad (9)$$

The Lagrange multipliers $\mu_i$ appear in equations (8) and (9) to ensure the non-negativity of the slack variables $\xi_i$. At the saddle point, the Lagrange function (8) has to be minimized with respect to $\omega$, $b$, and $\xi_i$, and maximized with respect to $\lambda_i$, where $0 \le \lambda_i \le a_i C$. The Lagrange function (8) can be converted into its corresponding dual problem as follows:

$$\max_{\lambda}\, \Theta(\lambda) = \max_{\lambda}\, \min_{\omega, b, \xi}\, L(\omega, b, \xi, \lambda) \qquad (10)$$

Three optimality conditions can be derived from equation (9):

$$\frac{\partial L(\omega, b, \xi, \lambda)}{\partial \omega} = \omega - \sum_{i=1}^{N} \lambda_i y_i x_i = 0 \qquad (11)$$

$$\frac{\partial L(\omega, b, \xi, \lambda)}{\partial b} = \sum_{i=1}^{N} \lambda_i y_i = 0 \qquad (12)$$

$$\frac{\partial L(\omega, b, \xi, \lambda)}{\partial \xi_i} = a_i C - \lambda_i - \mu_i = 0 \qquad (13)$$

The dual objective function is obtained by substituting equations (11), (12), and (13) into the right-hand side of the Lagrange function (9).
Therefore, the dual problem for the weighted SVM can be formulated as follows. Maximize

$$R(\lambda) = \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j \,(x_i \cdot x_j) \qquad (14)$$

subject to the constraints defined for the SVM and

$$0 \le \lambda_i \le a_i C; \quad i = 1, 2, \ldots, N \qquad (15)$$

It can be seen that by setting $a_i = 1$ for all $i$, the weighted support vector machine reduces to the support vector machine. The support vector machine has only one free parameter ($C$), whereas in the weighted support vector machine the number of free parameters, in addition to $C$, equals the number of training samples.

It has been observed that face patterns are highly non-linear because of variations in facial expression, illumination condition, pose, etc. It is therefore necessary to non-linearly map each sample into a high-dimensional feature space by a non-linear function $\varphi: \mathbb{R}^d \to \mathbb{R}^D$, $D \gg d$, and then implement the linear support vector machine in that feature space. A positive definite kernel function $K$ is selected a priori to compute inner products of vectors in the feature space, avoiding the explicit mapping $\varphi$ and the computational burden of the high-dimensional feature space. The kernel function is defined as follows:

$$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) \qquad (16)$$

where $\varphi(x_i)$ is the transformed vector of the pattern $x_i$ under the non-linear function $\varphi$. The polynomial and Gaussian radial basis function kernels are two well-known kernel functions:

Polynomial kernel:

$$K(x_i, x_j) = (x_i \cdot x_j)^r \qquad (17)$$

Gaussian radial basis function:

$$K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \qquad (18)$$

where $r$ is a positive integer and $\sigma > 0$. In the proposed probabilistic weighted multi-class support vector machines, we use the Gaussian radial basis function as the kernel function. The dual objective function (14) can therefore be rewritten as follows. Maximize

$$F(\lambda) = \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j \, K(x_i, x_j) \qquad (19)$$

subject to the constraints defined in equation (15). Observe that the objective function to be maximized in the dual problem is the same for the support vector machines and the weighted support vector machines; they differ only in that the constraint $0 \le \lambda_i \le C$ is replaced by the more stringent constraint $0 \le \lambda_i \le a_i C$. The constrained optimization for the weighted support vector machines, and the computation of the optimum values of the weight vector and bias, proceed in the same way as for the support vector machines. Solving equation (19) with the constraints defined in equation (15) determines the optimum Lagrange multipliers $\lambda_{o,i}$. Substituting the optimum Lagrange multipliers $\lambda_{o,i}$ into equation (11), the optimum weight vector $\omega_o$ can be obtained. The Karush-Kuhn-Tucker (KKT) conditions for the weighted support vector machines can be stated as

$$\mu_i \xi_i = 0; \quad i = 1, 2, \ldots, N \qquad (20)$$

By combining equations (13) and (20), the following equation is formed:

$$(a_i C - \lambda_i)\,\xi_i = 0; \quad i = 1, 2, \ldots, N \qquad (21)$$

From the SVM theory, it can be observed that

$$\xi_i = 0 \quad \text{if} \quad \lambda_i < a_i C \qquad (22)$$

The optimum bias $b_o$ is determined by taking any data point in the training set for which $0 < \lambda_{o,i} < a_i C$, and therefore $\xi_i = 0$, and using that data point. In the proposed probabilistic weighted multi-class support vector machines, we solve the dual objective function using the sequential minimal optimization (SMO) algorithm [20].
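The dual (19) with the box constraint (15) can be reproduced with off-the-shelf solvers. For example, scikit-learn's SVC solves the SVM dual with an SMO-type algorithm, and its per-sample weights rescale the penalty of sample $i$ to $a_i C$, which is precisely the constraint $0 \le \lambda_i \le a_i C$. The sketch below is an illustration under that mapping, not the authors' implementation; the `gamma` value is an assumed hyperparameter ($\gamma = 1/(2\sigma^2)$ links it to Eq. (18)).

```python
from sklearn.svm import SVC

def train_weighted_svm(X, y, a, C=1.0, gamma=0.5):
    """Minimal sketch of one weighted binary SVM (Section 4).

    scikit-learn's sample_weight rescales each sample's penalty to
    a_i * C, reproducing the box constraint 0 <= lambda_i <= a_i * C
    of Eq. (15). The Gaussian RBF kernel of Eq. (18) corresponds to
    kernel='rbf' with gamma = 1 / (2 * sigma**2), and SVC solves the
    dual with an SMO-type algorithm, as the paper does [20].
    """
    clf = SVC(C=C, kernel="rbf", gamma=gamma)
    clf.fit(X, y, sample_weight=a)   # per-sample weights a_i from Section 3
    return clf
```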
5 Weighted multi-class support vector machines

The weighted multi-class support vector machines are constructed from a combination of weighted binary support vector machines and a decision strategy that decides the class of the input pattern. Each weighted SVM is trained separately. The weighted multi-class support vector machines can be implemented using the one-against-all [1] or the one-against-one [21] decision strategy. The one-against-all decision strategy is adopted in the proposed probabilistic weighted multi-class SVM to classify samples, as it requires less memory. This decision strategy is stated as follows.

Let the training set $T = \{x_i, c_j, a_i\}$; $i \in \{1, 2, \ldots, M\ldots N\}$ be written as $T = \{x_i, c_j, a_i\}$, $i \in \{1, 2, \ldots, N\}$, $j \in \{1, 2, \ldots, M\}$: the collection of training samples, their classes, and their weights, respectively. We design the weighted SVM for each class by discriminating that class from the remaining $(M-1)$ classes; the methodology therefore uses $M$ weighted support vector machines. The set of training samples and their required outputs $(x_i, y_i)$ is used to design the weighted SVM for class $l$. For a training sample $x_i$, the required output $y_i$ is

$$y_i = \begin{cases} +1 & \text{if } c_j = l \\ -1 & \text{if } c_j \ne l \end{cases} \qquad (23)$$

The desired outputs of the positive and negative samples are $y_i = +1$ and $y_i = -1$, respectively. The classifier recognizes a test sample using the winner-takes-all decision strategy. Let the test sample $x$ be recognized as class $c$. The output of the classifier is

$$c = \arg\max_{l} \left\{ f_l(x) \right\}; \quad l = 1, 2, \ldots, M \qquad (24)$$

where $f_l(x)$ is the output of the discriminant function of the weighted SVM constructed for class $l$.
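A minimal sketch of this one-against-all construction, Eqs. (23)-(24), is given below, assuming the weighted binary machines are realized as in the previous sketch. The class name `WeightedOneVsAllSVM` and the `weight_fn` hook are our inventions for illustration; `weight_fn` stands in for the Section 3 weight generator applied to each binary problem.

```python
import numpy as np
from sklearn.svm import SVC

class WeightedOneVsAllSVM:
    """Sketch of the weighted multi-class SVM of Section 5.

    One weighted binary SVM is trained per class (one-against-all);
    a test sample is assigned by winner-takes-all over the discriminant
    outputs f_l(x), Eq. (24).
    """

    def __init__(self, weight_fn, C=1.0, gamma=0.5):
        self.weight_fn = weight_fn   # e.g. probabilistic_weights from Section 3
        self.C, self.gamma = C, gamma

    def fit(self, X, c):
        self.classes_ = np.unique(c)
        self.machines_ = []
        for l in self.classes_:
            y = np.where(c == l, +1, -1)     # Eq. (23): class l vs. the rest
            a = self.weight_fn(X, y)         # probabilistic weights a_i
            m = SVC(C=self.C, kernel="rbf", gamma=self.gamma)
            m.fit(X, y, sample_weight=a)
            self.machines_.append(m)
        return self

    def predict(self, X):
        # Winner-takes-all over the M discriminant functions, Eq. (24)
        F = np.column_stack([m.decision_function(X) for m in self.machines_])
        return self.classes_[np.argmax(F, axis=1)]
```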
6 Empirical results

We evaluate the performance of the proposed probabilistic weighted multi-class support vector machines on the AR face database [22], [23], the CMU PIE face database [24], and the FERET face database [25]. Figures 1(i), (ii), and (iii) display face images of an individual from the AR, CMU PIE, and FERET face databases, respectively. The effectiveness of the weighted multi-class support vector machines has also been tested on a synthetic dataset.

The AR face database contains 26 different frontal face images of each of 126 individuals, among them 56 females and 70 males. The images were collected in two sessions separated by two weeks, with variations in facial expression, illumination condition, and occlusion [22, 23]. The CMU PIE face database contains 41,368 face images of 68 persons (subjects), each under 13 different poses, 43 different illumination conditions, and 4 different expressions. The FERET face database [25] is used to measure the ability of a face recognition system to handle large databases, changes in people's appearance over time, and variations in illumination, scale, and pose.

Figure 1: Some face images of a person from the (i) AR, (ii) CMU PIE, and (iii) FERET face databases.

In this work, experiments are carried out using two standard testing methodologies: i) the FERET Tests September 1996 testing methodology, and ii) the FRVT Tests May 2000 testing methodology. The FERET Tests September 1996 testing methodology involves frontal face images of 1196 subjects. The training set contains 1196 face images, one image from each of the 1196 distinct subjects. This testing methodology has four test sets, namely fafb, fafc, Dup I, and Dup II, containing 1195, 194, 722, and 234 images, respectively. The FRVT Tests May 2000 testing methodology involves face images of 200 subjects. The training set contains 200 frontal images, one image per subject from the 200 distinct subjects. This testing methodology has four test sets, namely P1_probe, P2_probe, P3_probe, and P4_probe.

| Classifier | 1st Experimental Strategy | 2nd Experimental Strategy |
|---|---|---|
| Probabilistic weighted multi-class support vector machines | 82.50 (38×38) | 62.50 (36×36) |
| Multi-class support vector machines | 82.00 (38×38) | 61.91 (36×36) |

Table 1: Comparison between the probabilistic weighted multi-class support vector machines and the multi-class support vector machines in terms of recognition rates (%), using the performance evaluation over time (first) and the performance evaluation with occluded images (second) experimental strategies on the AR face database. Figures within parentheses denote the feature size.

| Classifier | First: k=5 | First: k=10 | First: k=15 | First: k=20 | Second: k=5 | Second: k=10 | Second: k=15 | Second: k=20 |
|---|---|---|---|---|---|---|---|---|
| Probabilistic weighted multi-class support vector machines | 75.31 (26×26) | 86.56 (24×24) | 88.65 (20×20) | 89.04 (20×20) | 80.86 (24×24) | 86.53 (20×20) | 92.78 (20×20) | 98.18 (20×20) |
| Multi-class support vector machines | 75.28 (26×26) | 86.52 (24×24) | 88.59 (20×20) | 88.96 (20×20) | 80.32 (24×24) | 85.86 (20×20) | 91.89 (20×20) | 97.49 (20×20) |

Table 2: Comparison of the probabilistic weighted multi-class support vector machines and the multi-class support vector machines in terms of average recognition rates (%) for the performance evaluation with pose and expression variations (first) and the performance evaluation with illumination variation (second) experimental strategies on the CMU PIE face database. Figures within parentheses denote the feature size.

| Classifier | fafb | fafc | Dup I | Dup II | P1_probe | P2_probe | P3_probe | P4_probe |
|---|---|---|---|---|---|---|---|---|
| Probabilistic weighted multi-class support vector machines | 98.33 (20×20) | 97.94 (18×18) | 89.34 (22×22) | 83.76 (18×18) | 68.50 (20×20) | 49.25 (22×22) | 28.50 (22×22) | 22.25 (24×24) |
| Multi-class support vector machines | 98.16 (20×20) | 96.91 (18×18) | 88.78 (22×22) | 83.33 (18×18) | 67.75 (20×20) | 48.75 (22×22) | 27.75 (22×22) | 21.75 (24×24) |

Table 3: Comparison of performances between the probabilistic weighted multi-class support vector machines and the multi-class support vector machines in terms of recognition rates (%), using the FERET Tests September 1996 testing methodology (fafb, fafc, Dup I, Dup II) and the FRVT Tests May 2000 testing methodology (P1_probe to P4_probe) on the FERET face database. Figures within parentheses denote the feature size.

The comparison of performance between the probabilistic weighted multi-class support vector machines and the multi-class support vector machines in terms of recognition rates is illustrated in Tables 1, 2, and 3 for the AR, CMU PIE, and FERET face databases, respectively. From the experimental results, it can again be observed that the probabilistic weighted multi-class support vector machines outperform the multi-class support vector machines in terms of recognition rate.
In a further experiment, a synthetic dataset $E$ containing 2D data from two different classes is randomly generated. The dataset has 50 data points, 25 belonging to one class and the remaining 25 to the other. The dataset $E$ is defined as follows:

$$E = \{(x_i, y_i)\}_{i=1}^{50}; \quad x_i \in \mathbb{R}^2; \quad y_i \in \{+1, -1\} \qquad (25)$$

To test the effectiveness of the proposed probabilistic weighted multi-class support vector machines, the data in the dataset $E$ are applied separately to the multi-class support vector machines and to the probabilistic weighted multi-class support vector machines. The optimal separating hyperplanes generated by the multi-class support vector machines and the probabilistic weighted multi-class support vector machines are shown in Figures 2(a) and 2(b), respectively. In both figures, the encircled data points are support vectors, the distance between the two dotted lines is the margin of separation between the two classes, and the line between the two dotted lines is the optimal separating hyperplane. In the case of the multi-class support vector machines, 11 data points lie within the margin of separation, as shown in Figure 2(a), whereas in the case of the probabilistic weighted multi-class support vector machines, only 10 data points lie within the margin of separation, as shown in Figure 2(b). The probabilistic weighted multi-class support vector machines therefore reduce the probability of misclassification and generalize better than the multi-class support vector machines.

Figure 2: Comparative study in terms of the optimal separating hyperplane generated by (a) the multi-class support vector machines and (b) the proposed probabilistic weighted multi-class support vector machines, on the dataset E.
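For readers who want to replicate the flavour of this experiment, the following sketch generates a random 50-point 2D dataset in the spirit of Eq. (25) and counts the points inside the margin ($|f(x)| < 1$) for an unweighted and a weighted SVM. The class means and the random seed are arbitrary choices, `probabilistic_weights` refers to the Section 3 sketch above, and the exact counts of Figure 2 (11 vs. 10) will not be reproduced on a different random draw.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic 2D dataset E, Eq. (25): 25 samples per class
# (the Gaussian means are illustrative, not taken from the paper).
X = np.vstack([rng.normal([-1.5, 0.0], 1.0, size=(25, 2)),
               rng.normal([+1.5, 0.0], 1.0, size=(25, 2))])
y = np.array([+1] * 25 + [-1] * 25)

def margin_interior(clf, X):
    # Points strictly inside the margin of separation satisfy |f(x)| < 1.
    return int(np.sum(np.abs(clf.decision_function(X)) < 1.0))

plain = SVC(kernel="linear", C=1.0).fit(X, y)

a = probabilistic_weights(X, y)   # weights from the Section 3 sketch
weighted = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=a)

print("inside margin, plain SVM:   ", margin_interior(plain, X))
print("inside margin, weighted SVM:", margin_interior(weighted, X))
```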
7 Conclusion

In this paper, we present the probabilistic weighted multi-class support vector machines for efficient face recognition. Support vector machines are widely used for pattern classification and recognition, as well as in computer vision, owing to their high generalization ability. However, they have a limitation: all training data points of a given class are treated uniformly. As a result, in the presence of outliers, the training algorithm of the support vector machines can cause the decision boundary to deviate severely from the optimal hyperplane. This limitation can be overcome by the weighted support vector machines, in which each data point is treated separately according to its weight. In the proposed probabilistic weighted multi-class support vector machines, a reliable weighting model is developed in which higher weights are assigned to reliable data points and lower weights to outliers. These weights are generated by the probabilistic method; the method therefore incurs additional computing time for the weight-generating algorithm. The training algorithm of the probabilistic weighted support vector machines learns the decision surface according to the relative importance of the training data. The proposed probabilistic weighted multi-class support vector machines are constructed from a combination of weighted binary support vector machines and the one-against-all decision strategy. Several experiments have been carried out on the AR, CMU PIE, and FERET face databases using different experimental strategies. The facial features extracted by the G-2DFLD method are applied separately to the proposed probabilistic weighted multi-class support vector machines and to the multi-class support vector machines for training, classification, and recognition. The experimental results show that the performance of the probabilistic weighted multi-class support vector machines is superior to that of the multi-class support vector machines in terms of recognition rate.

8 Acknowledgement

The authors would like to thank Dr. Sayan Kahali for several discussions which improved the presentation of the paper considerably.

9 References

[1] L.J. Cao, K.S. Chau, W.K. Chong, H.P. Lee, and Q.M. Gu, "A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine", Neurocomputing, Vol. 55, No. 1-2, pp. 321-336, 2003. https://doi.org/10.1016/S0925-2312(03)00433-8
[2] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
[3] C.J.C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167, 1998. https://doi.org/10.1023/A:1009715923555
[4] X. Zhang, "Using class-center vectors to build support vector machines", Proc. of the IEEE Signal Processing Society Workshop, pp. 3-11, 1999.
[5] R. Herbrich, and J. Weston, "Adaptive margin support vector machines for classification", Proc. of the Ninth International Conference on Artificial Neural Networks, Vol. 2, pp. 880-885, 1999. https://doi.org/10.1049/cp:19991223
[6] Q. Song, W. Hu, and W. Xie, "Robust support vector machine with bullet hole image classification", IEEE Transactions on Systems, Man and Cybernetics, Vol. 32, No. 4, pp. 440-448, 2002. https://doi.org/10.1109/TSMCC.2002.807277
[7] W.J. Hu, and Q. Song, "An accelerated decomposition algorithm for robust support vector machines", IEEE Transactions on Circuits and Systems II, Vol. 51, No. 5, pp. 234-240, 2004. https://doi.org/10.1109/TCSII.2004.824044
[8] C. Lin, and S. Wang, "Fuzzy support vector machines", IEEE Transactions on Neural Networks, Vol. 13, No. 2, pp. 464-471, 2002. https://doi.org/10.1109/72.991432
[9] C. Lin, and S. Wang, "Training algorithms for fuzzy support vector machines with noisy data", Pattern Recognition Letters, Vol. 25, No. 2, pp. 1647-1656, 2004. https://doi.org/10.1016/j.patrec.2004.06.009
[10] L.J. Cao, H.P. Lee, and W.K. Chong, "Modified support vector novelty detector using training data with outliers", Pattern Recognition Letters, Vol. 24, No. 14, pp. 2479-2487, 2003. https://doi.org/10.1016/S0167-8655(03)00093-X
[11] T. Quan, X. Liu, and Q. Liu, "Weighted least squares support vector machine local region method for nonlinear time series prediction", Applied Soft Computing, Vol. 10, No. 2, pp. 562-566, 2010. https://doi.org/10.1016/j.asoc.2009.08.025
[12] J.P. Hwang, S. Park, and E. Kim, "A new weighted approach to imbalanced data classification problem via support vector machine with quadratic cost function", Expert Systems with Applications, Vol. 38, No. 7, pp. 8580-8585, 2011. https://doi.org/10.1016/j.eswa.2011.01.061
[13] L. Yu, "An evolutionary programming based asymmetric weighted least squares support vector machine ensemble learning methodology for software repository mining", Information Sciences, Vol. 191, pp. 31-46, 2012. https://doi.org/10.1016/j.ins.2011.09.034
[14] Q. Ye, C. Zhao, S. Gao, and H. Zheng, "Weighted twin support vector machines with local information and its application", Neural Networks, Vol. 35, pp. 31-39, 2012. https://doi.org/10.1016/j.neunet.2012.06.010
[15] Y. Shao, W. Chen, J. Zhang, Z. Wang, and N. Deng, "An efficient weighted Lagrangian twin support vector machine for imbalanced data classification", Pattern Recognition, Vol. 47, No. 9, pp. 3158-3167, 2014. https://doi.org/10.1016/j.patcog.2014.03.008
[16] P. Xanthopoulos, and T. Razzaghi, "A weighted support vector machine method for control chart pattern recognition", Computers & Industrial Engineering, Vol. 70, pp. 134-149, 2014. https://doi.org/10.1016/j.cie.2014.01.014
[17] X. Yang, L. Tan, and L. He, "A robust least squares support vector machine for regression and classification with noise", Neurocomputing, Vol. 140, pp. 41-52, 2014. https://doi.org/10.1016/j.neucom.2014.03.037
[18] S. Chowdhury, J.K. Sing, D.K. Basu, and M. Nasipuri, "Face recognition by generalized two-dimensional FLD method and multi-class support vector machines", Applied Soft Computing, Vol. 11, No. 7, pp. 4282-4292, 2011. https://doi.org/10.1016/j.asoc.2010.12.002
[19] C. Cortes, and V. Vapnik, "Support-vector networks", Machine Learning, Vol. 20, No. 3, pp. 273-297, 1995. https://doi.org/10.1007/BF00994018
[20] J. Platt, "Fast training of support vector machines using sequential minimal optimization", in Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, pp. 185-208, 1999.
[21] S. Knerr, L. Personnaz, and G. Dreyfus, "Single-layer learning revisited: A stepwise procedure for building and training a neural network", Neurocomputing, Vol. 68, pp. 41-50, 1990. https://doi.org/10.1007/978-3-642-76153-9_5
[22] A.M. Martinez, and R. Benavente, "The AR face database", CVC Technical Report #24, June 1998.
[23] A.M. Martinez, and A.C. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 2, pp. 228-233, 2001. https://doi.org/10.1109/34.908974
[24] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression (PIE) database", Proc. of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 46-51, 2002.
[25] P.J. Phillips, H. Wechsler, J. Huang, and P.J. Rauss, "The FERET database and evaluation procedure for face-recognition algorithms", Image and Vision Computing, Vol. 16, No. 5, pp. 295-306, 1998. https://doi.org/10.1016/S0262-8856(97)00070-X
[26] Y. Zhang, Z. Yang, H. Lu, X. Zhou, P. Phillips, Q. Liu, and S. Wang, "Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation", IEEE Access, Vol. 4, pp. 8375-8385, 2016. https://doi.org/10.1109/ACCESS.2016.2628407
[27] S. Wang, P. Phillips, Z. Dong, and Y. Zhang, "Intelligent facial emotion recognition based on stationary wavelet entropy and Jaya algorithm", Neurocomputing, Vol. 272, pp. 668-676, 2018. https://doi.org/10.1016/j.neucom.2017.08.015
[28] A.A. Aburomman, and M.B.I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system", Applied Soft Computing, Vol. 38, pp. 360-372, 2016. https://doi.org/10.1016/j.asoc.2015.10.011
[29] P. Huang, C. Chen, Z. Tang, and Z. Yang, "Feature extraction using local structure preserving discriminant analysis", Neurocomputing, Vol. 140, pp. 104-113, 2014. https://doi.org/10.1016/j.neucom.2014.03.031
[30] P. Huang, C. Chen, Z. Tang, and Z. Yang, "Discriminant similarity and variance preserving projection for feature extraction", Neurocomputing, Vol. 139, pp. 180-188, 2014. https://doi.org/10.1016/j.neucom.2014.02.047
[31] L. Shi, X. Wang, and Y. Shen, "Research on 3D face recognition method based on LBP and SVM", Optik, Vol. 220, Article 165157, 2020. https://doi.org/10.1016/j.ijleo.2020.165157
[32] L. Hu, and J. Cui, "Digital image recognition based on fractional-order-PCA-SVM coupling algorithm", Measurement, Vol. 145, pp. 150-159, 2019. https://doi.org/10.1016/j.measurement.2019.02.006
[33] I. Dagher, and F. Azar, "Improving the SVM gender classification accuracy using clustering and incremental learning", Expert Systems, Vol. 36, No. 3, e12372, 2019. https://doi.org/10.1111/exsy.12372
[34] N.B. Kar, K.S. Babu, A.K. Sangaiah, and S. Bakshi, "Face expression recognition system based on ripplet transform type II and least square SVM", Multimedia Tools and Applications, Vol. 78, pp. 4789-4812, 2019. https://doi.org/10.1007/s11042-017-5485-0