https://doi.org/10.31449/inf.v48i12.6151 Informatica 48 (2024) 15–32 15 
Facial Sentiment Analysis Using Convolutional Neural Network and 
Fuzzy Systems 
Ahmed R. Kadhim¹, Raidah S. Khudeyer¹, Maytham Alabbas²
¹ Department of Computer Information System, College of Computer Science and Information Technology, University of Basrah, Basrah, Iraq
² Department of Computer Science, College of Computer Science and Information Technology, University of Basrah, Basrah, Iraq
E-mail: ahmed07727043448@gmail.com, raidah.khudayer@uobasrah.edu.iq, ma@uobasrah.edu.iq 
Keywords: fuzzy neural networks systems, convolutional neural networks (CNNs), facial expression recognition 
Received: May 5, 2024 
This study presents a detailed examination of a Convolutional Neural Network (CNN) model optimized for
facial expression recognition with fuzzy logic, using Fuzzy2DPooling and Fuzzy Neural Networks (FNN), and
discusses the important role of data augmentation in model optimization and performance. First, the
effectiveness of the models in classifying emotions from the FER2013, RAF-DB, and CK+ datasets was
evaluated by 5-fold cross-validation, which showed that accuracy varied widely among emotion classes
and that the models overfit easily. The integration of data augmentation techniques, including random
rotation, translation, and inversion, significantly improved the model's generalization capabilities,
as evidenced by higher accuracy and more consistent loss curves across all folds. After augmentation,
the model achieved average test accuracies of 98.95% on FER2013, 99.99% on RAF-DB, and 100% on CK+
across all folds. Despite these advances, challenges specific to certain classes of emotions remain,
highlighting the need for continued model refinement. This study concludes that data augmentation is an
important step in developing robust facial expression recognition models and has potential benefits for
a variety of applications requiring accurate emotion recognition.
Summary: In this study, convolutional neural networks are applied and enhanced with fuzzy logic to
increase their effectiveness in facial emotion analysis.
1 Introduction 
In recent years, artificial intelligence has become one of
the most important fields because of the role it plays
across many areas, the most important of which is
pattern recognition. Pattern recognition aims to analyze
and define patterns in data that are often complex,
enabling artificial intelligence algorithms to extract
valuable information. Pattern recognition [1] is an
important part of modern artificial intelligence systems.
It also attempts to identify patterns in a way that
simulates the human brain, which contributes to the
advancement of artificial intelligence and to making the
most of complex data, whether sound, image, text, or
video. Pattern recognition is important for many reasons,
the most important of which is that, by classifying
unseen data, it can predict even the smallest details of
data that are otherwise hard to track. Pattern recognition
can be divided into three distinct models: statistical
pattern recognition, syntactic pattern recognition, and
neural pattern recognition [2]. Statistical pattern
recognition involves studying and identifying patterns in
historical statistical data, learning from examples, and
collecting observations until the model is able to
generalize and apply those observations to previously
unseen data [3]. Syntactic pattern recognition [4], also
known as structural pattern recognition, is based on
simpler sub-patterns known as primitives; words, for
instance, fall into this category. The pattern is
characterized by the relationships among the primitives:
for example, primitive words join to build sentences and
messages. Neural pattern recognition uses artificial
neural networks [5]. These are large parallel computing
systems made up of numerous simple processors and
their connections. Through sequential training
procedures, the networks learn complex nonlinear
input-output relationships and then adapt themselves to
the available data. There are two kinds of machine
learning and pattern recognition algorithms: supervised
and unsupervised [6]. Supervised algorithms are also
known as classification algorithms. They identify
patterns in a two-step process: the first step covers the
development and construction of the model, and the
second covers the prediction of new or unseen objects.
Unsupervised algorithms prefer a "group by" strategy. In
order to provide predictions, these algorithms look for
patterns in the data and group them based on similarity,
such as dimensions. Pattern recognition has a wide range
of applications: image recognition, text pattern
recognition, fingerprint scanning, seismic activity
analysis, audio and voice recognition, social media,
cybersecurity, and many others [7]. This paper focuses
on image recognition. Today, security and surveillance
systems in several industries use image recognition tools.
These devices record and monitor several video streams
simultaneously, which helps in identifying possible
attackers. Business centers, information technology
companies, and production facilities use the same image
recognition technology as face ID systems. Facial
Expression Recognition (FER) systems are another
application of the same technology. Here, the emotions of
an audience are analyzed and detected in real time by
applying pattern recognition to video and image data [8].
Sentiment analysis, intent recognition, and mood
recognition are the main goals of these systems. Deep
learning algorithms are thus employed to identify
patterns in people's body language and facial expressions.
Organizations can use this data to improve customer
experience by fine-tuning their marketing initiatives.
Facial Expression Recognition is a technology for
analyzing sentiment from different sources, such as
images and videos. It belongs to the family of
technologies known as "affective computing," which
draws extensively on artificial intelligence [9]. Affective
computing is an interdisciplinary area of research on
computers' capacity to recognize and interpret human
emotions and affective states. Facial expressions are a
form of non-verbal expression that gives cues to human
emotions. Psychologists (Ekman and Friesen 2003; Lang
et al. 1993) and specialists in human-computer
interaction (Cowie et al. 2001; Abdat et al. 2011) have
spent decades researching how to interpret these
indications of emotion. The widespread adoption of
cameras and recent developments in machine learning,
biometric analysis, and pattern recognition have all
contributed significantly to the development of FER
technology.
Human interactions are a tapestry woven with the threads 
of spoken words, physical gestures, and a rich spectrum of 
facial expressions. As we navigate through our daily lives, 
our faces unwittingly broadcast a myriad of emotions,
communicating non-verbally with a complexity that 
language alone cannot capture. In this intricate dance of 
social interaction, technology has the potential to become 
a transformative partner, unlocking a deeper understanding 
of human sentiment. By harnessing advanced artificial 
intelligence, particularly in the realm of facial emotion 
recognition, we stand on the brink of an era where 
machines can not only 'see' but can also 'comprehend' the
silent language of our emotions.  This leap forward offers 
profound implications for personalized communication, 
tailored services, and empathetic machine-human 
interfaces. However, this technological pursuit is not 
without its challenges [10]. The endeavor to translate the 
transient and often ambiguous canvas of the human face 
into a digital lexicon of emotions entails a nuanced 
recognition of the interplay between facial muscle 
movements and their corresponding emotional states. It 
requires an algorithmic sensitivity to the context and 
cultural underpinnings that shape emotional expression. As 
researchers and engineers strive to bridge this gap, they 
grapple with the complexities of creating systems robust 
enough to interpret the subtle signals of our emotive 
expressions in real-time and in the uncontrolled, diverse 
settings of our natural environments. The journey from the 
theoretical understanding of facial expressions to practical, 
real-world application is the specific focus of the proposed 
model, aiming to elevate the capability of machines to 
interpret human emotions with unprecedented accuracy 
and sensitivity. 
2 Related work 
Facial expressions serve as a universal medium for people
to convey emotions. This universality has spurred interest
in various sectors such as robotics, healthcare, and driving
assistance systems, where facial expression analysis tools,
often based on image processing, are being actively
developed. Table 1 summarizes related work on three
datasets: FER2013, RAF-DB, and CK+.
   In academic research, the FER2013 dataset has been a
focal point for several sentiment analysis studies. One
study [11] used a Random Search algorithm, initially
achieving a 72.16% accuracy rate on the FER2013 dataset.
Another study [12] designed a VGGNet architecture and
fine-tuned its hyperparameters for sentiment analysis,
reaching an accuracy of 73.28%. A different study [13]
achieved a 72.81% accuracy rate by training a fuzzy
optimized CNN-RNN architecture on the FER2013 dataset;
its method improved recognition across different facial
expression datasets compared with popular algorithms of
the time. The study in [14] extracted multi-layer
representation information using the asymmetric region
local binary pattern (AR-LBP) and the divided local
directional pattern (DLDP), achieving an accuracy of 91%
and further highlighting the potential of these technologies
in various application areas. In facial expression
recognition with deep learning, several further studies have
used the FER2013 dataset. The study in [15] developed
"ConvNet," a four-layer convolutional neural network
(CNN) architecture for facial emotion recognition that,
after being trained for a minimal number of epochs on the
FER2013 dataset, achieved validation accuracy ranging
from 65% to 70% across the datasets used in the
experiments, reportedly outperforming other existing
models.
   Another research effort [16] extensively explored various
CNN models, pre-trained frameworks, and training
methodologies, offering a comparative analysis with an
improvement of up to 6% and a total accuracy of up to 70%.
Another study [17] conducted a comprehensive evaluation of
thirteen different vision transformer (ViT) models for facial
emotion recognition using three datasets: RAF-DB, FER2013,
and a new balanced FER2013 dataset; the accuracy achieved
was 74.20%. A different approach was taken in [18], which
focused on the RAF-DB dataset using the Self-Cure Network
(SCN). SCN effectively suppresses uncertainties in
large-scale FER and prevents over-fitting on uncertain facial
images, achieving 88.14% accuracy on the RAF-DB dataset.
   In recent studies, various approaches have been employed 
to enhance face mask detection using deep learning 
techniques. ResNet18 [19] achieved high accuracy in 
detecting covered and uncovered faces through a dual-stage 
combination of neural networks featuring convolutional 
architecture. The ResNet18 model outperformed all other models
with an 86.02% test accuracy on the RAF-DB dataset. 
   Shan Li and colleagues [20] used the Emotion-Conditional
Adaption Network (ECAN), a deep learning framework, to
learn domain-invariant and discriminative feature
representations. ECAN aims to match both the marginal and
conditional distributions across domains simultaneously.
Jiawei Shi and Songhao Zhu [21] focused on leveraging
Convolutional Neural Networks (CNNs), deep learning, and
image super-resolution techniques. Specifically, they
developed a novel architecture called the Amending
Representation Module (ARM) to enhance facial expression
representation. Despite challenges with the datasets used for
training and evaluation, ARM Net demonstrated a promising
accuracy rate of 90.42%.
   Delian Ruan and colleagues [22] combined deep learning
and machine vision in the FDRL method, which consists of a
backbone network, a Feature Decomposition Network
(FDN), a Feature Reconstruction Network (FRN), and an
Expression Prediction Network (EPN). Their approach
achieved superior accuracy in their experiments.
   Ji-Hae Kim and team [23] introduced a geometric
feature-based network that learns the coordinate changes of
action unit (AU) landmarks, where AUs are the muscles that
mainly move when making facial expressions. It achieved a
validation accuracy of 96.46% on the CK+ dataset.
   Sonali Sawardekar and associates [24] explored
automated learning using efficient Local Binary Pattern
(LBP) images and a Convolutional Neural Network (CNN)
for classification.
   Yu Miao and team [25] used a convolutional neural
network model called MobileNet, a renowned deep
learning method, for both offline and real-time recognition,
achieving a validation accuracy of 96.92% on the 6-class
CK+ dataset.
   Serenada Salma Shafira and team [26] developed a
Face Mask Detection System whose feature extraction
stage uses Histogram of Oriented Gradient (HOG) and
Local Binary Pattern (LBP) features. The study is notable
for its comparison of the HOG and LBP feature extraction
methods for facial expression identification. The accuracy
achieved by the Extreme Learning Machine (ELM)
classifier using the HOG feature is 63.86% on the FER2013
dataset and 99.79% on the CK+ dataset. In [27], the authors
used the MobileNet and ResNet-18 algorithms to achieve
the highest classification accuracy on the RAF-DB and
FER2013 datasets. The accuracy of the proposed method
was 90.81% on RAF-DB and 77.83% on FER2013.
   In [28], the authors trained four models with different
architectures on the FER-2013 dataset, including a shallow
convolutional neural network, ResNet50, VGG16 with
weights from ImageNet, and VGG16 with weights from
VGGFaceNet, to optimize the hyperparameters of an
ensemble model. The paper suggests that future work
should consider training models with different structures
but similar accuracy scores for ensemble applications.
   In [29], the study introduces the Rayleigh loss, which
aims to extract a discriminative representation by
simultaneously minimizing within-class distances and
maximizing inter-class distances. This loss function has a
Euclidean form, can be easily optimized with SGD, and can
be combined with other losses. The authors also use a
weighted Softmax loss, which measures the uncertainty of
a given sample by considering its distance to the class
center.
Table 1: Summary of the related works.

| Ref. | Year | Model | Dataset | Contributions | Limitations | Accuracy % |
|---|---|---|---|---|---|---|
| [24] | 2018 | LBP and CNN | CK+ | Efficient Local Binary Pattern (LBP) images and Convolutional Neural Network (CNN) are used for facial expression recognition, which has achieved great success in the field of image processing and recognition. | The evaluation of the algorithm is based on a single dataset (Cohn-Kanade), which may not fully represent the diversity of facial expressions in real-world scenarios. | 90.00 |
| [20] | 2019 | Emotion-Conditional Adaption Network (ECAN) | RAF-DB | Ability to bridge the discrepancy of both marginal and conditional distribution between source and target domains, improving cross-database facial expression recognition. | Since the provided sources do not mention any weaknesses or limitations of the ECAN method, it is not possible to provide any specific weaknesses based on the information given. | 89.69 |
| [23] | 2018 | The geometric feature-based network learns the coordinate change of action units (AUs) | CK+ | The appearance feature-based network extracts holistic features of the face using preprocessed LBP images, which are robust in the facial expression recognition system. | It is important to consider that the algorithm's performance may vary depending on the dataset used for evaluation and the specific facial expressions being recognized. | 96.46 |
| [25] | 2019 | MobileNet | CK+ | The FER process consists of three stages: preprocessing, face detection, and emotion classification, which allows for a systematic and efficient approach to recognizing facial expressions. | The Haar cascade classifiers used for face detection may not be robust enough to accurately detect faces in all lighting conditions or with occlusions. | 96.92 |
| [26] | 2019 | HOG and LBP | CK+ | The feature extraction stage incorporates the Histogram of Oriented Gradient (HOG) and Local Binary Pattern (LBP) features, which are widely used and have been shown to provide good results in facial expression recognition. | The study only compares two feature extraction methods, HOG and LBP, and does not explore other potential methods that could potentially improve accuracy. | 99.79 |
| [18] | 2020 | Self-Cure Network (SCN) | RAF-DB | Self-Cure Network (SCN) effectively suppresses uncertainties in large-scale Facial Expression Recognition (FER) and prevents over-fitting of uncertain facial images. | The evaluation of the proposed method is primarily focused on synthetic FER datasets and the authors' collected Web Emotion dataset, which may limit the generalizability of the results. | 88.14 |
| [29] | 2020 | Rayleigh loss | RAF-DB | The Rayleigh loss aims to learn discriminative features in FER. | Like many traditional loss functions, the Rayleigh loss is sensitive to outliers. Outliers in the dataset can disproportionately affect the model's training, leading to suboptimal performance. | 87.97 |
| [11] | 2021 | Random Search algorithm | FER2013 | Method of optimizing hyper parameters of a CNN for facial emotion recognition. | Optimization was done on a small number of hyper parameters. | 72.16 |
| [12] | 2021 | VGGNet Cosine Annealing | FER2013 | Achieve highest single-network accuracy on FER2013 without extra training data. | Does not explore the use of auxiliary training data to improve the model's performance on FER2013, which may limit the generalizability of the findings. | 73.28 |
| [13] | 2021 | Fuzzy optimized CNN-RNN | FER2013 | Traditional facial expression recognition methods are not intelligent enough. | Applied affine transformation to increase the number of datasets. | 72.81 |
| [21] | 2021 | Amending Representation Module (ARM) | RAF-DB | The ARM module outperforms current state-of-the-art methods in facial expression recognition, achieving high validation accuracies on benchmark datasets such as RAF-DB, Affect-Net, and SFEW. | There is no analysis or discussion on the computational complexity or efficiency of the proposed method. | 90.42 |
| [22] | 2021 | Feature Decomposition and Reconstruction Learning (FDRL) | RAF-DB | The FDRL method effectively models both the shared information across different expressions and the unique information for each expression, leading to improved recognition accuracy. | The paper focuses more on highlighting the benefits and superior performance of the FDRL method compared to other state-of-the-art methods, rather than discussing its weaknesses. | 89.47 |
| [14] | 2022 | AR-LBP-DLDP | FER2013 | The algorithm utilizes a multi-feature fusion approach, combining the local features extracted using asymmetric region local binary pattern (AR-LBP) and divided local directional pattern (DLDP) with global features extracted by a convolutional neural network (CNN). | Without further information or analysis, it is difficult to determine any potential weaknesses in the proposed method. | 91 |
| [15] | 2022 | ConvNet | FER2013 | The model's training accuracy was achieved in a short number of epochs, indicating its efficiency and effectiveness. | The identification rate for classifying disgust and fear was relatively low at 45% and 41% respectively, suggesting room for improvement in recognizing these specific emotions. | 70 |
| [16] | 2022 | DCNN | FER2013 | The proposed hybrid model for Facial Expression Recognition (FER) combines a Deep Convolutional Neural Network (DCNN) and Haar Cascade deep learning architectures, which enhances filtering depth and facial feature extraction. | The model showed reduced classification accuracy for the "disgust" and "fear" emotions, which may be attributed to the limited number of training set samples for these classes. | 70 |
| [17] | 2023 | ViT models | FER2013 | Present a new, balanced dataset called FER2013balanced, which addresses the imbalance problem in the FER2013 dataset and serves as a reliable baseline for FER research. | The evaluation of ViT models on FER2013balanced dataset does not consider the potential biases introduced during the data augmentation process. | 74.20 |
| [19] | 2023 | ResNet18 | RAF-DB | The ResNet18 model outperformed all other models with an 86.02% test accuracy on the RAF-DB dataset. | The ResNet18 model outperformed other models on the RAF-DB dataset with an 86.02% test accuracy, but the specific performance on individual emotions is not provided. | 86.02 |
| [27] | 2023 | MobileNet and ResNet-18 | RAF-DB | Aimed to increase FER accuracy by minimizing intra-class distance and maximizing inter-class distance. | Similar facial expressions and variations not related to facial expressions make performance improvement difficult. | 90.81 |
| [28] | 2023 | Ensemble model | FER2013 | Examined different decision-making processes of shallow and deep networks. | Deeper models lose some information. | 71.84 |
| Our method | | Fuzzy Optimized CNNs | FER2013 | Improved Handling of Uncertainty; Enhanced Interpretability; Better Handling of Noisy Data; Improved Classification Accuracy; Reduction in Overfitting. | Complexity in Design and Implementation; Limited Standardization and Framework Support; Limited Generalization to All Types of Data. | 98.95 |
| Our method | | Fuzzy Optimized CNNs | RAF-DB | Improved Handling of Uncertainty; Enhanced Interpretability; Better Handling of Noisy Data; Improved Classification Accuracy; Reduction in Overfitting. | Complexity in Design and Implementation; Limited Standardization and Framework Support; Limited Generalization to All Types of Data. | 99.99 |
| Our method | | Fuzzy Optimized CNNs | CK+ | Improved Handling of Uncertainty; Enhanced Interpretability; Better Handling of Noisy Data; Improved Classification Accuracy; Reduction in Overfitting. | Complexity in Design and Implementation; Limited Standardization and Framework Support; Limited Generalization to All Types of Data. | 100 |

Current state-of-the-art methods in CNN-based image 
classification primarily rely on crisp and deterministic 
approaches, which excel with clean and well-defined data. 
However, these methods struggle with uncertainty and 
ambiguity, leading to potential misclassifications in 
scenarios where the data is not clear-cut. In contrast, 
Fuzzy Optimized CNNs integrate fuzzy logic to handle 
uncertainty and ambiguity more effectively. By utilizing 
fuzzy rules and membership functions to model vagueness 
in images, they improve robustness and generalization, 
filling a critical gap where current methods often falter. 
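As a rough illustration of how a membership function can model vagueness, the sketch below grades a normalized pixel intensity into three overlapping fuzzy sets. The triangular shape, the set names (`dark`, `medium`, `bright`), and the breakpoints are assumptions chosen for illustration; they are not taken from the proposed model.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over a pixel intensity normalized to [0, 1].
FUZZY_SETS = {
    "dark": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "bright": (0.5, 1.0, 1.5),
}


def fuzzify(intensity):
    """Return the degree of membership of one intensity in each fuzzy set."""
    return {name: triangular(intensity, *abc) for name, abc in FUZZY_SETS.items()}


memberships = fuzzify(0.25)
```

A crisp threshold would assign an intensity of 0.25 to exactly one class; here it belongs partly to `dark` and partly to `medium`, which is the kind of graded decision that fuzzy rules operate on.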
   Another significant shortcoming of current state-of-the-art (SOTA)
methods is their limited interpretability. These models are 
frequently perceived as "black boxes," making it difficult 
for users to understand the decision-making processes. 
Fuzzy Optimized CNNs, on the other hand, provide rule-
based explanations for their decisions. This approach 
enhances interpretability, allowing users to see how fuzzy 
rules influence classifications and making the model's 
behavior more transparent. This improvement is 
especially important in applications requiring high levels 
of trust and understanding, such as medical diagnosis. 
   In terms of robustness to noisy data, existing CNN 
methods often require extensive data preprocessing to 
handle noise effectively. Their performance can degrade 
significantly when faced with noisy or corrupted data. 
Fuzzy Optimized CNNs are inherently robust to noise and 
imperfections due to the nature of fuzzy logic, which 
reduces the need for complex preprocessing. This 
capability ensures more reliable performance in real-
world conditions where data is frequently noisy. 
   Classification accuracy is another area where Fuzzy 
Optimized CNNs can provide substantial benefits. While 
current methods achieve high accuracy with large, clean, 
and balanced datasets, their performance tends to suffer 
with limited, imbalanced, or noisy datasets. By 
incorporating fuzzy rules to refine decision boundaries, 
Fuzzy Optimized CNNs can achieve better performance 
in these challenging scenarios, leveraging the flexibility of 
fuzzy logic to enhance overall accuracy. 
   Flexibility and adaptability are additional strengths of 
Fuzzy Optimized CNNs. Current SOTA methods have 
fixed architectures once trained and require significant 
retraining to adapt to new data or conditions. In contrast, 
Fuzzy Optimized CNNs offer flexible rule-based 
adjustments without the need for extensive retraining. 
This flexibility allows the model to adapt more easily to 
new data or changing conditions, addressing a key 
limitation of traditional CNN approaches. 
   However, it is important to note that while current 
SOTA methods are highly optimized for scalability and 
can handle very large datasets efficiently, they often 
require significant computational resources. The 
integration of fuzzy logic in Fuzzy Optimized CNNs adds 
complexity, which may pose scalability challenges with 
very large datasets. Despite this potential drawback, the 
ability of Fuzzy Optimized CNNs to handle uncertainty, 
noise, and ambiguity more effectively justifies the added 
complexity, particularly in specific applications where 
these factors are prevalent. 
   In conclusion, while current state-of-the-art methods in 
CNN-based image classification are powerful and 
efficient, particularly with clean and balanced datasets, 
they exhibit significant limitations in handling 
uncertainty, noise, and providing interpretability. Fuzzy 
Optimized CNNs address these gaps by integrating fuzzy 
logic, offering enhanced robustness, accuracy, and 
transparency. Despite challenges in complexity and 
scalability, the benefits in real-world applications, where 
data is often noisy and ambiguous, make Fuzzy Optimized 
CNNs a valuable addition to the field. 
3 Tools 
In this section, we cover the background needed to
understand the core techniques of our proposed approach:
Convolutional Neural Networks (CNNs) and fuzzy logic.
By offering an introductory overview, we aim to clarify the
conceptual underpinnings and operating principles behind
these methodologies, paving the way for a complete
understanding of their application within our framework.
Through this exploration, we lay the basis for a nuanced
discussion of the integration and synergy of CNNs and
fuzzy logic, advancing the discourse on innovative
solutions in our domain.
3.1 Convolutional Neural Network 
A Convolutional Neural Network (CNN) is built by stacking
several kinds of building blocks: convolution layers,
pooling layers (such as max pooling), and fully connected
(FC) layers. A common design repeats a set of one or more
convolution layers followed by a pooling layer several
times throughout the architecture. During forward
propagation on a training dataset, a loss function evaluates
how well the model performs with its current kernels and
weights. These learnable parameters are then adjusted
according to the loss value using backpropagation with the
gradient descent optimization algorithm [30]. Here is a
breakdown of the main CNN layers:
3.2 Input layer 
The raw input data is represented by this layer, which is 
usually an image. This layer treats every pixel in the image 
as a node, and its depth relates to the number of color 
channels (three in the case of RGB images).  
 
3.3 Convolutional layers 
The fundamental component of a CNN is the
convolutional layer. It applies convolution operations to
the input, extracting local patterns and features by
sliding a small filter, also referred to as a kernel, across
the input. Stacking such layers allows the network to
capture spatial hierarchies of features.
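The sliding-filter operation described above can be illustrated with a minimal sketch in plain Python. The toy image, the 2 x 2 `edge_kernel`, and the function name `conv2d` are assumptions made for illustration; stride 1 and no padding are used for brevity, whereas Table 3 specifies "same" padding. As in most deep learning frameworks, the filter is applied without flipping (cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the local patch currently under the filter.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds to left-to-right intensity increases.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, edge_kernel)
```

In this toy case the feature map is non-zero only in the column where the filter straddles the edge, which is the kind of local pattern a learned filter in an early layer may pick up.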
3.4 Pooling (subsampling) layer 
By reducing the spatial dimensions of the input volume, 
pooling layers help to improve the learned features' 
invariance to changes in scale and orientation while also 
lowering the computational complexity of the network. A 
typical pooling method called "max pooling" keeps the 
maximum value in a given small area. 
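As a small illustration of max pooling, the following sketch keeps the largest value in each non-overlapping 2 x 2 window. The function name `max_pool2d` and the sample values are assumptions chosen for the example.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a `size` x `size` window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current window and keep the largest.
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool2d(feature_map)
```

The 4 x 4 input is reduced to a 2 x 2 output, so each spatial dimension is halved while the strongest response in each region is kept.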
3.5 Flatten layer 
This layer transforms the multi-dimensional output of the
preceding layers into a one-dimensional vector. It
"flattens" the data to prepare it for input into a fully
connected layer.
3.6 Fully connected (dense) layer 
In a fully connected layer, every neuron is connected to
every neuron in the preceding layer. These layers, which
are usually located at the end of the network, combine the
features that the preceding layers have learnt in order to
perform classification or regression tasks.
Among the deep neural network classes, convolutional
neural networks (CNNs) (Figure 1) perform particularly
well in computer vision applications such as object
detection, image segmentation, and image recognition.
 
         Figure 1: Convolution Neural Network architecture 
 
3.7 Fuzzy logic 
The term fuzzy refers to things that are unclear or
ambiguous. In the real world we frequently encounter
situations in which we cannot decide whether a condition
is true or false; fuzzy logic (Figure 2) offers valuable
flexibility for reasoning in such cases [31]. It allows us
to take into account the uncertainties and inaccuracies of
any given situation.
 
 
Figure 2: Fuzzy Logic Framework. 
 
Fuzzy Logic is a type of many-valued logic in which the
truth values of variables can be any real number between
0 and 1, as opposed to only the conventional values of
true or false. It is a mathematical technique for
modeling vagueness and uncertainty in decision-making
and is used to cope with imprecise or unclear
information. Numerous fields, including artificial
intelligence, image processing, natural language
processing, control systems, and medical diagnostics,
employ fuzzy logic. Its architecture has four
components:
 
3.8 Rule base 
Based on linguistic data, the experts have created a set of 
rules and IF-THEN conditions that govern the decision-
making mechanism. Fuzzy controllers may be designed 
and tuned using a variety of efficient techniques thanks to 
recent advances in fuzzy theory. The majority of these 
advancements result in less ambiguous rules. 
3.9 Fuzzification 
This process turns inputs, such as crisp numbers, into 
fuzzy sets. Crisp inputs are essentially the precise inputs—
temperature, pressure, rpms, etc.—that are detected by 
sensors and sent to the control system for processing. 
3.10 Inference engine 
The inference engine calculates the degree to which the
current fuzzy input matches each rule and, based on the
input field, selects which rules should be fired. The
fired rules are then combined to form the control actions.
3.11 Defuzzification 
It is employed to transform the inference engine's fuzzy 
sets into a crisp value. To lower the error, the most 
appropriate defuzzification technique is used with a 
particular expert system. 
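The four components above can be tied together in a short sketch. Everything concrete here is an assumption made for illustration: the triangular membership functions, the temperature fuzzy sets "cold", "warm", and "hot", the fan-speed consequents, and the weighted-average defuzzifier. It is not the configuration used in this study.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Fuzzification: hypothetical fuzzy sets over a temperature in degrees Celsius.
TEMPERATURE_SETS = {
    "cold": (-10.0, 0.0, 20.0),
    "warm": (10.0, 20.0, 30.0),
    "hot": (20.0, 40.0, 50.0),
}

# Rule base: IF temperature is <set> THEN fan speed is <value in percent>.
RULES = {"cold": 0.0, "warm": 50.0, "hot": 100.0}


def fuzzify(temperature):
    """Turn a crisp reading into a membership degree for every fuzzy set."""
    return {name: triangular(temperature, *abc) for name, abc in TEMPERATURE_SETS.items()}


def infer_and_defuzzify(temperature):
    memberships = fuzzify(temperature)
    # Inference: each rule fires with the membership degree of its antecedent.
    total = sum(memberships.values())
    if total == 0.0:
        raise ValueError("input lies outside every fuzzy set")
    # Defuzzification: firing-strength-weighted average of the rule outputs.
    return sum(memberships[name] * RULES[name] for name in RULES) / total


memberships = fuzzify(25.0)
fan_speed = infer_and_defuzzify(25.0)
```

A reading of 25 degrees is partly "warm" and partly "hot", so the crisp output falls between the consequents of those two rules rather than snapping to either one.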
 
 
 
 
 
3.12 Fuzzy neural networks
Fuzzy Neural Networks (FNN) are artificial intelligence
systems that blend aspects of neural networks and fuzzy
logic [32]. Systems that employ both technologies
together can be more adaptable and effective than those
that use either one alone. FNN systems have been used in
many applications, such as data mining, image
recognition, and control systems. Fuzzy logic works well
with incomplete or imprecise input, whereas neural
networks excel at identifying patterns. FNN systems can
therefore handle both the imprecise, unstructured data
that fuzzy logic is suited to and the structured data that
neural networks handle well.
FNN systems also have drawbacks. Designing and training
them can be challenging, because they require both fuzzy
logic inference rules and neural network training
methods, and it can be difficult to find the right
balance between rules and learning algorithms.
The primary difference between an FNN and a conventional
artificial neural network (ANN) lies in the fuzzy
inference layer. In an ANN, the inputs are multiplied by
weights and the results are aggregated. An FNN instead
maps input values to membership functions, and the fuzzy
rules combine the resulting membership values. Finally,
the values in the last layer of the FNN summarize the
values in the fuzzy inference layer.
In a CNN, the convolutional layers learn to extract
distinctive features from the input data during training,
and the fully connected layers aggregate the feature
information derived from the convolutional layers. In
contrast to the fuzzy neural network, the pixels in the
feature maps hold crisp values rather than fuzzy values.
Our Fuzzy optimized CNN integrates a convolutional
neural network with a fuzzy neural network, in which the
FNN summarizes the feature information from each fuzzy
map. A fuzzy map is a feature map graded by one of the
fuzzy sets of the membership function. For each feature
map, M fuzzy maps are produced, where M is the number of
fuzzy sets in the membership function. As an example,
consider a membership function with M = 3 fuzzy sets,
"Negative", "Zero", and "Positive", and k = 100 final
convolutional feature maps, each of which is a 3 × 3
image. The architecture then has k × M = 300 fuzzy maps.
Such a large number of input units leads to enormous
computation. To summarize the feature information, we
therefore employ a fuzzy neural network with
semi-connected layers. In other words, each feature map
serves as a separate input group of the FNN, rather than
all feature maps being fed in as a whole. There are then
k separate inference engines, all of which share the same
fuzzy rule. When the number of inputs is too large, the
combinations of fuzzy inference become too complex for a
traditional FNN to compute. To cut down on computation, a
MISO FNN splits the input units into groups and combines
the outputs of the individual inference results. Suppose
the input units are divided into k sets, $x_i^f$ with
$f = 1, 2, \ldots, k$ and $i = 1, 2, \ldots, n/k$. The
fuzzy rule is then:

\[
R^{f,I}:\ \text{IF } x_1^f \text{ is } A_1^I \text{ and } \ldots \text{ and } x_n^f \text{ is } A_n^I
\ \text{ THEN } y_1 \text{ is } w_1^{f,I} \text{ and } \ldots \text{ and } y_m \text{ is } w_m^{f,I}
\]

where $A_i^I$ is the fuzzy set used by the $I$-th fuzzy
rule. The output of the fuzzy inference needs to be
defuzzified into crisp values. The conventional
defuzzification formula is:

\[
y_j = \frac{\sum_{I=1}^{h} w_j^{I} \left( \prod_{i=1}^{n} \mu_{A_i^I}(x_i) \right)}
           {\sum_{I=1}^{h} \left( \prod_{i=1}^{n} \mu_{A_i^I}(x_i) \right)}
\]

where $\mu_{A_i^I}(x_i)$ denotes the membership function
selected by the fuzzy rule.
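The defuzzification formula can be checked numerically with a minimal sketch. Gaussian membership functions, the two example rules, and the names `gaussian` and `defuzzify` are assumptions made for illustration; the text does not state which membership function shape is used.

```python
import math


def gaussian(x, center, sigma):
    """Assumed Gaussian membership function."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def defuzzify(x, rules):
    """Weighted-average defuzzification over h rules.

    x: list of n crisp inputs.
    rules: list of (antecedents, weights) pairs, where antecedents holds one
    (center, sigma) pair per input and weights holds the m consequents of the rule.
    """
    # Firing strength of each rule: product of the input memberships.
    strengths = [
        math.prod(gaussian(xi, c, s) for xi, (c, s) in zip(x, antecedents))
        for antecedents, _ in rules
    ]
    denominator = sum(strengths)
    n_outputs = len(rules[0][1])
    # Each output is the strength-weighted average of the rule consequents.
    return [
        sum(strength * weights[j] for strength, (_, weights) in zip(strengths, rules))
        / denominator
        for j in range(n_outputs)
    ]


# Two hypothetical rules over two inputs, each with two consequent weights.
rules = [
    ([(0.0, 1.0), (0.0, 1.0)], [1.0, 0.0]),
    ([(1.0, 1.0), (1.0, 1.0)], [0.0, 1.0]),
]
y_near_rule_1 = defuzzify([0.0, 0.0], rules)
y_midway = defuzzify([0.5, 0.5], rules)
```

When the input sits exactly on the centers of the first rule, that rule dominates the output; when the input is midway between the two rules, both contribute equally.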
Next, the defuzzifier layer collects the outcome of each
fuzzy inference engine. Here, we propose a new
architecture that combines the fuzzy engine of the FNN
with the features of the CNN, bringing together the
benefits of both network architectures. Assume that the
feature maps of the final convolutional layer have
shape h × w, that there are k maps in total, and that the
membership function has n fuzzy sets. If the maps were
fed directly into the FNN, the number of fuzzy inference
parameters would be $n^{k \times h \times w}$, which is
far too large to compute. We therefore combine the
convolutional feature maps with a fuzzifier layer using a
modified FNN. Although normalizing the output of the
fuzzy inference layer is conventionally advised, we
choose not to do so in this layer's formula.
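The fuzzy-map example from this section (three fuzzy sets, k = 100 feature maps of size 3 × 3) can be reproduced with a short sketch. The Gaussian membership functions, their centers and width, and the zero-valued placeholder activations are assumptions. The rule count also assumes that every pixel of every map would be a separate FNN input with one fuzzy set chosen per input, which is the usual way rule combinations are counted in a fuzzy system.

```python
import math

# Assumed centers for the three fuzzy sets named in the example.
FUZZY_SETS = {"Negative": -1.0, "Zero": 0.0, "Positive": 1.0}
SIGMA = 0.5  # assumed width of each membership function


def membership(value, center, sigma=SIGMA):
    return math.exp(-((value - center) ** 2) / (2.0 * sigma ** 2))


def fuzzify_map(feature_map):
    """Return one fuzzy map per fuzzy set for a single crisp feature map."""
    return {
        name: [[membership(v, center) for v in row] for row in feature_map]
        for name, center in FUZZY_SETS.items()
    }


k, h, w = 100, 3, 3
# Placeholder activations; a real network would supply the feature maps.
feature_maps = [[[0.0] * w for _ in range(h)] for _ in range(k)]
fuzzy_maps = [
    fuzzy_map
    for feature_map in feature_maps
    for fuzzy_map in fuzzify_map(feature_map).values()
]

M = len(FUZZY_SETS)
num_fuzzy_maps = len(fuzzy_maps)      # k * M
full_rule_count = M ** (k * h * w)    # one fuzzy set chosen per input pixel
```

Under these assumptions the sketch yields 300 fuzzy maps, while the number of rule combinations for a fully connected fuzzy inference is a 430-digit integer, which illustrates why the inputs are split into groups instead.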
4 Proposed method 
Although convolutional neural networks (CNNs) and 
fuzzy logic are two different ideas, they may be 
coupled in some situations to improve a system's 
capabilities, especially when handling uncertainty and 
imprecision. The core of our Fuzzy optimized CNN 
model (Figure 3) consists of multiple layers, each 
serving a specific purpose in the feature extraction 
process. The initial layers are convolutional layers 
equipped with filters that perform edge detection and 
capture basic patterns within the images. As we 
progress deeper into the network, the convolutional 
layers become more sophisticated, capable of 
identifying complex structures and features that are 
significant for distinguishing between different facial 
expressions. To enhance the model’s capability to 
generalize and to incorporate the fuzziness inherent in 
human emotion classification, we introduce fuzzy 
logic into the pooling layers of the network. Fuzzy 
pooling layers replace traditional max pooling, 
allowing the model to retain more information by 
considering the degree of membership of pixels in the 
pooled feature maps, which is essential in capturing the 
nuances of facial expressions. 
The proposed Fuzzy Optimized CNN model introduces 
novel enhancements by integrating fuzzy logic into the 
CNN architecture. This integration aims to improve 
feature extraction and classification accuracy, 
particularly in scenarios involving uncertainty, 
ambiguity, and noise. The key components of this 
methodology include the use of fuzzy2Dpooling 
instead of traditional pooling layers and the 
replacement of fully connected (FC) layers with a 
Fuzzy Neural Network. Additionally, data 
augmentation is used to enhance the training dataset, 
which is crucial for improving the model's 
generalization capability. The primary goal of using 
data augmentation was to increase the variability of the 
training data. By exposing the model to a wider range 
of image transformations, we aimed to enhance its 
ability to learn robust features and reduce the risk of 
overfitting. This process resulted in a significantly 
larger and more diverse training dataset, which is 
crucial for improving the overall performance and 
generalization of the Fuzzy Optimized CNN model. 
The initial layers of the Fuzzy Optimized CNN model 
are standard convolutional layers that extract features 
from the input images. These layers apply convolution 
operations using learned filters to detect various 
patterns and features within the images, such as edges, 
textures, and more complex structures. Instead of using 
traditional max-pooling or average-pooling layers, the 
proposed model employs fuzzy2Dpooling. This 
pooling method enhances feature extraction by 
considering the degree of membership of features 
within fuzzy sets. Fuzzy2Dpooling improves the 
model's ability to retain significant features and reduce 
the loss of critical information, which is a common 
limitation of traditional pooling methods. The Fuzzy 
Neural Network (FNN) enhances the model's ability to 
classify images by incorporating human-like reasoning 
and handling uncertainty more effectively than 
traditional FC layers. 
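The text does not spell out the internals of fuzzy2Dpooling, so the following is only one possible illustration of membership-based pooling and should not be read as the layer used in this study. It assumes a hypothetical "strong activation" fuzzy set whose membership grows linearly from the smallest to the largest value in each window, and it outputs the membership-weighted average of the window. The function name `fuzzy_pool2d` is likewise an assumption.

```python
def fuzzy_pool2d(feature_map, size=2):
    """Membership-weighted pooling over non-overlapping `size` x `size` windows."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            lo, hi = min(window), max(window)
            if hi == lo:
                # Flat window: every value is the same, so there is nothing to weigh.
                row.append(float(hi))
                continue
            # Degree of membership of each value in the assumed "strong activation" set.
            memberships = [(v - lo) / (hi - lo) for v in window]
            weighted = sum(m * v for m, v in zip(memberships, window))
            row.append(weighted / sum(memberships))
        pooled.append(row)
    return pooled


window_values = [
    [1, 3],
    [4, 2],
]
pooled = fuzzy_pool2d(window_values)
```

With this choice of membership function the output lies between the window mean and the window maximum, so values below the maximum still influence the result instead of being discarded as in max pooling.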
The architecture further includes batch normalization
layers that standardize the inputs to a layer,
accelerating the training process and improving the
overall stability of the neural network. Following the
convolutional and pooling layers, the feature maps are
flattened into a single long feature vector, which is
then fed into a series of FNN layers that summarize
the feature information. Put differently, each feature
map serves as an input of the FNN, so that the fuzzy
engine of the FNN is combined with the features of the
CNN. The network culminates in a softmax output layer
that classifies each image into one of the seven
emotion categories defined in the dataset.
This Fuzzy optimized CNN model is not static; it is 
iteratively refined through a training process that 
employs K-Fold Cross-Validation, ensuring that the 
model is not overly fitted to a particular subset of the 
data. Data augmentation techniques, such as rotation 
and flipping, are applied to create variations in the 
dataset, further aiding the robustness of the model. The 
output of this architecture is a comprehensive 
representation of the data, with the capability to 
accurately classify facial expressions into discrete 
emotion classes. The model’s performance is 
meticulously evaluated using a variety of metrics, 
including accuracy, loss, and a confusion matrix, 
providing a detailed account of the model’s strengths 
and areas for improvement. Through this structured 
approach, we aim to develop a Fuzzy optimized CNN 
model that not only performs well on the dataset but 
also generalizes to new, unseen data with high 
reliability.  
 
 
 
Figure 3: Our proposed Fuzzy Optimized CNN 
framework. 
 
The flowchart (Figure 4) outlines a structured
approach to constructing a CNN with integrated fuzzy
logic. The procedure begins with the preparation of the
input data, a collection of facial images for the model.
This step ensures that the data is sufficiently
preprocessed to meet the requirements of the subsequent
stages. If the data is not adequately preprocessed, the
flow returns to the preparation step.
Once the data is confirmed to be sufficiently 
preprocessed, the procedure advances to the K-Fold 
Cross-Validation setup, with the number of folds 
specified as five.  This is a validation technique used to 
assess the model’s ability to generalize to an independent 
dataset and involves partitioning the data into k distinct 
subsets. Parallel to this, there is an application of data 
augmentation techniques, which serve to artificially 
expand the dataset by generating new, varied data points. 
This is essential for improving the robustness and 
performance of the CNN model. 
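The five-fold partitioning step can be sketched as follows. The helper names `k_fold_indices`, `cross_validate`, and `dummy_train_and_evaluate` are hypothetical, and the callback is a stand-in that does not train anything. The sketch assumes that augmentation is applied only to the training part of each fold so that the validation part stays untouched; the text does not state where augmentation is applied.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_indices = indices[start:start + size]
        train_indices = indices[:start] + indices[start + size:]
        yield train_indices, val_indices
        start += size


def cross_validate(n_samples, train_and_evaluate, k=5):
    """Average the score returned by `train_and_evaluate` over the k folds."""
    scores = [
        train_and_evaluate(train_idx, val_idx)
        for train_idx, val_idx in k_fold_indices(n_samples, k)
    ]
    return sum(scores) / len(scores)


def dummy_train_and_evaluate(train_idx, val_idx):
    # Stand-in: a real callback would augment and train on train_idx and
    # return the accuracy measured on val_idx. This returns a fold fraction.
    return len(val_idx) / (len(train_idx) + len(val_idx))


folds = list(k_fold_indices(n_samples=23, k=5))
mean_score = cross_validate(23, dummy_train_and_evaluate, k=5)
```

Each sample appears in exactly one validation fold, and the reported figure is the mean over the five folds.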
Following the data augmentation, there is a checkpoint 
to verify if the model has been compiled successfully. If 
the compilation is unsuccessful, the model requires 
adjustments or recompilation. Conversely, if successful, 
the process transitions to the training phase where the 
model is trained using the prepared and augmented data. 
Training the model is followed by an evaluation of its 
performance to determine if the model’s accuracy and 
general behavior are satisfactory. If the model’s 
performance is unsatisfactory, it necessitates a return to 
the training phase for further refinement. If the model is 
deemed satisfactory, the final stages involve calculating 
the average test accuracy by evaluating the model on test 
data, plotting learning and validation curves for visual 
performance assessment, and generating a confusion 
matrix and classification report which provide insights 
into the model’s predictive capabilities. 
 
 
Figure 4: Flowchart of the current work. 
 
 
 
 
5 Experiment results 
5.1   Datasets 
The Facial Expression Recognition 2013 (FER2013)
dataset comprises seven distinct emotion categories: anger,
disgust, fear, happiness, sadness, surprise, and a neutral
state. In contrast, the Cohn-Kanade (CK+) dataset includes
a category for contempt in place of the neutral state. A
Kaggle competition was held focusing on the accurate
recognition of facial expressions within the FER2013
dataset. Table 2 presents a detailed breakdown of the
frequency of each emotion within the datasets. The
FER2013 dataset is particularly tailored for the analysis
and identification of various facial expressions.
 
Table 2: Distribution of emotions in the FER2013, 
CK+, and RAF-DB datasets. 
 
 
 
5.2 FER2013 dataset 
The FER2013 [33] dataset, sourced from a specialized 
challenge, is composed of grayscale facial images. These 
images, each with a resolution of 48 x 48 pixels, showcase 
a diverse range of ages and facial expressions. The dataset 
categorizes these images into seven distinct emotion 
classes: anger, disgust, fear, happiness, sadness, surprise, 
and neutral. It comprises 28,709 training images, with a
further 3,589 images allocated for validation and another
3,589 for testing. This dataset is unique in its inclusion of
faces from various age groups and orientations, making it 
highly suitable for studies and applications in facial 
expression recognition. Figure 5 provides a glimpse into 
the array of sample images from the FER2013 dataset. 
   
                   
        Figure 5: Sample Images from the FER2013 
Dataset. 
 
 
 
 
5.3 RAF-DB dataset 
The RAF-DB [34], or Real-world Affective Faces 
Database, is an extensive collection of emotional facial 
images with labels. It features 29,702 images, portraying 
1,000 different subjects, each exhibiting a range of facial
expressions. These images, gathered from the Internet, 
represent a broad spectrum of ages, genders, ethnic 
backgrounds, and lighting conditions. The dataset is 
bifurcated into a training set, which includes 23,702 
images, and a test set, comprising 5,999 images. Each 
image is categorized under one of seven emotional 
expressions. Figure 6 illustrates a selection of example 
images from the RAF-DB dataset. 
 
  
Figure 6: Sample Images from the RAF-DB 
Dataset. 
 
5.4 CK+ dataset 
The Kaggle CK+ [35] dataset is a comprehensive facial 
expression dataset that includes 981 individual 
expressions, spanning across seven emotions: anger, 
contempt, disgust, fear, happiness, sadness, and surprise. 
The images in this dataset are grayscale and uniformly 
cropped to a dimension of 48 x 48 pixels. This dataset is 
organized into training, validation, and test sets, catering 
to a diverse range of research requirements in facial 
expression recognition. The CK+ dataset is renowned for 
its balance and variety, offering a rich assortment of 
facial expressions, head poses, and various conditions, 
making it one of the most pivotal datasets in the field. 
Figure 7 showcases some example images from the CK+ 
dataset. 
 
 
         Figure 7: Sample Images from the CK+ 
Dataset. 
5.5 Parameters settings 
Table 3 lists the parameters used in the present study.
The table contains two categories of parameters: basic
CNN architecture and fuzzy logic. The first category
describes the basic structure of the CNN, including the
number of convolutional layers, the pooling layers, the
convolutional layer activation function, the padding
setup, the stride, the hidden layer specifications, the
fully connected (FC) layer activation function, the
output layer activation function, the loss function, the
optimizer, the metrics, the batch size, and the number of
epochs. The second category governs the fuzzy logic
layers, namely the Fuzzy2DPooling and neuro-fuzzy layers.
 
Table 3: Parameters used for fuzzy optimized CNNs.

CNN parameters                                  Values
No. of convolutional layers                     8
Activation function for convolutional layers    ReLU
Stride                                          1
Padding                                         same
Hidden layers                                   114
Activation function for FC layers               ReLU
Activation function for the output layer        softmax
Loss function                                   Categorical-Crossentropy
Optimizer                                       SGD
Metrics                                         accuracy
Epochs                                          30
Batch size                                      32

Fuzzy logic layers                              Values
No. of Fuzzy2DPooling layers                    4
No. of neuro-fuzzy layers                       100
 
We used Google Colab, a service that offers a free
environment with hardware acceleration for Python 3
development. The hardware provided by this service is
the NVIDIA Tesla T4 Tensor Core GPU. The T4 has a
reputation for providing excellent performance and
efficiency in deep learning and machine learning
workloads. With access to a T4 GPU, Google Colab users
can run computationally demanding tasks, including
training deep neural networks, much more quickly than
in a CPU-only environment. This hardware acceleration
makes Google Colab an appealing option for academics,
developers, and data scientists working on machine
learning projects.
 
6 Results and analysis 
In this section, we will illustrate the results of our CNN 
model’s performance in recognizing facial expressions, 
both before and after the implementation of data 
augmentation techniques. This comparative analysis is 
crucial as it provides insights into the effectiveness of data 
augmentation in improving the model’s accuracy and its 
ability to generalize across a broader range of facial 
expressions. 
Data augmentation is a powerful strategy in machine 
learning that artificially enhances the size and quality of 
training datasets by introducing variations in the data. 
These variations include random transformations such as 
rotations, translations, scaling, and horizontal flipping, 
which help the model become invariant to such changes 
and prevent overfitting. 
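Two of the transformations named above, horizontal flipping and translation, can be sketched on a grayscale image stored as a nested list. The function names, the zero fill value, the 50% flip probability, and the one-pixel shift limit are assumptions; rotation and scaling are omitted for brevity, and the actual augmentation settings of this study are not given in the text.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = image[r][c]
    return out


def augment(image, rng, max_shift=1):
    """Apply a random flip followed by a random small translation."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
augmented = augment(image, random.Random(0))
```

Each call produces a variant with the same shape as the input, so augmented samples can be fed to the network without any change to the input layer.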
Prior to the application of data augmentation, the baseline 
performance of the CNN model will be presented.   We will 
analyze the model’s accuracy, and the classification report, 
which reveal the initial ability of the model to classify the 
seven different emotions within three datasets: FER2013, 
RAF-DB and CK+. This initial performance serves as a 
benchmark for the subsequent improvements that data 
augmentation aims to achieve. 
Subsequently, we will discuss the impact of data 
augmentation on the model’s performance. The same 
metrics—accuracy and classification report—will be used 
to quantify the improvements. The comparison will 
highlight how data augmentation influences the model’s 
performance in terms of its ability to learn from a more 
varied and representative set of facial expression data. 
Through this comparative approach, we aim to demonstrate 
the significance of data augmentation in enhancing the 
robustness of our CNN model. The results will show 
whether the augmented data has indeed led to a more 
accurate and generalizable model for facial expression 
recognition, thus validating the use of data augmentation 
as a beneficial technique in the training process. 
We report the results of our approach and compare its
accuracy with that of alternative techniques on the
FER2013, RAF-DB and CK+ datasets. We also present the
best-performing structures found in this investigation.
The experimental results of our method in comparison
with other approaches are shown in Tables 4, 5, and 6.
 
Table 4: Comparison of accuracy of our method and
different models on the FER2013 dataset.

Ref.   Model                                 Acc.%
[11]   Random Search algorithm               72.16
[12]   VGGNet Cosine Annealing               73.28
[13]   Fuzzy optimized CNN-RNN               72.81
[14]   AR-LBP-DLDP                           91
[15]   ConvNet                               70
[16]   DCNN                                  70
[17]   ViT models                            74.20
[28]   Ensemble model                        71.84
       Our method (Fuzzy Optimized CNNs)     98
 
On the FER2013 dataset, Table 4 shows that the Fuzzy
optimized CNNs approach (98%) achieves higher accuracy
than all the other listed methods. It exceeds the Random
Search algorithm (72.16%), VGGNet Cosine Annealing
(73.28%), Fuzzy optimized CNN-RNN (72.81%), ConvNet
(70%), DCNN (70%), ViT models (74.20%), and the Ensemble
model (71.84%), as well as the strongest competing
method, AR-LBP-DLDP (91%). These findings indicate that
the Fuzzy optimized CNNs approach is highly competitive
in terms of accuracy on the FER2013 dataset.
 
Table 5: Comparison of accuracy of our method and
different models on the RAF-DB dataset.

Ref.   Model                                                        Acc.%
[18]   Self-Cure Network (SCN)                                      88.14
[19]   ResNet18                                                     86.02
[20]   Emotion-Conditional Adaption Network (ECAN)                  89.69
[21]   Amending Representation Module (ARM)                         90.42
[22]   Feature Decomposition and Reconstruction Learning (FDRL)     89.47
[27]   MobileNet and ResNet-18                                      90.81
[29]   Rayleigh loss                                                87.79
       Our method (Fuzzy Optimized CNNs)                            99
 
Table 5 demonstrates that the Fuzzy optimized CNNs 
technique achieves the highest accuracy on the RAF-DB 
dataset, with an accuracy of 99%. The accuracy of this 
approach exceeds that of all the other methods listed, 
including Self-Cure Network (SCN) at 88.14%, ResNet18 
at 86.02%, Emotion-Conditional Adaption Network 
(ECAN) at 89.69%, Amending Representation Module 
(ARM) at 90.42%, Feature Decomposition and 
Reconstruction Learning (FDRL) at 89.47%, MobileNet 
and ResNet-18 at 90.81%, and Rayleigh loss at 87.79%. 
The Fuzzy optimized CNNs method shows promise as an 
effective approach for image classification tasks, 
especially when dealing with datasets like RAF-DB that 
consist of numerous classes. 
 
Table 6: Comparison of accuracy of our method and
different models on the CK+ dataset.

Ref.   Model                                 Acc.%
[23]   AUs                                   96.46
[24]   LBP and CNN                           90.00
[25]   MobileNet                             96.92
[26]   HOG and LBP                           99.79
       Our method (Fuzzy Optimized CNNs)     100
 
The findings presented in Table 6 highlight the superior
performance of the Fuzzy optimized CNNs technique, 
which achieved an accuracy of 100% on the CK+ dataset. 
This accuracy rate surpasses that of all other methods 
examined, including AUs at 96.46%, LBP and CNN at 
90.00%, MobileNet at 96.92%, HOG and LBP at 
99.79%. The results suggest that the Fuzzy optimized 
CNNs method holds great potential as an efficient 
approach for image classification tasks, especially when 
working with datasets like CK+. 
The confusion matrix, also referred to as an error matrix,
is a table layout that summarizes the performance of an
algorithm in a classification task. In this matrix, each
row represents the predicted class and each column the
actual class (or vice versa). The matrix reveals
instances where a facial expression recognition (FER)
system mistakes one class for another, indicating
confusion between the two classes. The recognition
accuracy for the seven facial expressions is shown in
Figure 8 for the FER2013 dataset, Figure 9 for the
RAF-DB dataset, and Figure 10 for the CK+ dataset. On
the FER2013 dataset, the recognition accuracy for anger,
disgust, fear, and sadness is lower than the overall
recognition accuracy achieved by the model on the
dataset.
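A confusion matrix and the per-class accuracy derived from it can be computed with a short sketch. The labels below are made-up toy values rather than results from this study, and the sketch adopts the convention that rows index the actual class and columns the predicted class.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples per (actual class, predicted class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

matrix = confusion_matrix(y_true, y_pred, n_classes=3)
recall = per_class_recall(matrix)
overall_accuracy = sum(matrix[i][i] for i in range(3)) / len(y_true)
```

Off-diagonal entries show which classes are mistaken for one another, and comparing the per-class values with the overall accuracy reveals classes that lag behind, as reported for anger, disgust, fear, and sadness on FER2013.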
 
 
 
 
Figure 8: Confusion Matrix for FER2013 
Dataset. 
 
For the RAF-DB dataset, Figure 9 shows the confusion
matrix of the predicted results for the training,
validation, and test sets.
 
Figure 9: Confusion Matrix for RAF-DB 
Dataset.     
 
For CK+, Figure 10 shows the confusion matrix of
the predicted results for the training, validation, and
test sets.
 
 
 
Figure 10: Confusion Matrix for CK+ Dataset.  
 
In comparing the results of Fuzzy Optimized CNNs with 
those from related work, several key differences and 
improvements emerge, particularly due to the novel 
integration of fuzzy logic into the CNN framework. This 
integration involves the use of fuzzy2Dpooling instead of 
traditional pooling layers for improved feature extraction, 
and replacing fully connected (FC) layers with a Fuzzy 
Neural Network to enhance classification performance. 
One of the most significant improvements is in handling 
uncertainty and ambiguity. Traditional CNN methods often 
struggle with ambiguous data, leading to 
misclassifications. Our results show that Fuzzy Optimized 
CNNs, which utilize fuzzy2Dpooling, significantly reduce 
misclassification rates in scenarios with high ambiguity. 
Fuzzy2Dpooling allows the model to better capture and 
represent the inherent vagueness in the data, leading to 
more robust classification outcomes. This advantage is 
particularly pronounced in datasets with inherent 
ambiguities, such as medical images where diagnostic 
boundaries can be unclear. By explicitly modeling 
uncertainty, our method provides a more reliable 
performance compared to standard CNN approaches. 
Our method also demonstrates greater robustness to noisy 
data compared to traditional CNNs, which often require 
extensive preprocessing to handle noise effectively. The 
inherent properties of fuzzy logic, combined with 
fuzzy2Dpooling, provide our model with a higher 
resilience to noise, maintaining higher accuracy and 
reliability in noisy environments. This reduces the need for 
complex data preprocessing steps and ensures more 
dependable performance in real-world conditions. 
When it comes to classification accuracy, our results 
indicate that Fuzzy Optimized CNNs outperform 
traditional CNNs, especially in scenarios with limited, 
imbalanced, or noisy datasets. The flexibility of fuzzy logic 
allows for refined decision boundaries, which, when 
combined with fuzzy2Dpooling and a Fuzzy Neural 
Network, enhances the model's ability to generalize from 
less-than-ideal data. This improvement is particularly 
beneficial in fields where obtaining large, clean, and 
balanced datasets is challenging, such as in medical 
diagnostics or remote sensing. 
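How a fuzzy rule layer can take the place of a fully connected classifier may be illustrated with a small sketch. It is written under stated assumptions rather than as a description of our implementation: Gaussian membership functions, product inference for rule firing strengths, and a normalized weighted sum for the class scores. The function names, rule centers, widths, and class weights below are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set with the given center and width."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzy_rule_layer(features, rules):
    """Evaluate fuzzy rules on a feature vector.

    Each rule is a tuple (centers, sigmas, class_weights). The firing strength
    of a rule is the product of the per-feature memberships (product inference).
    Returns the normalized firing strengths and the resulting class scores.
    """
    strengths = []
    for centers, sigmas, _ in rules:
        strength = 1.0
        for x, c, s in zip(features, centers, sigmas):
            strength *= gaussian(x, c, s)
        strengths.append(strength)

    total = sum(strengths)
    if total > 0:
        normalized = [s / total for s in strengths]
    else:
        normalized = [1.0 / len(rules)] * len(rules)  # no rule fires: stay neutral

    n_classes = len(rules[0][2])
    scores = [sum(n * rule[2][k] for n, rule in zip(normalized, rules))
              for k in range(n_classes)]
    return normalized, scores


# Two hypothetical rules over two features, each voting for one class.
rules = [
    ([1.0, 0.0], [0.3, 0.3], [1.0, 0.0]),  # "feature 0 high AND feature 1 low" -> class 0
    ([0.0, 1.0], [0.3, 0.3], [0.0, 1.0]),  # "feature 0 low AND feature 1 high" -> class 1
]
normalized, scores = fuzzy_rule_layer([0.9, 0.1], rules)
predicted = scores.index(max(scores))
```

Because the normalized firing strengths are available alongside the scores, one can inspect which rule drove a prediction, which is the sense in which such a layer is easier to interpret than a dense layer.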
In conclusion, Fuzzy Optimized CNNs present several 
novel contributions to the field of image classification. By 
incorporating fuzzy2Dpooling for improved feature 
extraction and replacing fully connected layers with a 
Fuzzy Neural Network for better classification, our 
approach addresses critical gaps in uncertainty handling, 
interpretability, robustness to noise, and 
adaptability. These improvements make Fuzzy Optimized 
CNNs a valuable and innovative addition to the current 
state-of-the-art, offering practical benefits and superior 
performance in real-world applications characterized by 
ambiguity and noise. 
7 Conclusion and future work 
In this paper, an ensemble-based deep recognition 
algorithm was proposed in which Fuzzy Optimized CNN 
models were trained independently. The structures of the 
Fuzzy Optimized CNN models adopted for the 
experiments on the FER2013, RAF-DB, and CK+ 
datasets ranged from simple to complex. The research 
presents a novel strategy for optimizing CNNs that 
employs fuzzy logic. This approach offers several 
advantages, balancing the trade-off between accuracy, 
computational efficiency, and training time, and it attains 
high classification accuracy on the FER2013, RAF-DB, 
and CK+ datasets. Fuzzy Optimized CNNs proved more 
effective than other algorithms that demand extensive 
computational resources and time, which makes the 
method a viable choice for practical applications. The 
proposed technique offers a way to incorporate CNNs 
into real-world scenarios, particularly in settings where 
resources are limited and time is critical. Future research 
could examine how adaptable the SSA-based 
optimization technique is to various deep-learning 
architectures and to tasks beyond computer vision. 
In this study, we have introduced the Fuzzy Optimized 
Convolutional Neural Network (CNN), which integrates 
fuzzy logic to enhance the capabilities of traditional CNN 
architectures. By replacing conventional pooling layers 
with fuzzy2Dpooling and employing a Fuzzy Neural 
Network instead of traditional fully connected layers, our 
model addresses key challenges in image classification 
related to uncertainty, ambiguity, and noise. Our 
experiments have demonstrated that the Fuzzy Optimized 
CNN outperforms traditional CNNs in handling 
ambiguous and noisy data. The fuzzy2Dpooling layers 
preserve critical features by considering the degree of 
membership within fuzzy sets, enhancing the model's 
ability to extract meaningful information from complex 
images. Meanwhile, the Fuzzy Neural Network provides 
interpretable classification decisions by using fuzzy 
rules, making the decision-making process transparent 
and understandable. 
A crucial aspect of our methodology was the use of data 
augmentation to increase the diversity of the training 
dataset. This augmentation strategy, incorporating 
rotations, translations, scaling, flipping, and noise 
addition, enriched the dataset and improved the model's 
generalization capabilities. By expanding the training 
dataset with augmented samples, we ensured that the 
model could learn from a broader range of image 
variations. 
Future work could proceed in two directions. The first is 
to optimize the computational efficiency of 
fuzzy2Dpooling and the Fuzzy Neural Network, 
particularly for large-scale datasets; techniques such as 
parallel processing and hardware acceleration (e.g., GPUs 
or TPUs) could accelerate training and inference. The 
second is to investigate advanced fuzzy membership 
functions that better capture complex relationships in 
image data. Adaptive fuzzy membership functions could 
adjust dynamically to the data distribution, improving the 
model's adaptability across different domains. 
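The augmentation strategy summarized in this section can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the pipeline used in our experiments: images are small grayscale grids with values in [0, 1], only flipping, integer translation, and Gaussian noise addition are shown (rotation and scaling are omitted for brevity), and the function names and parameter values are hypothetical.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate(img, dx, dy, fill=0.0):
    """Shift the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = img[r][c]
    return out


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def augment(img, rng, max_shift=1, sigma=0.05):
    """Apply a random flip, a random small shift, and noise to one image."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    out = translate(out, dx, dy)
    return add_noise(out, sigma, rng)


img = [[0.0, 0.2, 0.4],
       [0.1, 0.5, 0.9],
       [0.0, 0.3, 1.0]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented = [augment(img, rng) for _ in range(4)]
```

Each call to `augment` yields a new variant with the same shape as the input, so the augmented samples can be added to the training set without changing the network's input layer.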
 
References  
[1] Theodoridis, S., Pikrakis, A., KoutroumbasCavouras, 
D. (2010). Introduction to Pattern Recognition: a 
Matlab approach. Retrieved from 
http://cds.cern.ch/record/1338559. 
[2] Albano, C., Dunn, W., Edlund, U., Johansson, E., 
30   Informatica 48 (2024) 15–32                                                                                                                                   A. R. Kadhim 
  
Nordén, B., Sjöström, M., & Wold, S. (1978). Four 
levels of pattern recognition. Analytica Chimica 
Acta, 103(4), 429–443. 
https://doi.org/10.1016/s0003-2670(01)83107-
x%20 . 
[3]   Jain, A., Duin, P., & Mao, N. J. (2000). Statistical 
pattern recognition: a review. IEEE Transactions on 
Pattern Analysis and Machine Intelligence, 22(1), 
4–37. https://doi.org/10.1109/34.824819 
[4]   Syntactic Pattern Recognition, applications. (1977). 
Communication and cybernetics. 
https://doi.org/10.1007/978-3-642-66438-0 
[5]   Ripley, B. D. (1996). Pattern recognition and neural 
networks. 
https://doi.org/10.1017/cbo9780511812651 
[6] Alloghani, M., Al-Jumeily, D., Mustafina, J., 
Hussain, A., & Aljaaf, A. J. (2019). A Systematic 
Review on Supervised and Unsupervised Machine 
Learning Algorithms for Data Science. In 
Unsupervised and semi-supervised learning (pp. 3–
21). https://doi.org/10.1007/978-3-030-22475-2_1 
[7]  Mandalapu, E. N., & Rajan, E. G. (2009). Rajan 
Transform and its Uses in Pattern Recognition. 
Informatica, 33(2), 205–212. Retrieved from 
https://www.informatica.si/index.php/informatica/
article/download/239/236 
[8]  Liang, C., & Dong, J. (2023). A survey of deep 
learning-based facial expression recognition 
research. Frontiers in Computing and Intelligent 
Systems, 5(2), 56–60. 
https://doi.org/10.54097/fcis.v5i2.12445 
[9]   Revina, I., & Emmanuel, W. S. (2021b). A survey 
on human face expression recognition techniques. 
Journal of King Saud University. Computer and 
Information Sciences/Maǧalaẗ Ǧamʼaẗ Al-malīk 
Saud : Ùlm Al-ḥasib Wa Al-maʼlumat, 33(6), 619–
628. https://doi.org/10.1016/j.jksuci.2018.09.002 
[10] Rajan, S., Chenniappan, P., Devaraj, S., & Madian, 
N. (2019). Facial expression recognition 
techniques: a comprehensive survey. IET Image 
Processing, 13(7), 1031–1040. 
https://doi.org/10.1049/iet-ipr.2018.6647 
[11] Vulpe-Grigorasi, A., & Grigore, O. (2021). 
Convolutional Neural Network Hyperparameters 
optimization for Facial Emotion Recognition. 
https://doi.org/10.1109/atee52255.2021.9425073 
[12] Khaireddin, Y., & Chen, Z. (2021). Facial Emotion 
Recognition: State of the Art Performance on 
FER2013. arXiv (Cornell University). 
https://doi.org/10.48550/arxiv.2105.03588 
[13] Zhang, D., & Tian, Q. (2021). A Novel Fuzzy 
Optimized CNN-RNN Method for Facial Expression 
Recognition. Elektronika Ir Elektrotechnika, 27(5), 
67–74. https://doi.org/10.5755/j02.eie.29648 
[14] Yaermaimaiti, Y., Kari, T., & Zhuang, G. (2022). 
Research on facial expression recognition based 
on an improved fusion algorithm. Nonlinear 
Engineering, 11(1), 112–122. 
https://doi.org/10.1515/nleng-2022-0015 
[15] Debnath, T., Reza, M. M., Rahman, A., Band, S. 
S., & Alinejad-Rokny, H. (2021). Four-layer 
Convnet to facial emotion recognition with 
minimal epochs and the significance of data 
diversity. Research Square (Research Square). 
https://doi.org/10.21203/rs.3.rs-511221/v1 
[16] Ozioma, C. O., Kanyifeechukwu, J. O., Hashim, I. 
B., & Daniel, O. (2022). Hybrid Facial 
Expression Recognition (FER2013) Model for 
Real-Time Emotion Classification and 
Prediction. 1(1), 63–71. 
https://doi.org/10.54646/bijiam.011 
[17] Bobojanov, S., Kim, B. M., Arabboev, M., & 
Begmatov, S. (2023). Comparative Analysis of 
Vision Transformer Models for Facial Emotion 
Recognition Using Augmented Balanced 
Datasets. Applied Sciences, 13(22), 12271. 
https://doi.org/10.3390/app132212271 
[18] Wang, K., Peng, X., Yang, J., Lu, S., & Qiao, Y. 
(2020). Suppressing Uncertainties for Large-
Scale Facial Expression Recognition. arXiv 
(Cornell University). 
https://doi.org/10.48550/arxiv.2002.10392 
[19] Qutub, A. a. H., & Atay, Y. (2023). Deep Learning 
Approaches for Classification of Emotion 
Recognition based on Facial Expressions. Nexo, 
36(05), 1–18. 
https://doi.org/10.5377/nexo.v36i05.17181 
[20] Li, S., & Deng, W. (2019). A Deeper Look at 
Facial Expression Dataset Bias. arXiv. 
https://doi.org/10.48550/ARXIV.1904.11150 
[21] Shi, J., & Zhu, S. (2021). Learning to Amend 
Facial Expression Representation via De-albino 
and Affinity. arXiv (Cornell University). 
https://doi.org/10.48550/arxiv.2103.10189 
[22] Ruan, D., Yan, Y., Lai, S., Chai, Z., Shen, C., & 
Wang, H. (2021). Feature Decomposition and 
Reconstruction Learning for Effective Facial 
Expression Recognition. 
https://doi.org/10.1109/cvpr46437.2021.00757 
[23] Kim, J. H., Kim, B. G., Roy, P. P., & Jeong, D. M. 
(2019). Efficient Facial Expression Recognition 
Algorithm Based on Hierarchical Deep Neural 
Network Structure. IEEE Access, 7, 41273–
41285. 
https://doi.org/10.1109/access.2019.2907327 
[24] Sawardekar S, Naik SR (2018) Facial expression 
recognition using efficient LBP and CNN. Int Res 
J Eng Technol (IRJET).5(6):2273–2277 
[25] Miao, Y., Dong, H., Jaam, J. M. A., & Saddik, A. 
E. (2019). A deep learning system for recognizing 
facial expression in Real-Time. ACM 
Transactions on Multimedia Computing, 
Communications and Applications/ACM 
Transactions on Multimedia Computing 
Communications and Applications, 15(2), 1–20. 
Facial Sentiment Analysis Using Convolutional Neural Network…                                            Informatica 48 (2024) 15–32       31                                                                                                                             
https://doi.org/10.1145/3311747 
[26] Shafira, S. S., Ulfa, N., Wibawa, H. A., &    
Rismiyati, N. (2019). Facial Expression 
Recognition Using Extreme Learning Machine. 
https://doi.org/10.1109/icicos48119.2019.89824
43 
[27] Lee, D. H., & Yoo, J. H. (2023b). CNN Learning 
Strategy for Recognizing Facial Expressions. IEEE 
Access, 11, 70865–70872. 
https://doi.org/10.1109/access.2023.3294099 
[28] Song, H. (2023). Comparison of Different Depth of 
Convolutional Neural Network Deep and shallow 
CNN comparison based on FER-2013. Highlights 
in Science, Engineering and Technology, 41, 80–
86. https://doi.org/10.54097/hset.v41i.6746 
[29] Fan, X., Deng, Z., Wang, K., Peng, X., & Qiao, Y. 
(2020). Learning Discriminative Representation 
For Facial Expression Recognition From 
Uncertainties. 
https://doi.org/10.1109/icip40778.2020.9190643 
[30] Hershey, S., Chaudhuri, S., Ellis, D. P. W., 
Gemmeke, J. F., Jansen, A., Moore, R. C., Plakal, 
M., Platt, D., Saurous, R. A., Seybold, B., Slaney, 
M., Weiss, R. J., & Wilson, K. (2017). CNN 
architectures for large-scale audio classification. 
https://doi.org/10.1109/icassp.2017.7952132 
[31] Woolf, P. J., & Wang, Y. (2000). A fuzzy logic 
approach to analyzing gene expression data. 
Physiological Genomics/Physiological Genomics 
(Print), 3(1), 9–15. 
https://doi.org/10.1152/physiolgenomics.2000.3.1.
9 
[32] Hsu, M. J., Chien, Y. H., Wang, W. Y., & Hsu, C. 
C. (2020). A Convolutional Fuzzy Neural Network 
Architecture for Object Classification with Small 
Training Database. International Journal of Fuzzy 
Systems, 22(1), 1–10. 
https://doi.org/10.1007/s40815-019-00764-1 
[33] FER-2013. (2020, July 19). Kaggle. 
https://www.kaggle.com/datasets/msambare/fer20
13 
[34] RAF-DB DATASET. (2023, September 20). 
Kaggle. 
https://www.kaggle.com/datasets/shuvoalok/raf-
db-dataset 
[35] CK+ dataset. (2023, September 20). Kaggle.    
https://www.kaggle.com/datasets/shuvoalok/ck- 
dataset/discussion?sort=undefined 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32   Informatica 48 (2024) 15–32                                                                                                                                   A. R. Kadhim