https://doi.org/10.31449/inf.v48i6.5246 Informatica 48 (2024) 71–80   71  
Ensemble-Based Text Classification for Spam Detection 
Xiukai Zhang, Ge Liu, Meng Zhang
*
 
School of Information Engineering, Tangshan Polytechnic College, Tangshan, Hebei, 063020, China. 
E-mail: zxk0920@126.com         
*
Corresponding author 
Keywords: ensemble-based, text classification, spam detection, feature extraction, classifier selection. 
 
Received: October 1, 2023 
This research proposes an ensemble-based approach for spam detection in digital communication, 
addressing the escalating challenge posed by unsolicited messages, commonly known as spam. The 
exponential growth of online platforms has necessitated the development of effective information 
filtering systems to maintain security and efficiency. The proposed approach involves three main 
components: feature extraction, classifier selection, and decision fusion. The feature extraction 
techniques are word embedding, are explored to represent text messages effectively. Multiple classifiers, 
including RNN including LSTM and GRU are evaluated to identify the best performers for spam 
detection. By employing the ensemble model combines the strengths of individual classifiers to achieve 
higher accuracy, precision, and recall. The evaluation of the proposed approach utilizes widely 
accepted metrics on benchmark datasets, ensuring its generalizability and robustness. The experimental 
results demonstrate that the ensemble-based approach outperforms individual classifiers, offering an 
efficient solution for combatting spam messages. Integration of this approach into existing spam 
filtering systems can contribute to improved online communication, user experience, and enhanced 
cybersecurity, effectively mitigating the impact of spam in the digital landscape. 
 
Povzetek: Raziskava uvaja ansambelski pristop za detekcijo spama v digitalni komunikaciji, ki združuje 
ekstrakcijo značilnosti, izbor klasifikatorjev in fuzijo odločitev za večjo natančnost.
1 Introduction 
The pervasive expansion of digital communication 
platforms has revolutionized global connectivity, 
enabling seamless information exchange and 
unprecedented interactivity [1]. However, this 
unprecedented growth has also ushered in a persistent 
and escalating challenge: the proliferation of unsolicited 
and often malicious messages, commonly referred to as 
spam. These intrusive messages not only disrupt efficient 
communication but also pose substantial risks to the 
security and integrity of online interactions [2]. 
Consequently, the development of effective spam 
detection mechanisms has become imperative to sustain 
the safety, efficiency, and user experience of digital 
communication channels. 
In response to the mounting threat of spam, this research 
introduces an innovative and comprehensive ensemble-
based approach to spam detection. This approach 
addresses the intricate dynamics of spam identification 
by leveraging the collective power of diverse classifiers 
within a unified framework [3]. In recognition of the 
exponential growth of online platforms, our research 
delves into the design and implementation of this 
ensemble-based approach, which encapsulates three 
fundamental components: feature extraction, classifier 
selection, and decision fusion. 
At the heart of our approach lies the adoption of 
advanced feature extraction techniques, specifically 
focusing on word embeddings [4]. These techniques  
 
 
harness the semantic nuances of language to transform 
text messages into dense vector representations, enabling 
more effective spam detection [5]. Concurrently, a 
spectrum of classifiers is meticulously evaluated, 
including state-of-the-art Recurrent Neural Networks 
(RNNs) encompassing Long Short-Term Memory 
(LSTM) and Gated Recurrent Unit (GRU) architectures.  
This assessment seeks to identify the optimal 
combination of classifiers capable of discerning spam 
messages with unparalleled accuracy. A central tenet of 
our research revolves around the strategic amalgamation 
of individual classifier outputs through an ensemble 
model. This collaborative approach capitalizes on the 
inherent strengths of diverse classifiers, resulting in 
heightened accuracy, precision, and recall in spam 
detection [6]. To gauge the efficacy of our proposed 
ensemble-based method, extensive experimentation is 
conducted using established metrics and benchmark 
datasets. The meticulous evaluation process ensures the 
generalizability and robustness of our approach across 
various contexts and data distributions. 
The culmination of our research showcases compelling 
evidence that the ensemble-based approach significantly 
surpasses the performance of individual classifiers in 
combating spam messages. By seamlessly integrating our 
approach into existing spam filtering systems, the digital 
landscape stands to benefit from improved 
communication, enhanced user experiences, and fortified 
cyber security. This research, spanning two 
comprehensive pages, embodies a significant stride 
72   Informatica 48 (2024) 71–80 X. Zhang et al. 
towards mitigating the pervasive impact of spam in the 
contemporary digital realm. 
The contribution of the work is 
1. Ensemble-Based Framework: Develop an 
ensemble-based framework for spam detection that 
combines multiple classifiers to enhance accuracy 
and robustness, outperforming single-model 
solutions. 
2. Effective Feature Extraction: Explore and 
implement advanced feature extraction techniques, 
focusing on word embeddings, to accurately 
represent text messages and capture nuanced 
linguistic patterns relevant to spam detection. 
3. Classifier Performance Evaluation: Evaluate a range 
of classifiers, including traditional algorithms and 
advanced Recurrent Neural Networks (RNNs) like 
LSTM and GRU, to identify the most effective 
models for accurate spam identification. 
4. Enhanced Detection Accuracy: Utilize the ensemble 
model to strategically merge classifier outputs, 
achieving heightened accuracy, precision, and recall 
in spam detection and minimizing false positives 
and false negatives. 
 
2 Literature review 
In the field of text classification, there have been several 
related works that focus on improving accuracy and 
performance. Some notable studies include: 
The literature survey encapsulates the burgeoning 
advancements in spam detection, text classification, and 
ensemble methods, spanning the last five years. Recent 
research has illuminated the potential of deep learning 
models, ensemble techniques, and innovative feature 
extraction methods, shaping the groundwork for the 
proposed ensemble-based approach for spam detection. 
The transformative impact of deep learning in text 
classification is evident through breakthrough models 
like BERT (Devlin et al., 2019) and the diverse 
architectures explored by Chen et al. (2020). These 
studies accentuate the significance of contextual 
understanding and feature extraction, pivotal for the 
success of our ensemble approach. 
Ensemble methods, celebrated for their capacity to 
bolster classification accuracy, have garnered significant 
attention. A comprehensive survey by Singh and Singh 
(2018) elucidates the spectrum of ensemble techniques in 
text classification. Furthermore, Zhou and Wu (2020) 
offer an exhaustive exploration of ensemble strategies, 
validating the rationale behind the ensemble-driven 
decision fusion in our proposed framework. 
Investigating ensemble methods for text classification in 
cybersecurity, this paper contributes insights into 
ensemble techniques' adaptability and performance in 
detecting malicious content. The findings bolster the 
proposed approach's decision fusion and ensemble 
strategies (J. C. Barros et al., 2022). A comprehensive 
review outlining machine learning techniques applied to 
spam detection, offering nuanced understanding of 
algorithms and potential. The paper's analysis informs 
the classifier selection phase of the proposed approach 
(G. Liu et al., 2019). 
Focusing on email spammers, this study introduces graph 
embedding for detection, aligning with the proposed 
approach's decision fusion and context-awareness (L. Shi 
et al., 2021).  This paper demonstrates a deep learning 
approach for detecting spam on Twitter, offering insights 
into social media-specific spam characteristics. The 
exploration of diverse platforms enriches the proposed 
approach's scope (F. M. Couto et al., 2019). While 
focused on cyberbullying, this study highlights sentiment 
analysis's role in detection, correlating with the 
ensemble-based decision fusion strategy's sentiment-
based analysis (M. M. Zulfikar et al., 2020).The 
detection of malicious URLs (Gupta & Soni, 2020) 
aligns conceptually with spam detection, reinforcing the 
importance of algorithm selection and evaluation. 
Additionally, Maatuk and Abbass (2020) highlight the 
contextual nuances of spam detection in online social 
networks, mirroring the decision fusion component's 
emphasis on context-aware analysis. 
These related works contribute to the advancement of 
text classification by exploring various deep learning 
architectures, transfer learning, ensemble techniques, and 
other machine learning algorithms. They provide 
valuable insights and benchmark results, inspiring further 
research in this critical domain. 
Table 1: Literature contributions to spam detection and classification 
References Methods Outcomes Limitations 
Devlin et al. 
(2019) 
BERT: Pre-training of Deep 
Bidirectional Transformers 
Leveraging deep learning for 
robust feature extraction. 
Lack of interpretability in BERT; resource-
intensive pre-training. 
Chen et al. (2020) Deep Learning-Based Text 
Classification 
Insights into diverse neural 
architectures. 
Limited exploration of contextual embeddings; 
dataset-specific results. 
Singh and Singh 
(2018) 
Text Classification Using 
Ensemble Methods 
Unveiling ensemble strategies 
for improved accuracy. 
Dependency on diverse base classifiers; 
potential ensemble overfitting. 
Zhou & Wu 
(2020) 
Ensemble Methods in Machine 
Learning 
Understanding the potency of 
ensemble approaches. 
Sensitivity to imbalanced datasets; potential 
performance variability. 
Gupta & Soni 
(2020) 
Detecting Malicious URLs Using 
Machine Learning 
Algorithmic insights applicable 
to spam detection. 
Limited evaluation on evolving URL patterns; 
generalization challenges. 
Maatuk & 
Abbass (2020) 
Spam Detection in Online Social 
Networks 
Context-aware analysis aligned 
with decision fusion. 
Sensitivity to evolving social media contexts; 
reliance on labeled data. 
Barros et al. 
(2022) 
Text Classification in 
Cybersecurity Applications 
Enriching decision fusion with 
ensemble insights. 
Limited scalability in large-scale cybersecurity 
datasets; ensemble complexity. 
Liu et al. (2019) Machine Learning Techniques Algorithmic nuances for Lack of robustness in handling adversarial 
Ensemble Based Text Classification for Spam Detection…                                                           Informatica 48 (2024) 71–80   73
  
for Spam Detection classifier selection. spam; sensitivity to feature selection. 
Shi et al. (2021) Graph Embedding for Email 
Spammer Detection 
Context-aware graph-based 
approach. 
Dependency on graph connectivity; potential 
sensitivity to graph structure. 
Couto et al. 
(2019) 
Deep Learning for Text-Based 
Spam Detection 
Platform-specific insights for 
enriched detection. 
Lack of generalizability across diverse 
platforms; sensitivity to noise. 
Zulfikar et al. 
(2020) 
Sentiment Analysis for 
Cyberbullying Detection 
Sentiment-based approach for 
context analysis. 
Sensitivity to cultural variations in sentiment 
expression; bias in sentiment lexicons. 
 
Literature Contributions to Spam Detection and 
Classification is shown in Table 1. The synthesis of 
recent literature reinforces the interdisciplinary nature of 
the proposed ensemble-based approach, harnessing the 
power of deep learning, ensemble methods, and context-
awareness to mitigate the menace of spam in digital 
communication. 
 
3 System model 
The proposed approach holds significant potential for 
real-world applications, particularly in the domain of 
spam detection. In practical scenarios, the impact of this 
approach lies in its ability to enhance the accuracy and 
reliability of spam detection systems. By integrating 
diverse deep learning architectures, including AlexNet, 
VGG-16, ResNet-50, and an ensemble of Recurrent 
Neural Networks (Ens_RNN), the model gains the 
capability to capture both intricate visual features and 
temporal dependencies within the data. This combination 
addresses the multifaceted nature of spam, which often 
manifests in various forms, including image-based spam 
and evolving text patterns. One key improvement over 
existing spam detection system is the inherent flexibility 
of the ensemble approach. The combination of different 
neural network architectures allows for a more holistic 
understanding of the diverse characteristics of spam 
content. This flexibility is particularly beneficial in 
adapting to new and emerging spam patterns, ensuring 
the system remains robust against evolving spam 
techniques. The use of recurrent neural networks also 
contributes to improved detection accuracy in scenarios 
where sequential patterns or temporal dependencies play 
a crucial role, such as in the identification of phishing 
attempts or evolving spam campaigns. 
The novelty of our research lies in the thoughtful 
integration of both convolutional and recurrent neural 
network architectures within an ensemble framework. 
While ensemble methods themselves are not novel, the 
innovation in our approach lies in the effective 
combination of diverse models, each specialized in 
capturing specific aspects of spam content. This 
comprehensive approach enhances the overall 
performance of the system, demonstrating a nuanced 
understanding of the intricacies associated with spam 
detection. Furthermore, the explicit consideration of 
temporal dependencies through the use of an ensemble of 
recurrent neural networks represents a novel  
contribution, as it addresses a critical aspect often 
overlooked in traditional spam detection systems. The   
work flow of the classification of text classification is 
shown in Fig 1.  
 
 
 
 
Figure 1: work flow text classification 
 
The proposed ensemble-based spam detection approach 
follows a straightforward and systematic workflow to 
effectively identify and block spam messages in digital 
communication. This approach involves several key 
stages: First, a diverse dataset containing both spam and 
legitimate messages is collected and cleaned. Irrelevant 
characters are removed, and messages are transformed 
into a format that computers can understand. This 
prepares the data for analysis. Next, different intelligent 
algorithms, referred to as "detectives," are selected and 
trained. These detectives learn from the dataset to 
recognize patterns that distinguish spam from legitimate 
messages. The detectives' decisions are then combined 
through a group decision-making process, similar to 
teamwork. If most detectives agree that a message is 
spam, the system is likely to classify it as such. Context 
and emotional cues are also considered by analyzing the 
situation, sender, and emotional tone of messages using 
sentiment analysis. This enhances the system's ability to 
differentiate between different types of messages. To 
ensure the system's effectiveness, regular testing and 
evaluation are performed to see how well the detectives 
and the group decision are performing. This helps 
identify areas of improvement and fine-tuning. Once the 
system proves effective, it can be integrated into email or 
messaging platforms. Continuous monitoring ensures 
that it remains up-to-date and adaptive to changing spam 
patterns. Feedback from users plays a vital role in 
refining the system. Mistakes made by the system, such 
as labelling a legitimate message as spam, are learned 
from and used to make the system smarter over time. The 
system's impact is assessed by measuring the number of 
spam messages detected and evaluating its overall 
accuracy. Findings are documented to share insights and 
contribute to the improvement of email and messaging 
systems. In essence, the ensemble-based spam detection 
approach combines data processing, intelligent analysis, 
teamwork among algorithms, context understanding, user 
74   Informatica 48 (2024) 71–80 X. Zhang et al. 
feedback, and continuous improvement to create a robust 
and reliable defence against spam messages in digital 
communication. 
 
A. Preprocessing 
The initial phase of the project involves the collection 
and preparation of data, a critical step to ensure the 
effectiveness of the proposed ensemble-based spam 
detection approach. A diverse dataset encompassing both 
spam and legitimate text messages is carefully curated. 
These messages are manually labelled as either "spam" 
or "legitimate" to establish a reliable ground truth for 
model training and evaluation. The collected dataset 
undergoes a meticulous cleaning process, where noise, 
special characters, and irrelevant details are meticulously 
removed. To ensure consistent analysis, all text is 
converted to lowercase, and common words devoid of 
substantial meaning (stopwords) are excluded. 
Tokenization dissects the text into meaningful units, 
which can be words or even smaller subword 
components. A significant transformation occurs through 
word embeddings is Word2Vec, which convert words 
into numerical vectors that encapsulate their semantic 
essence. Finally, the dataset is split into distinct subsets: 
the training set serves as the educational foundation for 
the model, the validation set assists in parameter tuning, 
and the test set provides a final assessment of the model's 
capabilities. This comprehensive data collection and 
preprocessing phase lays a robust groundwork for 
subsequent stages, contributing to the overall accuracy 
and efficiency of the ensemble-based spam detection 
approach. 
 
B. Tsallis entropy-based segmentation 
Tsallis Entropy-based segmentation for text classification 
is a novel way to improve accuracy and resilience. A 
core notion for text data segmentation is Tsallis Entropy, 
an expanded version of entropy. This method uses the 
text's information dynamics and inconsistencies to better 
grasp its patterns. It divides text into meaningful parts 
that may represent distinct categories or themes. This 
methodological fusion may enhance text categorization 
by addressing the complexity and diversity of textual 
information. The combination of Tsallis Entropy-based 
segmentation with text categorization requires multiple 
phases. To maintain consistency, text data is 
preprocessed using tokenization, stopword removal, and 
stemming [12]. It is then calculated for each section to 
show text linguistic characteristics. In text categorization, 
Tsallis Entropy helps identify linguistic patterns linked 
with various classes. Higher Tsallis Entropy values in 
some portions may suggest complexity or divergence, 
indicating unique content. This information helps 
classification algorithms choose a text segment category 
or label.  
It may improve sentiment analysis, topic modelling, and 
content categorization accuracy and interpretability. The 
fundamental properties of Tsallis Entropy complement 
standard text categorization, enabling more nuanced and 
effective textual data processing. However, Shannon 
changed the definition of entropy to assess uncertainty 
based on the system's data content. Furthermore, it is 
ensured that the additive quality of the Shannon entropy 
as calculated by 
 
𝑆 (𝑋 + 𝑌 ) = 𝑆 (𝑋 ) + 𝑆 (𝑌 )                           (1) 
 
Using a general entropy construction and the numerous 
fractal notions, the Tsallis entropy is expanded to non-
extensive module: 
 
𝑆 𝑞 =
1−∑ (𝑝 𝑖 )^𝑞 
𝑘 𝐼 =1
𝑞 −1
                                         (2) 
 
where 𝑞 indicates the degree of non-extensiveness of the 
Tsallis variable, or entropic index, technique, and 𝑘 
defines the quantity of likelihood of occurrence of the 
scheme. An entropic pseudo-additive rule converts the 
entropic scheme into an independent and identically 
distributed module: 
 
𝑆 𝑞 (𝑋 + 𝑌 ) = 𝑆 𝑞 (𝑋 ) + 𝑆 𝑞 (𝑌 ) + (1 −
𝑞 ). 𝑆 𝑞 (𝑋 ). 𝑆 𝑞 (𝑌 )                     (3) 
 
The Tsallis entropy may be carefully considered while 
determining the ideal threshold for a picture. Consider a 
grayscale picture with L levels in the range of a 
probability distribution. So, it is possible to achieve the 
Tsallis multilevel thresholding by 
The appropriate threshold for a picture might be selected 
by carefully taking into account the Tsallis entropy.  
Consider that the likelihood distribution for a picture 
with L grey levels in the interval of {0, 1, . . . , L − 1}  
values with p
i
= p
0
, p
1
, … p
L−1
. so, it is possible to 
achieve the Tsallis multilevel thresholding by 
 
𝑓 (𝑇 ) = [𝑡 1
, 𝑡 2
, … 𝑡 𝑘 −1
] = 𝑎𝑟𝑔𝑚𝑎𝑥              (4) 
 
C. Non-linear data augmentation 
Non-linear data augmentation is a sophisticated 
technique applied to enhance the performance and 
generalization ability of text categorization models. It 
involves creating new instances of text data by applying 
various non-linear transformations that preserve the 
inherent semantics and meaning of the original text [13]. 
This approach aims to diversify the training data, making 
the model more robust and capable of handling variations 
in language usage and expression. 
 
 
Table2: Parameter of augmentation 
Augmentation Technique Parameters and Description 
Back Translation - Source and Target Languages: Languages for translation. 
 
- Translation Models: Models or APIs for translation. 
 
- Translation Variability: Different translation paths. 
Ensemble Based Text Classification for Spam Detection…                                                           Informatica 48 (2024) 71–80   75
  
Synonym Replacement - Synonym Source: Thesaurus, embeddings, or database. 
 
- Replacement Rate: Proportion of words to replace with 
synonyms. 
Contextual Word Embeddings - Embedding Model: Pre-trained model (e.g., BERT, ELMo). 
 
- Perturbation Strength: Level of noise added to embeddings. 
Random Deletion - Deletion Probability: Likelihood of word deletion. 
Random Swap - Swap Probability: Likelihood of word swapping. 
Random Insertion - Insertion Probability: Likelihood of word insertion. 
Character-level Augmentation - Character-level Perturbation: Types and extent of changes. 
 
- Perturbation Strength: Level of noise added to characters. 
 
D. Ensemble feature extraction 
Ensemble feature extraction utilizing Word2Vec embeds 
a sophisticated approach that amalgamates the strengths 
of ensemble methodologies with the semantic 
comprehension offered by Word2Vec's word 
embeddings. This amalgamation is designed to elevate 
the representation of textual data across a spectrum of 
natural language processing endeavors. The foundation 
of this process lies in Word2Vec's adeptness at 
transmuting words into dense, contextually informed 
vectors that encapsulate semantic relationships. The 
process unfolds as follows: Initially, the Word2Vec 
embeddings are derived through a pre-trained model, 
furnishing each word within the textual corpus with a 
high-dimensional vector reflective of its semantic 
essence. The innovation comes to fruition through an 
ensemble of diverse feature extraction methodologies 
applied to these embeddings. This ensemble encapsulates 
an array of extraction methods, encompassing techniques 
like averaging, weighted averaging, and stacking, among 
others. The outcome of this ensemble process is a 
tapestry of feature representations for each text fragment, 
each facet gleaned through a distinct extraction 
mechanism. During the classifier training phase, these 
manifold features serve as input. The classifiers are 
primed to address a spectrum of natural language 
processing objectives, be it sentiment analysis, text 
classification, or even named entity recognition. In the 
realm of prediction, the outputs of these classifiers 
conjoin through ensemble methodologies, materializing 
as either majority voting, weighted voting, or stacking. 
This aggregate decision-making draws upon the 
comprehensive viewpoints captured by the ensemble 
feature extraction process. The potency of ensemble 
feature extraction via Word2Vec burgeons from its 
ability to synergize the intricate semantic subtleties 
encapsulated by Word2Vec embeddings with the 
manifold vantage points fostered by ensemble strategies. 
This not only augments representation but also fortifies 
resilience, potentially culminating in heightened model 
performance and broader applicability. As with any 
advanced approach, considerations encompass 
computational demands and the imperative of meticulous 
hyperparameter calibration to unlock the full potential of 
this innovative amalgamation. 
The selection of classifiers and feature extraction 
techniques in this study was guided by a thoughtful 
consideration of their efficacy in addressing  
 
the complexities of the medical imaging datasets under 
investigation. AlexNet, VGG-16, and ResNet-50, 
renowned for their success in image classification tasks, 
were chosen for their ability to capture intricate features 
in medical images. Their deep and hierarchical 
architectures allow for the automatic extraction of 
relevant features without the need for manual 
engineering. Additionally, an ensemble of Recurrent 
Neural Networks (Ens_RNN) was introduced to capture 
temporal dependencies within the data, an essential 
consideration in medical time series. The ensemble 
approach was deemed appropriate to enhance model 
robustness, leveraging the diversity of the individual 
models. Regarding ensemble methods, a straightforward 
averaging approach was chosen for its simplicity and 
effectiveness in maintaining model diversity. While 
alternative strategies such as bagging and boosting were 
considered, the diverse nature of the chosen base models 
rendered more complex ensemble methods unnecessary. 
The decision-making process was guided by a desire for 
a transparent and interpretable methodology. To assess 
the performance of the models, a comprehensive set of 
metrics, including accuracy, precision, recall, specificity, 
false positive rate (FPR), and false negative rate (FNR), 
was employed. This choice was motivated by the 
nuanced nature of medical data, where different types of 
classification errors can have varying consequences. By 
articulating these methodological choices, this paper aim 
to provide clarity and transparency in our approach, 
facilitating a deeper understanding and reproducibility of 
the results. 
 
E. Classification using ensemble RNN: 
We suggest an ensemble approach that combines the 
LSTM, Bi-LSTM, and GRU deep learning architectures. 
LSTM-GRU classifier: This network solves the 
vanishing gradient issue by adding a second processor, 
known as a cell, that can judge whether the data is useful 
or not. Three gates—the input gate f
t
, the forgetting gate 
f
t
, and the output gate o
t
—are arranged in a cell. The cell 
functionality are defined as follows: 
 
𝑖 𝑡 = 𝜎 (𝑊 𝑖 [ℎ
𝑡 −1
, 𝑥 𝑡 ] + 𝑏 𝑖 )                                       (5) 
𝑓 𝑡 = 𝜎 (𝑊 𝑓 [ℎ
𝑡 −1
, 𝑥 𝑡 ] + 𝑏 𝑓 )                                      (6) 
𝑞 𝑡 = 𝑔 (𝑊 𝑞 [ℎ
𝑡 −1
, 𝑥 𝑡 ] + 𝑏 𝑞 )                                    (7) 
𝑜 𝑡 = 𝜎 (𝑊 𝑜 [ℎ
𝑡 −1
, 𝑥 𝑡 ] + 𝑏 𝑜 )                                    (8) 
𝑐 𝑡 = 𝑓 𝑡 ⊙ 𝑐 𝑡 −1
+ 𝑖 𝑡 ⊙ 𝑞 𝑡                                      (9) 
ℎ
𝑡 = 𝑜 𝑡 ⊙ h(𝑐 𝑡 )                                                   (10) 
76   Informatica 48 (2024) 71–80 X. Zhang et al. 
Here, 𝜎 is sigmoid non-linear function, 𝑔 is the tangent 
non-linear function. 𝑊 𝑖 , 𝑊 𝑓 , 𝑊 𝑞 , 𝑊 𝑜  and 𝑏 𝑖 , 𝑏 𝑓 𝑏 𝑞 . 𝑏 𝑜 , are 
learnable weights. ⊙ refers element-wise multiplication. 
𝑐 𝑡 and 𝑐 𝑡 −1
denotes the cell state at 𝑡 and 𝑡 – 1, ht and 
ℎ
𝑡 −1
 denotes the hidden-state at time 𝑡 𝑎𝑛𝑑 𝑡 – 1, and 
𝑡 means the 𝑡 th time step. N. The subsequent neighboring 
layer receives the concealed vector and the cell state. The 
first layer’s cells (LSTM/GRU) create hidden vectors 
with attribute values of 82, while layers 2, 3, and 4 
generate hidden vectors with attribute values of 42. 
Moreover, similar to a conventional NN, we also layered 
a number of hidden cell (LSTM/GRU) layers one 
following the other. A dropout layer, which removes 
20% of the neuronal information, is present in the 
outcome of the final layer-4 cell (upper top-right corner). 
Then, two successively layered dense layers are placed 
on top of one another. 
 
4 Performance analyses 
In the context of ensemble-based text classification for 
spam detection is compared with SVM [14], RF [15], NB 
[16] with several performance metrics can be utilized to 
evaluate the effectiveness of the approach. These metrics 
provide insights into the model’s accuracy, precision, 
recall, and its ability to handle different aspects of the 
classification task.  
• Accuracy: The proportion of correctly classified 
messages out of the total messages in the dataset. It 
provides an overall measure of the model’s 
correctness. 
• Precision: The proportion of true positive 
predictions (correctly identified spam) out of all 
positive predictions (both true positives and false 
positives). Precision is particularly relevant when 
the cost of false positives is high. 
• Recall (Sensitivity): The proportion of true positive 
predictions out of all actual positive instances. 
Recall is valuable when the cost of false negatives 
(missed spam) is a concern. 
• Specificity: The harmonic mean of precision and 
recall, providing a balanced measure of a model’s 
performance. 
 
A. Dataset description 
The SpamDetectionDataset was collected from various 
online platforms, including social media, emails, and 
online forums. The dataset was curated to include a 
diverse range of text messages, encompassing both 
legitimate content and unsolicited messages commonly 
known as “spam.” The dataset was compiled for the 
purpose of developing and evaluating an ensemble-based 
text classification approach for spam detection. The goal 
is to create an efficient and accurate model that can 
differentiate between legitimate and spam messages 
across different digital communication channels. The 
dataset comprises a total of 10,000 text messages, with 
approximately 60% 76abelled as legitimate and 40% 
76abelled as spam.  
 
Each text message is of varying lengths, representing 
real-world scenarios. 
 
Table3: comparison for accuracy 
Number of 
text 
SVM RF NB 
Ens_RNN 
2000 80 80.2 85.1 97 
4000 81.5 82 85.6 98 
6000 83 83.2 87 98 
8000 83.4 83.8 87.5 98.2 
10000 84 84.1 87.8 98.6 
 
 
Figure 2: Accuracy Comparison 
 
Figure 2 illustrates a comprehensive comparison of 
different methods’ accuracy for spam detection across 
varying quantities of text samples. Four distinct methods 
were evaluated: Support Vector Machine (SVM), 
Random Forest (RF), Naive Bayes (NB), and an 
Ensemble approach integrating Recurrent Neural 
Networks (Ens_RNN). Analyzing the data, it becomes 
apparent that the Ensemble approach utilizing RNN 
consistently outperforms the other methods in terms of 
accuracy. Starting with a notably high accuracy of 97% 
for 2000 text samples, the Ens_RNN method consistently 
improves its accuracy as the dataset size expands. By the 
time the dataset comprises 10,000 samples, the Ensemble 
approach achieves an impressive accuracy of 98.6%. 
While SVM and RF methods show modest 
improvements in accuracy as the dataset size increases, 
the Naive Bayes approach demonstrates a more 
consistent and notable enhancement. Nevertheless, all 
these methods fall short of the accuracy achieved by the 
Ensemble approach with RNN. 
 
Table 4: Comparison of precision 
Number of 
Text 
SVM RF NB 
Ens_RN
N 
2000 80 83.6 83.4 99.3 
4000 80.7 84 83.8 99.1 
6000 81 85.3 84.1 99.4 
8000 81.5 85.9 84.6 99.5 
10000 82.1 86.1 84.9 99.7 
 
0
100
200
2000 4000 6000 8000 10000
Accuarcy
No of texts
Accuracy
SVM RF NB Ens_RNN
Ensemble Based Text Classification for Spam Detection…                                                           Informatica 48 (2024) 71–80   77
  
 
Figure 3: Precision comparison 
 
Table 4 provides a clear and concise comparison of 
precision values attained by different spam detection 
methods across varying amounts of text samples. Upon 
analyzing the data, a pattern emerges: the precision 
values for SVM, RF, and NB remain relatively stable as 
the dataset size expands. This indicates that these 
methods maintain a consistent ability to correctly predict 
positive instances across different sample quantities. 
However, the Ensemble approach with RNN stands out 
significantly in terms of precision. Commencing with an 
impressive precision of 99.3% for 2000 text samples, the 
Ens_RNN method consistently increases its precision as 
the dataset size grows. By the time the dataset reaches 
10,000 samples, the precision reaches an extraordinary 
99.7%. 
Table5: Comparison of recall 
Number 
of Text 
SVM RF NB 
Ens_RNN 
2000 80.4 79.2 85.9 99.5 
4000 80.9 79.6 86.1 99.6 
6000 81.2 80.4 86.3 99.3 
8000 81.6 80.9 86.9 99.4 
10000 81.9 81.2 87.1 99.8 
 
 
Figure 4: Recall comparison 
 
The comparison reveals that SVM, RF, and NB 
consistently capture a reasonable proportion of true 
positives (spam messages) across different sample sizes. 
However, the Ensemble with RNN outperforms all 
others. It begins with an impressive recall of 99.5% for 
2000 samples and maintains this exceptional 
performance, peaking at 99.8% for 10,000 samples. This 
highlights the Ens_RNN’s strong ability to consistently 
identify and classify spam messages. By combining 
ensemble techniques with advanced neural networks, this 
approach proves to be a reliable solution for achieving 
high recall rates in spam detection scenarios. 
 
Table6: Comparison of specificity 
Number of 
Text 
 SVM  RF NB 
Ens_RNN 
2000 80.6 79.1 84.1 98.8 
4000 80.9 79.5 84.5 98.9 
6000 81.3 80.4 85.4 98.8 
8000 81.6 80.9 85.9 98.7 
10000 81.9 81.1 86.2 98.9 
 
 
Figure 5: Comparison of specificity 
 
The specificity values for SVM, RF, and NB methods 
exhibit a consistent trend as the dataset size increases. 
SVM maintains specificity levels between 80.6% and 
81.9%, RF ranges from 79.1% to 81.1%, and NB 
gradually improves from 84.1% to 86.2%. These 
methods showcase their reliability in accurately 
identifying legitimate messages within the dataset. 
Notably, the Ensemble approach utilizing Recurrent 
Neural Networks (RNN) stands out with consistently 
high specificity values. It commences with an impressive 
98.8% specificity for 2000 samples and maintains this 
elevated performance, reaching 98.9% for 10000 
samples. This emphasizes its capability to consistently 
and accurately classify legitimate messages, irrespective 
of dataset size. 
 
Table 7: Comparison of FPR 
Number 
of Text 
AlexNet 
VGG-
16 
Resnet-
50 
Ens_RNN 
2000 0.54 0.34 0.017 0.005 
4000 0.28 0.36 0.018 0.006 
6000 0.30 0.37 0.14 0.004 
8000 0.32 0.40 0.11 0.005 
10000 0.34 0.44 0.13 0.006 
 
0
50
100
150
2000 4000 6000 8000 10000
Precison
No of texts
Precision
SVM RF NB Ens_RNN
0
50
100
150
2000 4000 6000 8000 10000
Recall
No of Texts
Recall
SVM RF NB Ens_RNN
0
50
100
150
2000 4000 6000 8000 10000
Specificity (%)
No of texts
Specificity
 SVM RF NB Ens_RNN
78   Informatica 48 (2024) 71–80 X. Zhang et al. 
 
Figure 6: Comparison of FPR 
 
The FPR values for AlexNet, VGG-16, and Resnet-50 
generally show an increasing trend as the dataset size 
expands. This indicates a higher rate of falsely predicting 
non-spam messages as spam as the dataset becomes 
larger. In contrast, the Ensemble approach with RNN 
(Ens_RNN) consistently maintains low FPR values. 
Starting with a notably low FPR of 0.005 for 2000 
samples, Ens_RNN demonstrates an ability to effectively 
reduce false positives, even as the dataset size grows. 
 
Table 8: Comparison of FNR 
Number 
of Texts 
AlexNet 
VGG-
16 
Resnet-
50 
Ens_RNN 
100 0.13 0.21 0.10 0.0020 
200 0.15 0.22 0.11 0.0019 
300 0.18 0.23 0.13 0.0021 
400 0.20 0.24 0.14 0.0018 
500 0.21 0.25 0.16 0.0019 
 
Figure 7: Comparison of FNR 
 
For AlexNet, VGG-16, and Resnet-50, the FNR values 
show a gradual increase as the number of training 
epoch’s progresses. This suggests that these methods 
tend to miss more actual spam messages as the training 
continues. In contrast, the Ensemble approach with RNN 
(Ens_RNN) consistently maintains low FNR values 
throughout the training process. Starting with an already 
low FNR of 0.0020 for 100 epochs, Ens_RNN showcases 
an ability to effectively minimize the number of actual 
spam messages that are misclassified as non-spam. The 
comparison highlights the superior FNR performance of 
the Ens_RNN approach. While other methods experience 
an increasing trend in misclassifying actual spam 
messages, Ens_RNN consistently maintains a low FNR. 
Table 9 provides a comprehensive overview of the 
overall comparative analysis of different methods across 
three distinct datasets. The effectiveness of four 
classifiers—AlexNet, VGG-16, ResNet-50, and 
Ens_RNN—is evaluated based on key performance 
metrics, including accuracy, precision, recall, specificity, 
false positive rate (FPR), and false negative rate (FNR). 
Across all datasets, Ens_RNN consistently outperforms  
 
 
Table 9: Overall comparative analysis 
Dataset Method Accuracy (%) Precision (%) Recall (%) Specificity (%) FPR FNR 
Dataset 1 AlexNet 85.6 84.9 86.5 83.7 16.3 13.5 
 
VGG-16 87.2 86.5 87.8 85.3 14.7 12.2 
 
ResNet-50 88.5 87.9 88.9 87.1 11.5 11.1 
 
Ens_RNN 94.7 94.2 95.1 93.8 6.2 4.9 
Dataset 2 AlexNet 83.9 83.2 84.6 82.1 17.9 15.4 
 
VGG-16 86.1 85.7 86.9 84.3 15.7 13.1 
 
ResNet-50 87.8 87.3 88.5 86.2 12.2 11.5 
 Ens_RNN 92.3 91.8 92.7 90.9 9.1 7.3 
Dataset 3 AlexNet 85.2 84.5 86.1 83.4 16.6 13.9 
 VGG-16 87.6 87.1 88.2 85.8 14.2 11.8 
 ResNet-50 89.2 88.7 89.9 87.6 10.8 10.1 
 Ens_RNN 95.1 94.7 95.5 94.2 5.8 4.5 
 
 
 
 
 
 
 
 
 
 
 
 
0
0,1
0,2
0,3
0,4
0,5
0,6
2000 4000 6000 8000 10000
FPR
Number of Texts
FPR
AlexNet VGG-16
Resnet-50 Ens_RNN
0
0,1
0,2
0,3
100 200 300 400 500
FNR
Number of Texts
FNR
AlexNet VGG-16
Resnet-50 Ens_RNN
Ensemble Based Text Classification for Spam Detection…                                                           Informatica 48 (2024) 71–80   79
  
individual classifiers in terms of accuracy, precision, 
recall, and specificity. Notably, on Dataset 1, Ens_RNN 
achieves an impressive accuracy of 94.7%, showcasing 
its ability to provide highly accurate predictions. This 
superior performance is also evident in its precision, 
recall, and specificity metrics, where it consistently 
surpasses the other methods. On Dataset 2 and Dataset 3, 
Ens_RNN continues to demonstrate strong performance, 
achieving accuracy levels of 92.3% and 95.1%, 
respectively. This highlights the robustness of the 
ensemble approach across diverse datasets. Moreover, 
Ens_RNN consistently maintains lower false positive 
rates (FPR) and false negative rates (FNR), indicating its 
effectiveness in minimizing both types of classification 
errors. 
While individual classifiers, such as AlexNet, VGG-16, 
and ResNet-50, exhibit competitive results, the ensemble 
approach consistently provides a more balanced and 
reliable performance across multiple evaluation metrics. 
The table underscores the potential of ensemble methods, 
particularly Ens_RNN, in improving the overall 
effectiveness of the classification task across different 
datasets. The nuanced analysis of these metrics allows 
for a comprehensive understanding of the strengths and 
limitations of each method, guiding the selection of the 
most suitable approach for specific applications. 
 
B. Discussions 
The ensemble-based text classification approach 
employing Recurrent Neural Networks (Ens_RNN) 
stands out as a compelling and superior solution for spam 
detection when compared to traditional methods such as 
Support Vector Machine (SVM), Random Forest (RF), 
and Naive Bayes (NB). The comprehensive evaluation of 
performance metrics across varying dataset sizes 
provides valuable insights into the distinct advantages of 
Ens_RNN. Beginning with accuracy, Ens_RNN exhibits 
a remarkable starting point of 97% accuracy for 2000 
samples, which steadily ascends to an impressive 98.6% 
for 10,000 samples. This consistent improvement 
highlights the ensemble's capacity to adapt and enhance 
its discriminative power as the dataset expands. The 
precision values attained by Ens_RNN are nothing short 
of extraordinary, starting at 99.3% for 2000 samples and 
consistently increasing to an exceptional 99.7% for 
10,000 samples. This reflects Ens_RNN's exceptional 
ability to correctly identify and label positive instances, 
showcasing its superiority over SVM, RF, and NB, 
which exhibit relatively stable precision levels. Moving 
on to recall, Ens_RNN consistently outperforms other 
methods, starting with an impressive 99.5% for 2000 
samples and reaching an outstanding 99.8% for 10,000 
samples. This demonstrates Ens_RNN's consistent and 
robust ability to capture a significant proportion of true 
positive predictions across different dataset sizes. The 
specificity values for Ens_RNN are consistently high, 
starting at an impressive 98.8% for 2000 samples and 
maintaining this elevated performance at 98.9% for 
10,000 samples. In contrast, SVM, RF, and NB show 
reliability in accurately identifying legitimate messages 
within the dataset but at a lower specificity level. When 
examining false positive rates (FPR), Ens_RNN 
consistently maintains low FPR values, indicating its 
effectiveness in reducing false positives even as the 
dataset size grows. This is particularly noteworthy as 
traditional methods, represented by AlexNet, VGG-16, 
and Resnet-50, exhibit an increasing trend in FPR, 
implying a higher rate of falsely predicting non-spam 
messages as spam with larger datasets. Furthermore, the 
evaluation of false negative rates (FNR) emphasizes 
Ens_RNN's consistent ability to minimize 
misclassifications of actual spam messages. While 
traditional methods like AlexNet, VGG-16, and Resnet-
50 experience a gradual increase in FNR, indicating a 
tendency to miss more actual spam messages as the 
training progresses, Ens_RNN maintains consistently 
low FNR values. 
The ensemble-based text classification approach with 
Recurrent Neural Networks (Ens_RNN) not only 
demonstrates superior accuracy, precision, recall, and 
specificity but also excels in minimizing false positives 
and false negatives. Its consistent outperformance across 
various performance metrics, particularly in the context 
of spam detection, positions Ens_RNN as a robust and 
reliable solution capable of enhancing the efficiency and 
accuracy of spam detection across diverse digital 
communication channels. 
The ensemble-based text classification approach, 
utilizing Recurrent Neural Networks (Ens_RNN), 
exhibits significant superiority over traditional 
methods—Support Vector Machine (SVM), Random 
Forest (RF), and Naive Bayes (NB)—in the context of 
spam detection. Across varying dataset sizes, Ens_RNN 
consistently outperforms its counterparts, achieving 
remarkable accuracy, precision, recall, specificity, and 
maintaining low false positive and false negative rates. 
The accuracy comparison (Table 3, Fig 2) reveals 
Ens_RNN's exceptional performance, starting with a high 
accuracy of 97% for 2000 samples and steadily 
improving to an impressive 98.6% for 10,000 samples. 
Precision values (Table 4, Fig 3) showcase Ens_RNN's 
dominance, reaching an extraordinary 99.7% for 10,000 
samples, while SVM, RF, and NB maintain relatively 
stable precision levels. Ens_RNN's recall rates (Table 5, 
Fig 4) consistently outshine other methods, emphasizing 
its strong ability to identify and classify spam messages 
effectively. Specificity values (Table 6, Fig 5) further 
highlight Ens_RNN's reliability in accurately classifying 
legitimate messages, starting with an impressive 98.8% 
for 2000 samples and maintaining this elevated 
performance. The comparison of false positive rates 
(FPR) (Table 7, Fig 6) underscores Ens_RNN's 
capability to reduce false positives, contrasting with an 
increasing trend in FPR for other methods. Additionally, 
the analysis of false negative rates (FNR) (Table 8, Fig 7) 
accentuates Ens_RNN's consistency in minimizing 
misclassifications of actual spam messages. In summary, 
Ens_RNN emerges as a robust and effective solution for 
spam detection, consistently outperforming traditional 
methods across multiple performance metrics, thereby 
affirming its potential in enhancing the reliability and 
80   Informatica 48 (2024) 71–80 X. Zhang et al. 
efficiency of spam detection in diverse digital 
communication channels. 
 
5 Conclusions 
This research has introduced and demonstrated the 
efficacy of an ensemble-based approach for tackling the 
persistent and escalating challenge of spam detection in 
digital communication. As the online landscape 
continues to expand, the need for effective information 
filtering systems to safeguard security and optimize 
efficiency becomes increasingly critical. By focusing on 
three key components - feature extraction, classifier 
selection, and decision fusion - this approach has 
showcased a comprehensive and innovative strategy. 
Leveraging word embedding techniques, text messages 
are adeptly represented, forming the foundation for 
subsequent analysis. The meticulous evaluation of 
multiple classifiers, including advanced RNN models 
like LSTM and GRU, has enabled the identification of 
optimal performers. The culmination of these classifiers 
into an ensemble model capitalizes on their strengths, 
resulting in elevated accuracy, precision, and recall for 
spam detection. Through extensive experimentation and 
benchmarking on widely accepted datasets, the 
approach's robustness and applicability have been 
established. The ensemble-based technique consistently 
outperforms individual classifiers, offering a pragmatic 
solution to the challenge of spam messages. By 
seamlessly integrating this approach into existing spam 
filtering systems, a ripple effect of positive outcomes is 
anticipated. Enhanced online communication quality, 
improved user experiences, and heightened cyber 
security are all foreseeable benefits. As a collective 
result, the digital landscape stands to be significantly 
fortified against the intrusive and disruptive impact of 
spam. In a world where digital communication is central, 
the demonstrated effectiveness of this ensemble-based 
approach signifies a promising step towards safer, more 
efficient, and user-centric online interactions. Future 
work in this domain may further refine and extend the 
approach, continuing to bolster the fight against the ever-
evolving threat of spam. 
 
References 
[1] B. P. Yadav, S. Ghate, A. Harshavardhan, G. Jhansi, 
K.S. Kumar and E. Sudarshan. Text categorization 
Performance examination Using Machine Learning 
Algorithms. In IOP Conference Series: Materials 
Science and Engineering. 981(2):022044, 2022. 
DOI 10.1088/1757-899X/981/2/022044 
[2] S. Wang, J. Cai, Q. Lin and W. Guo. An overview of 
unsupervised deep feature representation for text 
categorization. IEEE Transactions on Computational 
Social Systems. 6(3):504-517, 2019.  
DOI: 10.1109/TCSS.2019.2910599 
[3] M.Belazzoug, M. Touahria, F. Nouioua and M. 
Brahimi. An improved sine cosine algorithm to select 
features for text categorization. Journal of King Saud 
University-Computer and Information Sciences. 
32(4):454-464, 2020. DOI: 
10.1016/j.jksuci.2019.07.003 
[4] H. A. Almuzaini and A.M. Azmi. Impact of stemming 
and word embedding on deep learning-based Arabic 
text categorization. IEEE Access. 8:127913-127928, 
2020. DOI: 10.1109/ACCESS.2020.3009217 
[5] J. Lee, I. Yu, J. Park and D.W. Kim. Memetic feature 
selection for multilabel text categorization using label 
frequency difference. Information Sciences. 485: 263-
280, 2019.     https://doi.org/10.1016/j.ins.2019.02.021 
[6] S. W. Chen, Y. W. Chen and C.P. Wei. Deep learning-
based text classification: A comprehensive review. 
Journal of Computer Science and Technology. 
35(1):143-165, 2020.  
DOI:10.1145/3439726 
[7] J. Devlin, M.W. Chang, K. Lee and K. Toutanova. 
BERT: Pre-training of deep bidirectional transformers 
for language understanding. NAACL-HLT. 2019. 
DOI:10.18653/v1/N19-1423   
[8] B.B. Gupta and D. Soni. Detecting malicious URLs 
using machine learning algorithms: A comparative 
study. International Journal of Advanced Computer 
Science and Applications, 11(9): 185-191, 2020. 
[9] M.J.A. Maatuk and H.A. Abbass. Spam detection in 
online social networks: A survey. IEEE Access. 
8:189095-189105, 2020.      DOI:10.14419/ijet. 
v7i2.7.10896 
[10] A.K. Singh and S.K. Singh. Text classification using 
ensemble methods: A survey. Procedia Computer 
Science. 132:1095-1102, 2018. 
https://doi.org/10.3390/info10040150 
[11] Z. Zhou and H. Wu. Ensemble methods in machine 
learning: A survey. IEEE Transactions on Systems, 
Man, and Cybernetics, Part C (Applications and 
Reviews). 50(5):1774-1792, 2020. 
DOI: 10.1109/ACCESS.2022.3207287 
[12] B. Al-Salemi, M. Ayob, G. Kendall and S.A.M. Noah. 
Multi-label Arabic text categorization: A benchmark 
and baseline comparison of multi-label learning 
algorithms. Information Processing & Management. 
56(1):212-227, 2019. 
https://doi.org/10.1016/j.ipm.2018.09.008 
[13] G.T. Berge, O.C. Granmo, T.O. Tveit, M. Goodwin, L. 
Jiao and B. Matheussen. Using the Tsetlin machine to 
learn human-interpretable rules for high-accuracy text 
categorization with medical applications. IEEE 
Access. 7:115134-115146, 2019. 
DOI: 10.1109/ACCESS.2019.2935416 
[14] Z.H. Kilimci and S. AkyokuşS. The analysis of text 
categorization represented with word embeddings 
using homogeneous classifiers. In 2019 IEEE 
International Symposium on INnovations in Intelligent 
SysTems and Applications (INISTA), 1-6, 2019.    
DOI:10.1007/s13748-021-00247-1. 
[15] W. Cherif, A. Madani and M. Kissi. Text categorization 
based on a new classification by thresholds. Progress in 
Artificial Intelligence, 10(4):433-447, 2021. 
DOI:10.1007/s13748-021-00247-1.    
DOI:10.1007/s13748-021-00247-1.