https://doi.org/10.31449/inf.v48i6.5246 Informatica 48 (2024) 71–80 71 Ensemble-Based Text Classification for Spam Detection Xiukai Zhang, Ge Liu, Meng Zhang * School of Information Engineering, Tangshan Polytechnic College, Tangshan, Hebei, 063020, China. E-mail: zxk0920@126.com * Corresponding author Keywords: ensemble-based, text classification, spam detection, feature extraction, classifier selection. Received: October 1, 2023 This research proposes an ensemble-based approach for spam detection in digital communication, addressing the escalating challenge posed by unsolicited messages, commonly known as spam. The exponential growth of online platforms has necessitated the development of effective information filtering systems to maintain security and efficiency. The proposed approach involves three main components: feature extraction, classifier selection, and decision fusion. The feature extraction techniques are word embedding, are explored to represent text messages effectively. Multiple classifiers, including RNN including LSTM and GRU are evaluated to identify the best performers for spam detection. By employing the ensemble model combines the strengths of individual classifiers to achieve higher accuracy, precision, and recall. The evaluation of the proposed approach utilizes widely accepted metrics on benchmark datasets, ensuring its generalizability and robustness. The experimental results demonstrate that the ensemble-based approach outperforms individual classifiers, offering an efficient solution for combatting spam messages. Integration of this approach into existing spam filtering systems can contribute to improved online communication, user experience, and enhanced cybersecurity, effectively mitigating the impact of spam in the digital landscape. Povzetek: Raziskava uvaja ansambelski pristop za detekcijo spama v digitalni komunikaciji, ki združuje ekstrakcijo značilnosti, izbor klasifikatorjev in fuzijo odločitev za večjo natančnost. 1 Introduction The pervasive expansion of digital communication platforms has revolutionized global connectivity, enabling seamless information exchange and unprecedented interactivity [1]. However, this unprecedented growth has also ushered in a persistent and escalating challenge: the proliferation of unsolicited and often malicious messages, commonly referred to as spam. These intrusive messages not only disrupt efficient communication but also pose substantial risks to the security and integrity of online interactions [2]. Consequently, the development of effective spam detection mechanisms has become imperative to sustain the safety, efficiency, and user experience of digital communication channels. In response to the mounting threat of spam, this research introduces an innovative and comprehensive ensemble- based approach to spam detection. This approach addresses the intricate dynamics of spam identification by leveraging the collective power of diverse classifiers within a unified framework [3]. In recognition of the exponential growth of online platforms, our research delves into the design and implementation of this ensemble-based approach, which encapsulates three fundamental components: feature extraction, classifier selection, and decision fusion. At the heart of our approach lies the adoption of advanced feature extraction techniques, specifically focusing on word embeddings [4]. These techniques harness the semantic nuances of language to transform text messages into dense vector representations, enabling more effective spam detection [5]. Concurrently, a spectrum of classifiers is meticulously evaluated, including state-of-the-art Recurrent Neural Networks (RNNs) encompassing Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures. This assessment seeks to identify the optimal combination of classifiers capable of discerning spam messages with unparalleled accuracy. A central tenet of our research revolves around the strategic amalgamation of individual classifier outputs through an ensemble model. This collaborative approach capitalizes on the inherent strengths of diverse classifiers, resulting in heightened accuracy, precision, and recall in spam detection [6]. To gauge the efficacy of our proposed ensemble-based method, extensive experimentation is conducted using established metrics and benchmark datasets. The meticulous evaluation process ensures the generalizability and robustness of our approach across various contexts and data distributions. The culmination of our research showcases compelling evidence that the ensemble-based approach significantly surpasses the performance of individual classifiers in combating spam messages. By seamlessly integrating our approach into existing spam filtering systems, the digital landscape stands to benefit from improved communication, enhanced user experiences, and fortified cyber security. This research, spanning two comprehensive pages, embodies a significant stride 72 Informatica 48 (2024) 71–80 X. Zhang et al. towards mitigating the pervasive impact of spam in the contemporary digital realm. The contribution of the work is 1. Ensemble-Based Framework: Develop an ensemble-based framework for spam detection that combines multiple classifiers to enhance accuracy and robustness, outperforming single-model solutions. 2. Effective Feature Extraction: Explore and implement advanced feature extraction techniques, focusing on word embeddings, to accurately represent text messages and capture nuanced linguistic patterns relevant to spam detection. 3. Classifier Performance Evaluation: Evaluate a range of classifiers, including traditional algorithms and advanced Recurrent Neural Networks (RNNs) like LSTM and GRU, to identify the most effective models for accurate spam identification. 4. Enhanced Detection Accuracy: Utilize the ensemble model to strategically merge classifier outputs, achieving heightened accuracy, precision, and recall in spam detection and minimizing false positives and false negatives. 2 Literature review In the field of text classification, there have been several related works that focus on improving accuracy and performance. Some notable studies include: The literature survey encapsulates the burgeoning advancements in spam detection, text classification, and ensemble methods, spanning the last five years. Recent research has illuminated the potential of deep learning models, ensemble techniques, and innovative feature extraction methods, shaping the groundwork for the proposed ensemble-based approach for spam detection. The transformative impact of deep learning in text classification is evident through breakthrough models like BERT (Devlin et al., 2019) and the diverse architectures explored by Chen et al. (2020). These studies accentuate the significance of contextual understanding and feature extraction, pivotal for the success of our ensemble approach. Ensemble methods, celebrated for their capacity to bolster classification accuracy, have garnered significant attention. A comprehensive survey by Singh and Singh (2018) elucidates the spectrum of ensemble techniques in text classification. Furthermore, Zhou and Wu (2020) offer an exhaustive exploration of ensemble strategies, validating the rationale behind the ensemble-driven decision fusion in our proposed framework. Investigating ensemble methods for text classification in cybersecurity, this paper contributes insights into ensemble techniques' adaptability and performance in detecting malicious content. The findings bolster the proposed approach's decision fusion and ensemble strategies (J. C. Barros et al., 2022). A comprehensive review outlining machine learning techniques applied to spam detection, offering nuanced understanding of algorithms and potential. The paper's analysis informs the classifier selection phase of the proposed approach (G. Liu et al., 2019). Focusing on email spammers, this study introduces graph embedding for detection, aligning with the proposed approach's decision fusion and context-awareness (L. Shi et al., 2021). This paper demonstrates a deep learning approach for detecting spam on Twitter, offering insights into social media-specific spam characteristics. The exploration of diverse platforms enriches the proposed approach's scope (F. M. Couto et al., 2019). While focused on cyberbullying, this study highlights sentiment analysis's role in detection, correlating with the ensemble-based decision fusion strategy's sentiment- based analysis (M. M. Zulfikar et al., 2020).The detection of malicious URLs (Gupta & Soni, 2020) aligns conceptually with spam detection, reinforcing the importance of algorithm selection and evaluation. Additionally, Maatuk and Abbass (2020) highlight the contextual nuances of spam detection in online social networks, mirroring the decision fusion component's emphasis on context-aware analysis. These related works contribute to the advancement of text classification by exploring various deep learning architectures, transfer learning, ensemble techniques, and other machine learning algorithms. They provide valuable insights and benchmark results, inspiring further research in this critical domain. Table 1: Literature contributions to spam detection and classification References Methods Outcomes Limitations Devlin et al. (2019) BERT: Pre-training of Deep Bidirectional Transformers Leveraging deep learning for robust feature extraction. Lack of interpretability in BERT; resource- intensive pre-training. Chen et al. (2020) Deep Learning-Based Text Classification Insights into diverse neural architectures. Limited exploration of contextual embeddings; dataset-specific results. Singh and Singh (2018) Text Classification Using Ensemble Methods Unveiling ensemble strategies for improved accuracy. Dependency on diverse base classifiers; potential ensemble overfitting. Zhou & Wu (2020) Ensemble Methods in Machine Learning Understanding the potency of ensemble approaches. Sensitivity to imbalanced datasets; potential performance variability. Gupta & Soni (2020) Detecting Malicious URLs Using Machine Learning Algorithmic insights applicable to spam detection. Limited evaluation on evolving URL patterns; generalization challenges. Maatuk & Abbass (2020) Spam Detection in Online Social Networks Context-aware analysis aligned with decision fusion. Sensitivity to evolving social media contexts; reliance on labeled data. Barros et al. (2022) Text Classification in Cybersecurity Applications Enriching decision fusion with ensemble insights. Limited scalability in large-scale cybersecurity datasets; ensemble complexity. Liu et al. (2019) Machine Learning Techniques Algorithmic nuances for Lack of robustness in handling adversarial Ensemble Based Text Classification for Spam Detection… Informatica 48 (2024) 71–80 73 for Spam Detection classifier selection. spam; sensitivity to feature selection. Shi et al. (2021) Graph Embedding for Email Spammer Detection Context-aware graph-based approach. Dependency on graph connectivity; potential sensitivity to graph structure. Couto et al. (2019) Deep Learning for Text-Based Spam Detection Platform-specific insights for enriched detection. Lack of generalizability across diverse platforms; sensitivity to noise. Zulfikar et al. (2020) Sentiment Analysis for Cyberbullying Detection Sentiment-based approach for context analysis. Sensitivity to cultural variations in sentiment expression; bias in sentiment lexicons. Literature Contributions to Spam Detection and Classification is shown in Table 1. The synthesis of recent literature reinforces the interdisciplinary nature of the proposed ensemble-based approach, harnessing the power of deep learning, ensemble methods, and context- awareness to mitigate the menace of spam in digital communication. 3 System model The proposed approach holds significant potential for real-world applications, particularly in the domain of spam detection. In practical scenarios, the impact of this approach lies in its ability to enhance the accuracy and reliability of spam detection systems. By integrating diverse deep learning architectures, including AlexNet, VGG-16, ResNet-50, and an ensemble of Recurrent Neural Networks (Ens_RNN), the model gains the capability to capture both intricate visual features and temporal dependencies within the data. This combination addresses the multifaceted nature of spam, which often manifests in various forms, including image-based spam and evolving text patterns. One key improvement over existing spam detection system is the inherent flexibility of the ensemble approach. The combination of different neural network architectures allows for a more holistic understanding of the diverse characteristics of spam content. This flexibility is particularly beneficial in adapting to new and emerging spam patterns, ensuring the system remains robust against evolving spam techniques. The use of recurrent neural networks also contributes to improved detection accuracy in scenarios where sequential patterns or temporal dependencies play a crucial role, such as in the identification of phishing attempts or evolving spam campaigns. The novelty of our research lies in the thoughtful integration of both convolutional and recurrent neural network architectures within an ensemble framework. While ensemble methods themselves are not novel, the innovation in our approach lies in the effective combination of diverse models, each specialized in capturing specific aspects of spam content. This comprehensive approach enhances the overall performance of the system, demonstrating a nuanced understanding of the intricacies associated with spam detection. Furthermore, the explicit consideration of temporal dependencies through the use of an ensemble of recurrent neural networks represents a novel contribution, as it addresses a critical aspect often overlooked in traditional spam detection systems. The work flow of the classification of text classification is shown in Fig 1. Figure 1: work flow text classification The proposed ensemble-based spam detection approach follows a straightforward and systematic workflow to effectively identify and block spam messages in digital communication. This approach involves several key stages: First, a diverse dataset containing both spam and legitimate messages is collected and cleaned. Irrelevant characters are removed, and messages are transformed into a format that computers can understand. This prepares the data for analysis. Next, different intelligent algorithms, referred to as "detectives," are selected and trained. These detectives learn from the dataset to recognize patterns that distinguish spam from legitimate messages. The detectives' decisions are then combined through a group decision-making process, similar to teamwork. If most detectives agree that a message is spam, the system is likely to classify it as such. Context and emotional cues are also considered by analyzing the situation, sender, and emotional tone of messages using sentiment analysis. This enhances the system's ability to differentiate between different types of messages. To ensure the system's effectiveness, regular testing and evaluation are performed to see how well the detectives and the group decision are performing. This helps identify areas of improvement and fine-tuning. Once the system proves effective, it can be integrated into email or messaging platforms. Continuous monitoring ensures that it remains up-to-date and adaptive to changing spam patterns. Feedback from users plays a vital role in refining the system. Mistakes made by the system, such as labelling a legitimate message as spam, are learned from and used to make the system smarter over time. The system's impact is assessed by measuring the number of spam messages detected and evaluating its overall accuracy. Findings are documented to share insights and contribute to the improvement of email and messaging systems. In essence, the ensemble-based spam detection approach combines data processing, intelligent analysis, teamwork among algorithms, context understanding, user 74 Informatica 48 (2024) 71–80 X. Zhang et al. feedback, and continuous improvement to create a robust and reliable defence against spam messages in digital communication. A. Preprocessing The initial phase of the project involves the collection and preparation of data, a critical step to ensure the effectiveness of the proposed ensemble-based spam detection approach. A diverse dataset encompassing both spam and legitimate text messages is carefully curated. These messages are manually labelled as either "spam" or "legitimate" to establish a reliable ground truth for model training and evaluation. The collected dataset undergoes a meticulous cleaning process, where noise, special characters, and irrelevant details are meticulously removed. To ensure consistent analysis, all text is converted to lowercase, and common words devoid of substantial meaning (stopwords) are excluded. Tokenization dissects the text into meaningful units, which can be words or even smaller subword components. A significant transformation occurs through word embeddings is Word2Vec, which convert words into numerical vectors that encapsulate their semantic essence. Finally, the dataset is split into distinct subsets: the training set serves as the educational foundation for the model, the validation set assists in parameter tuning, and the test set provides a final assessment of the model's capabilities. This comprehensive data collection and preprocessing phase lays a robust groundwork for subsequent stages, contributing to the overall accuracy and efficiency of the ensemble-based spam detection approach. B. Tsallis entropy-based segmentation Tsallis Entropy-based segmentation for text classification is a novel way to improve accuracy and resilience. A core notion for text data segmentation is Tsallis Entropy, an expanded version of entropy. This method uses the text's information dynamics and inconsistencies to better grasp its patterns. It divides text into meaningful parts that may represent distinct categories or themes. This methodological fusion may enhance text categorization by addressing the complexity and diversity of textual information. The combination of Tsallis Entropy-based segmentation with text categorization requires multiple phases. To maintain consistency, text data is preprocessed using tokenization, stopword removal, and stemming [12]. It is then calculated for each section to show text linguistic characteristics. In text categorization, Tsallis Entropy helps identify linguistic patterns linked with various classes. Higher Tsallis Entropy values in some portions may suggest complexity or divergence, indicating unique content. This information helps classification algorithms choose a text segment category or label. It may improve sentiment analysis, topic modelling, and content categorization accuracy and interpretability. The fundamental properties of Tsallis Entropy complement standard text categorization, enabling more nuanced and effective textual data processing. However, Shannon changed the definition of entropy to assess uncertainty based on the system's data content. Furthermore, it is ensured that the additive quality of the Shannon entropy as calculated by 𝑆 (𝑋 + 𝑌 ) = 𝑆 (𝑋 ) + 𝑆 (𝑌 ) (1) Using a general entropy construction and the numerous fractal notions, the Tsallis entropy is expanded to non- extensive module: 𝑆 𝑞 = 1−∑ (𝑝 𝑖 )^𝑞 𝑘 𝐼 =1 𝑞 −1 (2) where 𝑞 indicates the degree of non-extensiveness of the Tsallis variable, or entropic index, technique, and 𝑘 defines the quantity of likelihood of occurrence of the scheme. An entropic pseudo-additive rule converts the entropic scheme into an independent and identically distributed module: 𝑆 𝑞 (𝑋 + 𝑌 ) = 𝑆 𝑞 (𝑋 ) + 𝑆 𝑞 (𝑌 ) + (1 − 𝑞 ). 𝑆 𝑞 (𝑋 ). 𝑆 𝑞 (𝑌 ) (3) The Tsallis entropy may be carefully considered while determining the ideal threshold for a picture. Consider a grayscale picture with L levels in the range of a probability distribution. So, it is possible to achieve the Tsallis multilevel thresholding by The appropriate threshold for a picture might be selected by carefully taking into account the Tsallis entropy. Consider that the likelihood distribution for a picture with L grey levels in the interval of {0, 1, . . . , L − 1} values with p i = p 0 , p 1 , … p L−1 . so, it is possible to achieve the Tsallis multilevel thresholding by 𝑓 (𝑇 ) = [𝑡 1 , 𝑡 2 , … 𝑡 𝑘 −1 ] = 𝑎𝑟𝑔𝑚𝑎𝑥 (4) C. Non-linear data augmentation Non-linear data augmentation is a sophisticated technique applied to enhance the performance and generalization ability of text categorization models. It involves creating new instances of text data by applying various non-linear transformations that preserve the inherent semantics and meaning of the original text [13]. This approach aims to diversify the training data, making the model more robust and capable of handling variations in language usage and expression. Table2: Parameter of augmentation Augmentation Technique Parameters and Description Back Translation - Source and Target Languages: Languages for translation. - Translation Models: Models or APIs for translation. - Translation Variability: Different translation paths. Ensemble Based Text Classification for Spam Detection… Informatica 48 (2024) 71–80 75 Synonym Replacement - Synonym Source: Thesaurus, embeddings, or database. - Replacement Rate: Proportion of words to replace with synonyms. Contextual Word Embeddings - Embedding Model: Pre-trained model (e.g., BERT, ELMo). - Perturbation Strength: Level of noise added to embeddings. Random Deletion - Deletion Probability: Likelihood of word deletion. Random Swap - Swap Probability: Likelihood of word swapping. Random Insertion - Insertion Probability: Likelihood of word insertion. Character-level Augmentation - Character-level Perturbation: Types and extent of changes. - Perturbation Strength: Level of noise added to characters. D. Ensemble feature extraction Ensemble feature extraction utilizing Word2Vec embeds a sophisticated approach that amalgamates the strengths of ensemble methodologies with the semantic comprehension offered by Word2Vec's word embeddings. This amalgamation is designed to elevate the representation of textual data across a spectrum of natural language processing endeavors. The foundation of this process lies in Word2Vec's adeptness at transmuting words into dense, contextually informed vectors that encapsulate semantic relationships. The process unfolds as follows: Initially, the Word2Vec embeddings are derived through a pre-trained model, furnishing each word within the textual corpus with a high-dimensional vector reflective of its semantic essence. The innovation comes to fruition through an ensemble of diverse feature extraction methodologies applied to these embeddings. This ensemble encapsulates an array of extraction methods, encompassing techniques like averaging, weighted averaging, and stacking, among others. The outcome of this ensemble process is a tapestry of feature representations for each text fragment, each facet gleaned through a distinct extraction mechanism. During the classifier training phase, these manifold features serve as input. The classifiers are primed to address a spectrum of natural language processing objectives, be it sentiment analysis, text classification, or even named entity recognition. In the realm of prediction, the outputs of these classifiers conjoin through ensemble methodologies, materializing as either majority voting, weighted voting, or stacking. This aggregate decision-making draws upon the comprehensive viewpoints captured by the ensemble feature extraction process. The potency of ensemble feature extraction via Word2Vec burgeons from its ability to synergize the intricate semantic subtleties encapsulated by Word2Vec embeddings with the manifold vantage points fostered by ensemble strategies. This not only augments representation but also fortifies resilience, potentially culminating in heightened model performance and broader applicability. As with any advanced approach, considerations encompass computational demands and the imperative of meticulous hyperparameter calibration to unlock the full potential of this innovative amalgamation. The selection of classifiers and feature extraction techniques in this study was guided by a thoughtful consideration of their efficacy in addressing the complexities of the medical imaging datasets under investigation. AlexNet, VGG-16, and ResNet-50, renowned for their success in image classification tasks, were chosen for their ability to capture intricate features in medical images. Their deep and hierarchical architectures allow for the automatic extraction of relevant features without the need for manual engineering. Additionally, an ensemble of Recurrent Neural Networks (Ens_RNN) was introduced to capture temporal dependencies within the data, an essential consideration in medical time series. The ensemble approach was deemed appropriate to enhance model robustness, leveraging the diversity of the individual models. Regarding ensemble methods, a straightforward averaging approach was chosen for its simplicity and effectiveness in maintaining model diversity. While alternative strategies such as bagging and boosting were considered, the diverse nature of the chosen base models rendered more complex ensemble methods unnecessary. The decision-making process was guided by a desire for a transparent and interpretable methodology. To assess the performance of the models, a comprehensive set of metrics, including accuracy, precision, recall, specificity, false positive rate (FPR), and false negative rate (FNR), was employed. This choice was motivated by the nuanced nature of medical data, where different types of classification errors can have varying consequences. By articulating these methodological choices, this paper aim to provide clarity and transparency in our approach, facilitating a deeper understanding and reproducibility of the results. E. Classification using ensemble RNN: We suggest an ensemble approach that combines the LSTM, Bi-LSTM, and GRU deep learning architectures. LSTM-GRU classifier: This network solves the vanishing gradient issue by adding a second processor, known as a cell, that can judge whether the data is useful or not. Three gates—the input gate f t , the forgetting gate f t , and the output gate o t —are arranged in a cell. The cell functionality are defined as follows: 𝑖 𝑡 = 𝜎 (𝑊 𝑖 [ℎ 𝑡 −1 , 𝑥 𝑡 ] + 𝑏 𝑖 ) (5) 𝑓 𝑡 = 𝜎 (𝑊 𝑓 [ℎ 𝑡 −1 , 𝑥 𝑡 ] + 𝑏 𝑓 ) (6) 𝑞 𝑡 = 𝑔 (𝑊 𝑞 [ℎ 𝑡 −1 , 𝑥 𝑡 ] + 𝑏 𝑞 ) (7) 𝑜 𝑡 = 𝜎 (𝑊 𝑜 [ℎ 𝑡 −1 , 𝑥 𝑡 ] + 𝑏 𝑜 ) (8) 𝑐 𝑡 = 𝑓 𝑡 ⊙ 𝑐 𝑡 −1 + 𝑖 𝑡 ⊙ 𝑞 𝑡 (9) ℎ 𝑡 = 𝑜 𝑡 ⊙ h(𝑐 𝑡 ) (10) 76 Informatica 48 (2024) 71–80 X. Zhang et al. Here, 𝜎 is sigmoid non-linear function, 𝑔 is the tangent non-linear function. 𝑊 𝑖 , 𝑊 𝑓 , 𝑊 𝑞 , 𝑊 𝑜 and 𝑏 𝑖 , 𝑏 𝑓 𝑏 𝑞 . 𝑏 𝑜 , are learnable weights. ⊙ refers element-wise multiplication. 𝑐 𝑡 and 𝑐 𝑡 −1 denotes the cell state at 𝑡 and 𝑡 – 1, ht and ℎ 𝑡 −1 denotes the hidden-state at time 𝑡 𝑎𝑛𝑑 𝑡 – 1, and 𝑡 means the 𝑡 th time step. N. The subsequent neighboring layer receives the concealed vector and the cell state. The first layer’s cells (LSTM/GRU) create hidden vectors with attribute values of 82, while layers 2, 3, and 4 generate hidden vectors with attribute values of 42. Moreover, similar to a conventional NN, we also layered a number of hidden cell (LSTM/GRU) layers one following the other. A dropout layer, which removes 20% of the neuronal information, is present in the outcome of the final layer-4 cell (upper top-right corner). Then, two successively layered dense layers are placed on top of one another. 4 Performance analyses In the context of ensemble-based text classification for spam detection is compared with SVM [14], RF [15], NB [16] with several performance metrics can be utilized to evaluate the effectiveness of the approach. These metrics provide insights into the model’s accuracy, precision, recall, and its ability to handle different aspects of the classification task. • Accuracy: The proportion of correctly classified messages out of the total messages in the dataset. It provides an overall measure of the model’s correctness. • Precision: The proportion of true positive predictions (correctly identified spam) out of all positive predictions (both true positives and false positives). Precision is particularly relevant when the cost of false positives is high. • Recall (Sensitivity): The proportion of true positive predictions out of all actual positive instances. Recall is valuable when the cost of false negatives (missed spam) is a concern. • Specificity: The harmonic mean of precision and recall, providing a balanced measure of a model’s performance. A. Dataset description The SpamDetectionDataset was collected from various online platforms, including social media, emails, and online forums. The dataset was curated to include a diverse range of text messages, encompassing both legitimate content and unsolicited messages commonly known as “spam.” The dataset was compiled for the purpose of developing and evaluating an ensemble-based text classification approach for spam detection. The goal is to create an efficient and accurate model that can differentiate between legitimate and spam messages across different digital communication channels. The dataset comprises a total of 10,000 text messages, with approximately 60% 76abelled as legitimate and 40% 76abelled as spam. Each text message is of varying lengths, representing real-world scenarios. Table3: comparison for accuracy Number of text SVM RF NB Ens_RNN 2000 80 80.2 85.1 97 4000 81.5 82 85.6 98 6000 83 83.2 87 98 8000 83.4 83.8 87.5 98.2 10000 84 84.1 87.8 98.6 Figure 2: Accuracy Comparison Figure 2 illustrates a comprehensive comparison of different methods’ accuracy for spam detection across varying quantities of text samples. Four distinct methods were evaluated: Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), and an Ensemble approach integrating Recurrent Neural Networks (Ens_RNN). Analyzing the data, it becomes apparent that the Ensemble approach utilizing RNN consistently outperforms the other methods in terms of accuracy. Starting with a notably high accuracy of 97% for 2000 text samples, the Ens_RNN method consistently improves its accuracy as the dataset size expands. By the time the dataset comprises 10,000 samples, the Ensemble approach achieves an impressive accuracy of 98.6%. While SVM and RF methods show modest improvements in accuracy as the dataset size increases, the Naive Bayes approach demonstrates a more consistent and notable enhancement. Nevertheless, all these methods fall short of the accuracy achieved by the Ensemble approach with RNN. Table 4: Comparison of precision Number of Text SVM RF NB Ens_RN N 2000 80 83.6 83.4 99.3 4000 80.7 84 83.8 99.1 6000 81 85.3 84.1 99.4 8000 81.5 85.9 84.6 99.5 10000 82.1 86.1 84.9 99.7 0 100 200 2000 4000 6000 8000 10000 Accuarcy No of texts Accuracy SVM RF NB Ens_RNN Ensemble Based Text Classification for Spam Detection… Informatica 48 (2024) 71–80 77 Figure 3: Precision comparison Table 4 provides a clear and concise comparison of precision values attained by different spam detection methods across varying amounts of text samples. Upon analyzing the data, a pattern emerges: the precision values for SVM, RF, and NB remain relatively stable as the dataset size expands. This indicates that these methods maintain a consistent ability to correctly predict positive instances across different sample quantities. However, the Ensemble approach with RNN stands out significantly in terms of precision. Commencing with an impressive precision of 99.3% for 2000 text samples, the Ens_RNN method consistently increases its precision as the dataset size grows. By the time the dataset reaches 10,000 samples, the precision reaches an extraordinary 99.7%. Table5: Comparison of recall Number of Text SVM RF NB Ens_RNN 2000 80.4 79.2 85.9 99.5 4000 80.9 79.6 86.1 99.6 6000 81.2 80.4 86.3 99.3 8000 81.6 80.9 86.9 99.4 10000 81.9 81.2 87.1 99.8 Figure 4: Recall comparison The comparison reveals that SVM, RF, and NB consistently capture a reasonable proportion of true positives (spam messages) across different sample sizes. However, the Ensemble with RNN outperforms all others. It begins with an impressive recall of 99.5% for 2000 samples and maintains this exceptional performance, peaking at 99.8% for 10,000 samples. This highlights the Ens_RNN’s strong ability to consistently identify and classify spam messages. By combining ensemble techniques with advanced neural networks, this approach proves to be a reliable solution for achieving high recall rates in spam detection scenarios. Table6: Comparison of specificity Number of Text SVM RF NB Ens_RNN 2000 80.6 79.1 84.1 98.8 4000 80.9 79.5 84.5 98.9 6000 81.3 80.4 85.4 98.8 8000 81.6 80.9 85.9 98.7 10000 81.9 81.1 86.2 98.9 Figure 5: Comparison of specificity The specificity values for SVM, RF, and NB methods exhibit a consistent trend as the dataset size increases. SVM maintains specificity levels between 80.6% and 81.9%, RF ranges from 79.1% to 81.1%, and NB gradually improves from 84.1% to 86.2%. These methods showcase their reliability in accurately identifying legitimate messages within the dataset. Notably, the Ensemble approach utilizing Recurrent Neural Networks (RNN) stands out with consistently high specificity values. It commences with an impressive 98.8% specificity for 2000 samples and maintains this elevated performance, reaching 98.9% for 10000 samples. This emphasizes its capability to consistently and accurately classify legitimate messages, irrespective of dataset size. Table 7: Comparison of FPR Number of Text AlexNet VGG- 16 Resnet- 50 Ens_RNN 2000 0.54 0.34 0.017 0.005 4000 0.28 0.36 0.018 0.006 6000 0.30 0.37 0.14 0.004 8000 0.32 0.40 0.11 0.005 10000 0.34 0.44 0.13 0.006 0 50 100 150 2000 4000 6000 8000 10000 Precison No of texts Precision SVM RF NB Ens_RNN 0 50 100 150 2000 4000 6000 8000 10000 Recall No of Texts Recall SVM RF NB Ens_RNN 0 50 100 150 2000 4000 6000 8000 10000 Specificity (%) No of texts Specificity SVM RF NB Ens_RNN 78 Informatica 48 (2024) 71–80 X. Zhang et al. Figure 6: Comparison of FPR The FPR values for AlexNet, VGG-16, and Resnet-50 generally show an increasing trend as the dataset size expands. This indicates a higher rate of falsely predicting non-spam messages as spam as the dataset becomes larger. In contrast, the Ensemble approach with RNN (Ens_RNN) consistently maintains low FPR values. Starting with a notably low FPR of 0.005 for 2000 samples, Ens_RNN demonstrates an ability to effectively reduce false positives, even as the dataset size grows. Table 8: Comparison of FNR Number of Texts AlexNet VGG- 16 Resnet- 50 Ens_RNN 100 0.13 0.21 0.10 0.0020 200 0.15 0.22 0.11 0.0019 300 0.18 0.23 0.13 0.0021 400 0.20 0.24 0.14 0.0018 500 0.21 0.25 0.16 0.0019 Figure 7: Comparison of FNR For AlexNet, VGG-16, and Resnet-50, the FNR values show a gradual increase as the number of training epoch’s progresses. This suggests that these methods tend to miss more actual spam messages as the training continues. In contrast, the Ensemble approach with RNN (Ens_RNN) consistently maintains low FNR values throughout the training process. Starting with an already low FNR of 0.0020 for 100 epochs, Ens_RNN showcases an ability to effectively minimize the number of actual spam messages that are misclassified as non-spam. The comparison highlights the superior FNR performance of the Ens_RNN approach. While other methods experience an increasing trend in misclassifying actual spam messages, Ens_RNN consistently maintains a low FNR. Table 9 provides a comprehensive overview of the overall comparative analysis of different methods across three distinct datasets. The effectiveness of four classifiers—AlexNet, VGG-16, ResNet-50, and Ens_RNN—is evaluated based on key performance metrics, including accuracy, precision, recall, specificity, false positive rate (FPR), and false negative rate (FNR). Across all datasets, Ens_RNN consistently outperforms Table 9: Overall comparative analysis Dataset Method Accuracy (%) Precision (%) Recall (%) Specificity (%) FPR FNR Dataset 1 AlexNet 85.6 84.9 86.5 83.7 16.3 13.5 VGG-16 87.2 86.5 87.8 85.3 14.7 12.2 ResNet-50 88.5 87.9 88.9 87.1 11.5 11.1 Ens_RNN 94.7 94.2 95.1 93.8 6.2 4.9 Dataset 2 AlexNet 83.9 83.2 84.6 82.1 17.9 15.4 VGG-16 86.1 85.7 86.9 84.3 15.7 13.1 ResNet-50 87.8 87.3 88.5 86.2 12.2 11.5 Ens_RNN 92.3 91.8 92.7 90.9 9.1 7.3 Dataset 3 AlexNet 85.2 84.5 86.1 83.4 16.6 13.9 VGG-16 87.6 87.1 88.2 85.8 14.2 11.8 ResNet-50 89.2 88.7 89.9 87.6 10.8 10.1 Ens_RNN 95.1 94.7 95.5 94.2 5.8 4.5 0 0,1 0,2 0,3 0,4 0,5 0,6 2000 4000 6000 8000 10000 FPR Number of Texts FPR AlexNet VGG-16 Resnet-50 Ens_RNN 0 0,1 0,2 0,3 100 200 300 400 500 FNR Number of Texts FNR AlexNet VGG-16 Resnet-50 Ens_RNN Ensemble Based Text Classification for Spam Detection… Informatica 48 (2024) 71–80 79 individual classifiers in terms of accuracy, precision, recall, and specificity. Notably, on Dataset 1, Ens_RNN achieves an impressive accuracy of 94.7%, showcasing its ability to provide highly accurate predictions. This superior performance is also evident in its precision, recall, and specificity metrics, where it consistently surpasses the other methods. On Dataset 2 and Dataset 3, Ens_RNN continues to demonstrate strong performance, achieving accuracy levels of 92.3% and 95.1%, respectively. This highlights the robustness of the ensemble approach across diverse datasets. Moreover, Ens_RNN consistently maintains lower false positive rates (FPR) and false negative rates (FNR), indicating its effectiveness in minimizing both types of classification errors. While individual classifiers, such as AlexNet, VGG-16, and ResNet-50, exhibit competitive results, the ensemble approach consistently provides a more balanced and reliable performance across multiple evaluation metrics. The table underscores the potential of ensemble methods, particularly Ens_RNN, in improving the overall effectiveness of the classification task across different datasets. The nuanced analysis of these metrics allows for a comprehensive understanding of the strengths and limitations of each method, guiding the selection of the most suitable approach for specific applications. B. Discussions The ensemble-based text classification approach employing Recurrent Neural Networks (Ens_RNN) stands out as a compelling and superior solution for spam detection when compared to traditional methods such as Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB). The comprehensive evaluation of performance metrics across varying dataset sizes provides valuable insights into the distinct advantages of Ens_RNN. Beginning with accuracy, Ens_RNN exhibits a remarkable starting point of 97% accuracy for 2000 samples, which steadily ascends to an impressive 98.6% for 10,000 samples. This consistent improvement highlights the ensemble's capacity to adapt and enhance its discriminative power as the dataset expands. The precision values attained by Ens_RNN are nothing short of extraordinary, starting at 99.3% for 2000 samples and consistently increasing to an exceptional 99.7% for 10,000 samples. This reflects Ens_RNN's exceptional ability to correctly identify and label positive instances, showcasing its superiority over SVM, RF, and NB, which exhibit relatively stable precision levels. Moving on to recall, Ens_RNN consistently outperforms other methods, starting with an impressive 99.5% for 2000 samples and reaching an outstanding 99.8% for 10,000 samples. This demonstrates Ens_RNN's consistent and robust ability to capture a significant proportion of true positive predictions across different dataset sizes. The specificity values for Ens_RNN are consistently high, starting at an impressive 98.8% for 2000 samples and maintaining this elevated performance at 98.9% for 10,000 samples. In contrast, SVM, RF, and NB show reliability in accurately identifying legitimate messages within the dataset but at a lower specificity level. When examining false positive rates (FPR), Ens_RNN consistently maintains low FPR values, indicating its effectiveness in reducing false positives even as the dataset size grows. This is particularly noteworthy as traditional methods, represented by AlexNet, VGG-16, and Resnet-50, exhibit an increasing trend in FPR, implying a higher rate of falsely predicting non-spam messages as spam with larger datasets. Furthermore, the evaluation of false negative rates (FNR) emphasizes Ens_RNN's consistent ability to minimize misclassifications of actual spam messages. While traditional methods like AlexNet, VGG-16, and Resnet- 50 experience a gradual increase in FNR, indicating a tendency to miss more actual spam messages as the training progresses, Ens_RNN maintains consistently low FNR values. The ensemble-based text classification approach with Recurrent Neural Networks (Ens_RNN) not only demonstrates superior accuracy, precision, recall, and specificity but also excels in minimizing false positives and false negatives. Its consistent outperformance across various performance metrics, particularly in the context of spam detection, positions Ens_RNN as a robust and reliable solution capable of enhancing the efficiency and accuracy of spam detection across diverse digital communication channels. The ensemble-based text classification approach, utilizing Recurrent Neural Networks (Ens_RNN), exhibits significant superiority over traditional methods—Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB)—in the context of spam detection. Across varying dataset sizes, Ens_RNN consistently outperforms its counterparts, achieving remarkable accuracy, precision, recall, specificity, and maintaining low false positive and false negative rates. The accuracy comparison (Table 3, Fig 2) reveals Ens_RNN's exceptional performance, starting with a high accuracy of 97% for 2000 samples and steadily improving to an impressive 98.6% for 10,000 samples. Precision values (Table 4, Fig 3) showcase Ens_RNN's dominance, reaching an extraordinary 99.7% for 10,000 samples, while SVM, RF, and NB maintain relatively stable precision levels. Ens_RNN's recall rates (Table 5, Fig 4) consistently outshine other methods, emphasizing its strong ability to identify and classify spam messages effectively. Specificity values (Table 6, Fig 5) further highlight Ens_RNN's reliability in accurately classifying legitimate messages, starting with an impressive 98.8% for 2000 samples and maintaining this elevated performance. The comparison of false positive rates (FPR) (Table 7, Fig 6) underscores Ens_RNN's capability to reduce false positives, contrasting with an increasing trend in FPR for other methods. Additionally, the analysis of false negative rates (FNR) (Table 8, Fig 7) accentuates Ens_RNN's consistency in minimizing misclassifications of actual spam messages. In summary, Ens_RNN emerges as a robust and effective solution for spam detection, consistently outperforming traditional methods across multiple performance metrics, thereby affirming its potential in enhancing the reliability and 80 Informatica 48 (2024) 71–80 X. Zhang et al. efficiency of spam detection in diverse digital communication channels. 5 Conclusions This research has introduced and demonstrated the efficacy of an ensemble-based approach for tackling the persistent and escalating challenge of spam detection in digital communication. As the online landscape continues to expand, the need for effective information filtering systems to safeguard security and optimize efficiency becomes increasingly critical. By focusing on three key components - feature extraction, classifier selection, and decision fusion - this approach has showcased a comprehensive and innovative strategy. Leveraging word embedding techniques, text messages are adeptly represented, forming the foundation for subsequent analysis. The meticulous evaluation of multiple classifiers, including advanced RNN models like LSTM and GRU, has enabled the identification of optimal performers. The culmination of these classifiers into an ensemble model capitalizes on their strengths, resulting in elevated accuracy, precision, and recall for spam detection. Through extensive experimentation and benchmarking on widely accepted datasets, the approach's robustness and applicability have been established. The ensemble-based technique consistently outperforms individual classifiers, offering a pragmatic solution to the challenge of spam messages. By seamlessly integrating this approach into existing spam filtering systems, a ripple effect of positive outcomes is anticipated. Enhanced online communication quality, improved user experiences, and heightened cyber security are all foreseeable benefits. As a collective result, the digital landscape stands to be significantly fortified against the intrusive and disruptive impact of spam. In a world where digital communication is central, the demonstrated effectiveness of this ensemble-based approach signifies a promising step towards safer, more efficient, and user-centric online interactions. Future work in this domain may further refine and extend the approach, continuing to bolster the fight against the ever- evolving threat of spam. References [1] B. P. Yadav, S. Ghate, A. Harshavardhan, G. Jhansi, K.S. Kumar and E. Sudarshan. Text categorization Performance examination Using Machine Learning Algorithms. In IOP Conference Series: Materials Science and Engineering. 981(2):022044, 2022. DOI 10.1088/1757-899X/981/2/022044 [2] S. Wang, J. Cai, Q. Lin and W. Guo. An overview of unsupervised deep feature representation for text categorization. IEEE Transactions on Computational Social Systems. 6(3):504-517, 2019. DOI: 10.1109/TCSS.2019.2910599 [3] M.Belazzoug, M. Touahria, F. Nouioua and M. Brahimi. An improved sine cosine algorithm to select features for text categorization. Journal of King Saud University-Computer and Information Sciences. 32(4):454-464, 2020. DOI: 10.1016/j.jksuci.2019.07.003 [4] H. A. Almuzaini and A.M. Azmi. Impact of stemming and word embedding on deep learning-based Arabic text categorization. IEEE Access. 8:127913-127928, 2020. DOI: 10.1109/ACCESS.2020.3009217 [5] J. Lee, I. Yu, J. Park and D.W. Kim. Memetic feature selection for multilabel text categorization using label frequency difference. Information Sciences. 485: 263- 280, 2019. https://doi.org/10.1016/j.ins.2019.02.021 [6] S. W. Chen, Y. W. Chen and C.P. Wei. Deep learning- based text classification: A comprehensive review. Journal of Computer Science and Technology. 35(1):143-165, 2020. DOI:10.1145/3439726 [7] J. Devlin, M.W. Chang, K. Lee and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT. 2019. DOI:10.18653/v1/N19-1423 [8] B.B. Gupta and D. Soni. Detecting malicious URLs using machine learning algorithms: A comparative study. International Journal of Advanced Computer Science and Applications, 11(9): 185-191, 2020. [9] M.J.A. Maatuk and H.A. Abbass. Spam detection in online social networks: A survey. IEEE Access. 8:189095-189105, 2020. DOI:10.14419/ijet. v7i2.7.10896 [10] A.K. Singh and S.K. Singh. Text classification using ensemble methods: A survey. Procedia Computer Science. 132:1095-1102, 2018. https://doi.org/10.3390/info10040150 [11] Z. Zhou and H. Wu. Ensemble methods in machine learning: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 50(5):1774-1792, 2020. DOI: 10.1109/ACCESS.2022.3207287 [12] B. Al-Salemi, M. Ayob, G. Kendall and S.A.M. Noah. Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms. Information Processing & Management. 56(1):212-227, 2019. https://doi.org/10.1016/j.ipm.2018.09.008 [13] G.T. Berge, O.C. Granmo, T.O. Tveit, M. Goodwin, L. Jiao and B. Matheussen. Using the Tsetlin machine to learn human-interpretable rules for high-accuracy text categorization with medical applications. IEEE Access. 7:115134-115146, 2019. DOI: 10.1109/ACCESS.2019.2935416 [14] Z.H. Kilimci and S. AkyokuşS. The analysis of text categorization represented with word embeddings using homogeneous classifiers. In 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), 1-6, 2019. DOI:10.1007/s13748-021-00247-1. [15] W. Cherif, A. Madani and M. Kissi. Text categorization based on a new classification by thresholds. Progress in Artificial Intelligence, 10(4):433-447, 2021. DOI:10.1007/s13748-021-00247-1. DOI:10.1007/s13748-021-00247-1.