https://doi.org/10.31449/inf.v46i7.4280    Informatica 46 (2022) 131-144

A Multi-label Classification of Disaster-Related Tweets with Enhanced Word Embedding Ensemble Convolutional Neural Network Model

E. Arathi 1*, S. Sasikala 1
E-mail: aarthi.devpal@gmail.com, sasikalarams@gmail.com
1 Computer Science, IDE, University of Madras, Chennai, Tamil Nadu, India
* Corresponding author

Keywords: Twitter data classification, social media, Convolutional Neural Network (CNN), ensemble deep learning, Embeddings from Language Models (ELMo), Recurrent Neural Network (RNN), multi-label classification

Received: August 8, 2022

Abstract: Recently, adopting machine learning techniques to automate the identification and classification of event-related tweets has proven beneficial in times of crisis. Word embeddings are among the most effective word representations for NLP processing with deep learning classifiers. This research proposes a novel method built on an Enhanced Embedding from Language Model (enELMo) for classifying tweets into different categories with higher classification accuracy and precision, enabling rapid rescue action in a disaster scenario. The proposed Enhanced Word Embedding Ensemble Convolutional Neural Network (EWECNN) method comprises an Enhanced ELMo module with a crisis lexicon to create crisis word vectors, a novel ELMo-CNN Architecture (ECA) module for feature extraction, and an effective multi-label text classifier built from Crisis Word Vector-specific CNN-RNN (CWV-CRNN) stacks. These functional modules are intended to improve the classification. Among the approaches evaluated, the proposed method achieves the best results for multi-label classification of microblog texts, with an accuracy of 93.46 percent and an F1-score of 92.99 percent, higher than the other methods discussed in this study. The proposed multi-label classification of disaster-related text facilitates faster rescue action in a crisis scenario.

Povzetek: A new method is presented for supporting successful rescue efforts in the event of disasters, based on the analysis of social networks with an ensemble of neural networks (EWECNN).

1 Introduction

Disasters have an impact on societies all over the world because they can strike without warning and cause massive damage. The response and recovery of communities during all disaster phases rely on their level of preparedness [1]. During a crisis, those affected, authorities, and volunteers seek actionable information to aid in damage restoration and rescue operations. Timely and accurate information is critical for humanitarian and government efforts to save lives and reach those affected [2]. Its phenomenal uptake and reach have made social media a widely used platform for crisis information dissemination.

Recent advances in Artificial Intelligence have led to machine learning techniques that help automate the multi-class classification of Twitter data. Text classification is the process of automatically assigning texts to predefined classes based on their content. Different types of tweets are posted during a disaster, such as rumors and spam, emotional information, and other kinds of data that need to be classified by NLP. In conventional methods, different features are hand-crafted to classify tweets of a specific type in times of crisis. Various deep learning architectures, such as CNNs and RNNs, achieve high accuracy in crisis-related multi-class classification challenges [4][5][6].
However, more research and invention are required to improve the classification of social media text data. Data collection, preprocessing, feature selection, construction of a new classification methodology, training, hyper-parameter fine-tuning, and evaluation of the proposed method are the customary phases of a text classification system [7]. To reach victims of disasters more quickly and efficiently, the microblog messages are categorized under eight categories: Disaster kind, Location, Dead and Injured People, Help Request, Infrastructure Damage, Search and Rescue, Weather-related Information, and Non-relevant Information. Using a context-specific language model and a domain-specific ensemble CRNN, a new multi-label text classifier is developed in this work. Extensive experiments are conducted for the multi-label categorization of tweets on standard CrisisNLP datasets of different disaster events, evaluated using performance metrics. This study also compares various multi-class classification approaches with the proposed method.

This paper is structured as follows: The second section covers relevant studies on classifying Twitter messages for various goals. The third section presents the enELMo embedding model employing the ensemble CNN-RNN stack classifier. The experiment is described in Section 4. The fifth section elaborates on the model's results and analyzes its performance. Finally, Section 6 concludes the paper.

2 Related Works

2.1 Crisis-related Tweets Classification

Recent studies have shown the increasing importance of social media during emergencies and how broadcasting information via social media can improve situational awareness during a crisis. The authors of [8,9,10] were the first to investigate and analyze the use of microblogs and information lifecycles during crisis scenarios. They studied the behavior of microbloggers by conducting a qualitative analysis of tweets published during a flooding incident. The authors of [11] employed the bag-of-words method to locate crisis-related data. Various grammatical elements, such as parts of speech (POS), are influenced by the vocabulary used in Twitter posts. In that work, an Artificial Intelligence Disaster Response (AIDR) system was developed using unigram and bigram features to classify crisis-related information during a disaster. The authors of [12] used informative terms from tweets as features to recognize resource tweets during a disaster. Models for disaster situational awareness based on low-level lexical and syntactic features were developed in [13,14].

The process of automatically extracting information from tweets has drawn much attention to word embedding. The work proposed in [15,16] used a CNN with word embeddings and demonstrated performance superior to hand-crafted features on disaster-related tasks. In [17,18], CNN and MLP-CNN models with word embeddings are used to categorize crisis-related data. The skip-gram model of the word2vec tool was applied in [19,20,21] to extract information from an extensive corpus of almost 57,908 tweets. The authors of [22] developed a deep learning model using word embeddings to detect informative tweets related to a catastrophe, enabling speedier action during the disaster event.
2.2 Multi-Label Classification of Tweets

A set of comparable multi-label text classification methods is considered here to juxtapose the performance of the proposed method. They are: a new big data approach for topic classification and sentiment analysis of Twitter data (BDACSA) [20]; a pattern-based approach for multi-class sentiment analysis in Twitter (PAMSA) [23]; improved classification of crisis-related data on Twitter using contextual representations (CRICCD) [24]; multi-level aspect-based sentiment classification of Twitter data using a hybrid approach in deep learning (MASC) [25]; and sentiment classification from multi-class imbalanced Twitter data using binarization (BMSC) [26]. This section presents a clear analysis of the methodologies used and of the advantages and limitations of these existing methods.

The hybrid Lexicon-Naive Bayes classifier (HL-NBC) proposed by the authors of [27] can be used to classify microblog text into several categories and filter out irrelevant tweets. That experiment tested the Naive Bayes classifier model with lexicon, unigram, and bigram features. The HL-NBC method improves sentiment classification and achieves an accuracy of 82 percent; however, processing time and cross-lingual categorization are limitations, and the context of the text cannot be captured effectively with this approach.

SENTA, a user-friendly tool for classifying text from microblogging websites into seven distinct sentiment categories, was presented in [28]. A customizable feature selection option was designed to extract features such as sentiment, punctuation, syntax/stylistic, and semantic features to preprocess the text for classification. This model achieves a multi-class classification rate of 60.2%, while its binary and ternary classification rates are both 80.1%. The abundance of configurable feature options works to this method's advantage; however, its modest accuracy for multi-class classification is a noted limitation.

The authors of [29] showed that a dense classifier with ELMo embeddings achieved higher performance metrics than a regular Support Vector Machine (SVM) for the multi-class classification of disaster-related tweets. To arrive at this conclusion, the researchers used datasets gathered from multiple sources, such as the CrisisNLP, CrisisLex, and AIDR standard Twitter datasets from the earthquakes in Nepal and California and Typhoon Hagupit. An embedding layer generates word vectors from the input text, and the dense classifier then predicts crisis and other tweet categories with an accuracy of 82.3 percent. The drawbacks are the need for contextual feature selection and the long processing time.

Using a new classifier, MuLeHyABSC, and a feature ranking process, the study in [30] performs multi-level aspect-based classification of Twitter data. MASC employs an Artificial Neural Network Multilayer Perceptron (ANMP) to improve classification performance, and several existing machine learning classifiers are combined with MuLeHyABSC. The evaluation used the benchmark datasets STC, TAS, FGD, ATC, and STS, and the MASC method secured higher scores in terms of accuracy, precision, recall, and F-score. Higher classification accuracy and precision are the proven advantages of this method; at the same time, the increased processing time is the limitation of the MASC method.
This higher processing time makes MASC lag in real-time classification scenarios.

The authors of [31] presented a methodology based on one-vs-one binary decomposition and dimension reduction. In addition to weighted multi-class reconstruction, a stable preprocessing method was introduced in BMSC to detect minority classes. The SemEval-2016 Message Polarity Classification dataset was used, and the modules of BMSC were tested against standard baseline classifiers. The performance of BMSC was measured in terms of the geometric mean for multi-class classification, and better classification performance is the advantage of this method. Table 1 provides a summary of the related methodologies, their advantages, and their limitations.

Table 1: An outline of the methodologies, advantages, and limitations of the existing methods

Author | Work | Methodology | Accuracy % | Advantages | Limitations
Anisha Rodrigues et al. | A new big data approach for topic classification and sentiment analysis of Twitter data [20] | Hybrid Lexicon-Naive Bayesian Classifier | 82 | Processing time | Moderate accuracy
Mondher Bouaziz et al. | A pattern-based approach for multi-class sentiment analysis in Twitter [21] | Tokenization, lemmatization, and generating negation vectors | 80.1 | Highly configurable | Average accuracy
Sreenivasulu Madichetty et al. | Improved classification of crisis-related data on Twitter using contextual representations [22] | CNN dense classifier with ELMo embedding for feature extraction | 82.3 | Better accuracy | Higher processing time
Sadaf Hussain Janjua et al. | Multi-level aspect-based sentiment classification of Twitter data: using a hybrid approach in deep learning [23] | Artificial Neural Network Multilayer Perceptron | 80.38 | Higher accuracy | Higher processing time
Bartosz Krawczyk et al. | Sentiment classification from multi-class imbalanced Twitter data using binarization [24] | One-vs-one binary decomposition and dimension reduction | 66.36 (G-mean) | Average metrics for the classification | Higher processing time

The methods discussed above classify tweets on the basis of different text features, such as unigram, bigram, and n-gram features, sentiment features, punctuation features, syntactic/stylistic features, and semantic features. Various feature extraction studies show that word embeddings based on language models capture context better than the other approaches. The above studies also reveal that deep learning models outperform the baseline classifiers in terms of accuracy and processing time, and that a better classifier is needed for effective multi-label classification in the disaster scenario. The proposed model uses the text itself as the feature, combining crisis word vector-specific ELMo embeddings with an ensemble CNN classifier for the multi-label classification of microblog text; this produces a contextualized feature vector and feature map, resulting in improved classifier performance.

3 Proposed Methodology

The proposed EWECNN method is devised with three underlying functional modules: the Enhanced ELMo module handles disaster-related word vectors, the Enhanced ELMo-CNN Architecture extracts context-related features, and the Crisis Word Vector-specific CNN and RNN stacks classify the text into multiple classes, significantly improving the multi-label classification. Instead of converting text into word vectors and running the algorithm directly, this model focuses on creating context-related, disaster-word-specific vectors for the whole sentence rather than for single words.
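To make the interaction between the three modules concrete before they are described individually, the following is a minimal sketch, under assumed layer sizes, sequence length, and branch layout rather than the paper's exact configuration, of how per-token crisis word vectors could flow through a convolutional feature extractor and an ensemble CNN-RNN head that emits one sigmoid score per category:

    # Illustrative sketch of the EWECNN pipeline; all sizes (MAX_LEN, EMB_DIM, filter counts)
    # are assumptions, not the configuration reported in the paper.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    MAX_LEN, EMB_DIM, NUM_LABELS = 50, 1024, 8   # assumed sequence length / embedding width

    # Input: per-token contextual (enELMo-style) vectors from the embedding module.
    inputs = layers.Input(shape=(MAX_LEN, EMB_DIM), name="crisis_word_vectors")

    # ECA-style convolutional feature extraction over the embedded sequence.
    conv = layers.Conv1D(128, kernel_size=3, activation="relu", padding="same")(inputs)
    conv = layers.MaxPooling1D(pool_size=2)(conv)

    # Ensemble of a CNN branch and an RNN branch over the shared feature map (CWV-CRNN idea).
    cnn_branch = layers.Conv1D(64, kernel_size=3, activation="relu")(conv)
    cnn_branch = layers.GlobalMaxPooling1D()(cnn_branch)
    rnn_branch = layers.Bidirectional(layers.LSTM(64))(conv)

    merged = layers.concatenate([cnn_branch, rnn_branch])
    merged = layers.Dropout(0.5)(merged)

    # Multi-label head: one sigmoid unit per category (disaster kind, location, help request, ...).
    outputs = layers.Dense(NUM_LABELS, activation="sigmoid", name="labels")(merged)

    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()

A sigmoid output trained with binary cross-entropy lets a single tweet carry several of the eight category labels at once, which is the multi-label behaviour targeted in this work.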
This section describes the fundamental functional building components in further detail. Figure I displays the complete flowchart of the EWECNN approach, and Algorithm I (the EWECNN algorithm) explains the methodology as a stepwise procedure.

Algorithm I: EWECNN algorithm
The proposed EWECNN method for multi-label classification of tweets with the CrisisNLP dataset.

    Algorithm EWECNN(dataset d)
    Begin
        // Input layer: create word embeddings using a context-based language model.
        load dataset d
        create training, validation, and test sets of data
        For each tweet t in d
        begin
            clean_tweet <- preprocess_tweet(t)
            feature_vector <- enELMo(clean_tweet)
        end
        // enELMo-CNN layer
        feature_map <- ECA(feature_vector)
        // Multi-label classification with ensemble CNN and RNN.
        z <- CWV_CRNN(feature_map)
        labels <- output(z)
    End

3.1 Enhanced ELMo to generate crisis word vectors (EECWV)

This module is the model's input layer; it adds the disaster context to the word vectors used for categorization, which are then passed to the following phase. The vectors fed to the convolutional net are generated from these embeddings. ELMo can build a contextually rich representation of words and dynamically alter that representation, hence resolving polysemous words. A deep bidirectional language model (biLM) is pre-trained on a large crisis lexicon corpus [32] to generate these word vectors. Adding these vectors on top of each input word for each end task significantly improves the algorithm's performance. The enELMo representations are derived from crisis-related word vectors using the RLSTM encoder. ELMo parses a whole sequence of word vectors at a time, eliminating ambiguous meanings that would otherwise mislead the classification. This context-biased sense selection is the primary reason for adopting the enELMo model here. The steps are listed as Algorithm II, the enELMo algorithm.

Algorithm II: enELMo algorithm
This algorithm processes the feature vector of the given dataset using the enhanced ELMo.

    Algorithm enELMo(preprocess_tweet)
    Begin
        rlstm <- lstm(crisis_word_vector) using equation (4)
        crisis_based_embedding <- ELMo(rlstm) using equation (3)
        feature_vector
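Algorithm II defers the details of the ELMo step to equations (3) and (4). As a rough, self-contained illustration of the underlying idea, the toy Keras sketch below builds a two-layer bidirectional LSTM and mixes its layer outputs with softmax-normalized scalar weights, which is the ELMo-style layer combination that the enELMo module builds on. All sizes and the LayerMix layer are illustrative assumptions; the crisis-lexicon pre-training and the RLSTM refinement described above are not reproduced here.

    # Toy sketch of ELMo-style layer mixing (assumed sizes; not the paper's trained biLM).
    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB, EMB_DIM, HIDDEN, MAX_LEN = 5000, 128, 128, 50  # assumed vocabulary and dimensions

    tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB, EMB_DIM)(tokens)                                # layer 0: context-free embedding
    h1 = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(x)    # biLM layer 1
    h2 = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(h1)   # biLM layer 2

    # ELMo combines the biLM layers with softmax-normalized task weights s_j and a scale gamma:
    # ELMo_k = gamma * sum_j s_j * h_{k,j}
    class LayerMix(layers.Layer):
        def build(self, input_shape):
            n = len(input_shape)
            self.s = self.add_weight(shape=(n,), initializer="zeros", name="s")
            self.gamma = self.add_weight(shape=(), initializer="ones", name="gamma")
        def call(self, layer_reps):
            w = tf.nn.softmax(self.s)
            return self.gamma * tf.add_n([w[j] * rep for j, rep in enumerate(layer_reps)])

    # Project layer 0 to the biLM width so all three representations can be mixed per token.
    x_proj = layers.Dense(2 * HIDDEN)(x)
    crisis_word_vectors = LayerMix()([x_proj, h1, h2])   # per-token contextual vectors

    embedder = tf.keras.Model(tokens, crisis_word_vectors)
    embedder.summary()

In the full method, the mixed per-token vectors would play the role of the feature_vector returned by Algorithm II and be handed to the ECA module for feature extraction.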