https://doi.org/10.31449/inf.v47i1.3510 Informatica 47 (2023) 109–114

A Multi-channel Convolutional Neural Network for Multilabel Sentiment Classification Using Abilify Oral User Reviews

Tina Esther Trueman 1, Ashok Kumar Jayaraman ∗, 2, Jasmine S 2, Gayathri Ananthakrishnan 3 and Narayanasamy P 4
1 Department of Computer Science, University of the People, Pasadena, United States
2 Department of Information Science and Technology, Anna University, Chennai, India
3 Department of Information Technology, Vellore Institute of Technology, Vellore, India
4 Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore, India
E-mail: tina.trueman@uopeople.edu, jashokkumar83@auist.net, jasminemtech7@gmail.com, gayathri.a@vit.ac.in, drp-nsam@gmail.com
∗ Corresponding author

Keywords: multilabel classification, sentiment classification, multichannel convolutional neural network, Abilify user reviews

Received: April 13, 2021

Nowadays, patients and caregivers are very active on social media. They share a great deal of information about their medications and drugs in posts and comments. Sentiment analysis therefore plays an active role in analyzing those posts and comments. However, each post is associated with multiple labels such as ease of use, effectiveness, and satisfaction. To address this problem, we propose a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments. The multichannel model comprises multiple versions of the standard model with different kernel sizes. Specifically, we use a pre-trained model to generate word vectors. The proposed model is evaluated with multilabel metrics. The results indicate that the proposed multichannel convolutional network model outperforms traditional machine learning algorithms in terms of Hamming loss, micro-averaged F1 score, and per-label accuracy.

Povzetek (summary): A convolutional network is developed for studying exchanges of opinions about illnesses on social networks.
1 Introduction

Social media has become an active part of the lives of drug and medication users. They share the advantages and disadvantages of their medications and drugs, and this information may give insight into reactions to a drug. Sentiment analysis therefore plays a wide role in computing the opinions of drug users and caregivers. Sentiment analysis can be performed at the document level, sentence level, or aspect level [1, 2]. The document and sentence levels compute the overall opinion, whereas the aspect level computes the opinion about a specific target or entity. In this paper, we focus on aspect-level sentiment. A comment may be associated with a single label or with multiple labels [3]. A single-label problem assigns only one label and takes one of two forms, namely binary classification or multiclass classification [4]. In binary classification, the label belongs to a binary set such as true and false or positive and negative. In multiclass classification, the label belongs to a set of more than two elements such as positive, neutral, and negative. In both problems, algorithms assign only one label to each comment or instance. In a multilabel classification problem, each instance is associated with a set of multiple target labels, where each label may be binary or multiclass.

Traditionally, multilabel classification problems are solved using problem transformation, adapted algorithms, and ensemble learning [3]. Problem transformation is in turn carried out with the binary relevance, classifier chain, and label powerset methods [5, 6]. However, these methods use the traditional bag-of-words (BoW) method to represent features. These features fail to represent the semantic meaning between words. Therefore, deep learning models are proposed to capture the semantic meaning between words in the input sequence.
They have also been shown to outperform other approaches in many tasks such as image classification and text classification [7, 8, 9]. In this paper, we propose a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments. The multichannel model comprises multiple versions of the standard model with different kernel sizes. In particular, we use the GloVe pre-trained model [10] to generate word vectors. We then evaluate the proposed model with multilabel metrics.

This paper is organized as follows. Section 2 briefly describes the related works. The proposed multichannel convolutional neural network for multilabel sentiment classification is presented in Section 3. Section 4 presents the results and their comparison. Finally, Section 5 concludes the paper.

2 Related works

In recent years, researchers have widely studied clinical text and user text using natural language processing (NLP). They used both machine learning and deep learning to solve their problems. In this paper, we present the existing works on biomedical texts. Baumel et al. [11] investigated four models, namely SVM, CNN, CBOW, and a hierarchical attention-based recurrent neural network, for the extreme multilabel task using the MIMIC datasets. The authors indicated that the hierarchical attention-based recurrent neural network model achieves a 55.86% F1 score. Wang et al. [12] developed a rule-based algorithm to generate weakly supervised labels. The authors then used pre-trained word embeddings to represent deep features and employed SVM, random forest, MLPNN, and CNN algorithms. Their study indicated that the CNN model achieves the best performance. Singh et al. [13] developed an attentive neural tree decoding model for tagging structured biomedical texts with multiple labels. This method decodes an input sequence into a tree of labels.
The authors suggested that the proposed model outperforms state-of-the-art (SOTA) approaches on biomedical abstracts. Citrome [14] reviewed the use of Abilify in treating patients with bipolar I disorder and schizophrenia. The author indicated that the tolerability of Abilify in schizophrenia appears superior to that of haloperidol, risperidone, and perphenazine. Rios et al. [15] demonstrated biomedical text classification using a CNN and reported a 3% improvement over the SOTA results.

Moreover, Gargiulo et al. [16] presented a deep neural network (DNN) for extreme multilabel and multiclass text classification tasks. The authors used two models: the first uses a word embedding with two dense layers, and the second uses convolutional, word embedding, and dense layers. Kolesov et al. [17] performed multilabel classification on incompletely labeled biomedical texts using SVM and RF. They used soft supervised learning and weighted k-nearest neighbor algorithms to modify the training set. Their study indicated that both algorithms improve performance. Parwez et al. [18] presented a CNN model for multilabel text classification. The authors used domain-specific and generic pre-trained models to predict class labels. In summary, the above authors used SVM, NB, RF, and CNN to perform multilabel classification tasks on various biomedical texts (Table 1). In this paper, we propose a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments.

3 The proposed method

In this section, we present a multichannel convolutional neural network model for multilabel sentiment classification using Abilify oral user comments. The system architecture is shown in Fig. 1. It includes data pre-processing, word embedding, multichannel CNN, merge layer, fully connected layer, batch normalization layer, and an output layer. Each of these processes is explained as follows.
3.1 Abilify oral dataset

We obtained the Abilify oral dataset from IEEE Dataport [19, 20]. It contains 1722 user comments with their age group, gender, treatment condition, patient type, treatment duration, and labeled sentiment on satisfaction, effectiveness, and ease of use.

3.2 Pre-processing

The dataset is converted from upper case to lower case, punctuation and stop words are removed, and numbers are retained where they describe the drug in grams. Then, each instance is split into separate words by tokenization.

3.3 Multichannel convolutional neural network

The multichannel convolutional neural network comprises multiple versions of the standard convolutional neural network model with different kernel sizes. This representation allows an instance or document to be processed with different n-grams, such as 4-grams, 6-grams, and 8-grams, at the same time [22]. In particular, we define the standard convolutional neural network model with a word embedding layer, a one-dimensional convolutional layer, a dropout layer, a max-pooling layer, and a flatten layer. This standard version is instantiated in three channels, one for each n-gram size. Each component of a channel is explained as follows.

3.3.1 Word embedding

In NLP, word embedding is a feature learning technique that maps the words or phrases of a vocabulary into a vector space. Specifically, we use the GloVe word embedding technique [10] to generate fixed-dimension word vectors that capture the semantic relationships between words.

3.3.2 Convolutional layer

Convolutional neural networks perform well in image classification and computer vision tasks. The convolutional layer is an important part of a convolutional neural network. It slides a kernel of fixed size over an input sequence to generate feature maps [15, 16, 18, 22, 23]. In this work, we use one-dimensional convolutional layers, which move the kernel in one direction.
This type of layer is commonly used for NLP tasks. The input and output of the 1D convolutional layer are 2D. Pooling layers then reduce the convolved feature maps to their maximum, minimum, or average values.

| Authors | Dataset | Model | Reported score | Key findings |
|---|---|---|---|---|
| Baumel et al. [11] | MIMIC datasets | HA-GRU | 55.86% | Classification of patient notes on ICD code assignment |
| Wang et al. [12] | Mayo Clinic smoking status | CNN | 92.00% | A rule-based algorithm to generate weakly supervised labels |
| Singh et al. [13] | Articles describing randomized controlled trials | NTD-s | 32.70% | An attentive neural tree decoding model for tagging structured biomedical texts with multiple labels |
| Rios et al. [15] | MEDLINE citations | CNN-Vote2 | 64.69% | Biomedical text classification |
| Gargiulo et al. [16] | PubMed dataset | CNN-Dense | 20.15% | Extreme multilabel and multiclass text classification tasks |
| Kolesov et al. [17] | AgingPortfolio dataset | SVM | 30.59% | Multilabel classification on incompletely labeled biomedical texts |
| Parwez et al. [18] | Tweets dataset | CNN-PubMed | 94.12% | Domain-specific and generic pre-trained models to predict class labels |

Table 1: Summary of the related works.

Figure 1: A multichannel convolutional neural network model. (Pipeline shown: Abilify reviews, pre-processing, word embedding, CNN channels 1 to 3, merge layer, dense layer, batch normalization layer, sigmoid output.)

3.3.3 Dropout layer

This layer regularizes the neural network to reduce overfitting. Specifically, it ignores a randomly selected subset of the outputs in the network during the training process.

3.3.4 Max-pooling

The max-pooling layer is applied over each feature map and selects the maximum value within each pooling window. The window is smaller than the feature map. The output of this layer contains the most important feature values of the previous feature map [15, 16, 18, 22].
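The convolution and pooling steps of Sections 3.3.2 and 3.3.4 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `conv1d` and `max_pool1d`, the input values, and the kernel weights are hypothetical, and a single scalar per token stands in for a full word vector. Stride 1 without padding is also an assumption. Only the ReLU activation, the kernel size of 4, and the pooling size of 2 correspond to settings reported in Section 4.

```python
def conv1d(sequence, kernel):
    """Slide `kernel` over `sequence` (stride 1, no padding), then apply ReLU."""
    k = len(kernel)
    feature_map = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        activation = sum(w * x for w, x in zip(kernel, window))
        feature_map.append(max(0.0, activation))  # ReLU keeps positive responses
    return feature_map


def max_pool1d(feature_map, pool_size=2):
    """Keep the maximum of each non-overlapping window of `pool_size` values."""
    return [
        max(feature_map[i:i + pool_size])
        for i in range(0, len(feature_map) - pool_size + 1, pool_size)
    ]


# Hypothetical input: one scalar per token instead of a word vector.
tokens = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0, 0.0, 2.0]
kernel = [0.5, -0.5, 1.0, 0.25]  # kernel size 4, i.e. it spans a 4-gram

feature_map = conv1d(tokens, kernel)
pooled = max_pool1d(feature_map, pool_size=2)
print(feature_map)  # [0.0, 0.75, 3.75, 0.0, 1.5]
print(pooled)       # [0.75, 3.75]
```

Eight tokens and a kernel of size 4 give five feature values, and pooling with size 2 halves that again (the unpaired last value is dropped). Running the same input through kernels of size 6 and 8 would give the other two channels, whose flattened outputs are then concatenated.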
3.3.5 Flatten layer

The flatten layer converts the pooled feature map into a single column, i.e. a one-dimensional array. This result is passed to the merge layer.

3.4 Merge layer

The merge (concatenate) layer combines the outputs of all channels. The combined result is passed to a fully connected (dense) layer.

3.5 Fully connected layer

A fully connected or dense layer connects each input from the merged flatten layers to every unit of the layer. It works in the same way as a feed-forward neural network.

3.6 Batch normalization layer

The batch normalization layer allows the layers of a network to learn more independently. Specifically, it standardizes (normalizes) the outputs of the previous layer. This layer also acts as a regularizer that helps to avoid overfitting.

3.7 Sigmoid output layer

The sigmoid output function predicts a probability for each label, as shown in Equation (1). It has been successfully applied to multilabel classification problems [24].

\[ f(x) = \frac{1}{1 + \exp(-x)} \tag{1} \]

4 Results and discussion

We implemented the proposed multilabel multichannel model on the Abilify oral dataset. This dataset contains 1722 instances, each associated with a set of labels, namely ease of use, satisfaction, and effectiveness. We split the dataset into training (1394), validation (155), and testing (173) sets. The instances are preprocessed by removing punctuation and stop words, converting upper case to lower case, and tokenizing. Then, word vectors are generated for the input sequences using the GloVe word embedding model. The proposed multichannel convolutional network model was applied to this dataset to perform the multilabel classification task. This model comprises three versions of the standard convolutional neural network with different kernel sizes. The standard CNN model consists of a word embedding layer, a 1D convolutional layer, a dropout layer, a max-pooling layer, and a flatten layer. The output of each channel is combined through a merge layer and passed to a dense layer, a batch normalization layer, and the sigmoid output layer. Specifically, we fixed the following hyperparameters using a random approach: an input length of 150, an embedding dimension of 100, three kernel sizes (4, 6, and 8), ReLU activation, a dropout rate of 0.8, a pooling size of 2, 10 units in the fully connected layer, 20 epochs, and the Adam optimizer with a binary cross-entropy loss function.

| Data | Accuracy score | Hamming loss | F1 micro score | Accuracy, label 0 | Accuracy, label 1 | Accuracy, label 2 |
|---|---|---|---|---|---|---|
| Validation | 0.548 | 0.275 | 0.839 | 0.820 | 0.931 | 0.750 |
| Testing | 0.538 | 0.303 | 0.820 | 0.815 | 0.912 | 0.715 |

Table 2: Performance of the proposed multichannel CNN model.

| Model | Accuracy score (%) | Hamming loss | F1 micro score (%) | Accuracy, label 0 (%) | Accuracy, label 1 (%) | Accuracy, label 2 (%) |
|---|---|---|---|---|---|---|
| BR_NB | 55.2 | 0.379 | 71.5 | 60.5 | 68.9 | 57.0 |
| BR_DT | 59.6 | 0.343 | 73.9 | 61.9 | 79.7 | 55.3 |
| BR_SVM | 60.7 | 0.338 | 75.2 | 62.6 | 78.3 | 57.8 |
| CC_NB | 55.0 | 0.378 | 71.5 | 60.5 | 68.7 | 57.5 |
| CC_DT | 61.0 | 0.347 | 74.6 | 61.9 | 78.3 | 51.5 |
| CC_SVM | 60.5 | 0.346 | 75.0 | 62.6 | 77.5 | 55.7 |
| LP_NB | 54.1 | 0.385 | 69.4 | 56.6 | 72.0 | 55.7 |
| LP_DT | 60.3 | 0.354 | 74.5 | 60.9 | 77.7 | 55.1 |
| LP_SVM | 62.8 | 0.334 | 76.5 | 63.4 | 79.8 | 56.4 |
| Proposed | 53.8 | 0.303 | 82.0 | 81.5 | 91.2 | 71.5 |

Table 3: Model comparison (BR: binary relevance, CC: classifier chains, LP: label powerset).

The proposed multichannel CNN model for multilabel classification is evaluated using several multilabel metrics, namely accuracy (exact match), Hamming loss, micro-averaged F1 score, and accuracy per label [3, 5, 20, 21]. Table 2 shows the performance of the proposed model. This result is compared with the problem transformation approaches, namely binary relevance, label powerset, and classifier chains with NB, DT, and SVM [20], as shown in Table 3. The studies summarized in Table 1 addressed multilabel classification using different biomedical texts. In this work, we used a dataset of patients' and caregivers' opinions on drugs and medications.
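The multilabel metrics named in this section can be made concrete with a small pure-Python sketch. It is an illustration under stated assumptions, not the evaluation code used in the paper: all function names are hypothetical, the scores and labels are invented toy data, and the 0.5 decision threshold on the sigmoid output is an assumption because the paper does not state one. The metric definitions themselves are the standard ones.

```python
import math


def sigmoid(x):
    """Equation (1): map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_labels(scores, threshold=0.5):
    """Turn per-label scores into 0/1 predictions (threshold is an assumption)."""
    return [[1 if sigmoid(s) >= threshold else 0 for s in row] for row in scores]


def exact_match(y_true, y_pred):
    """Fraction of instances whose whole label set is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def f1_micro(y_true, y_pred):
    """F1 computed from counts pooled over all instances and labels."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    return 2 * tp / (2 * tp + fp + fn)


def accuracy_per_label(y_true, y_pred):
    """Accuracy of each label column taken on its own."""
    n = len(y_true)
    return [sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n
            for j in range(len(y_true[0]))]


# Toy data: 4 instances, 3 labels. Values are invented for illustration only.
y_true = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
scores = [[2.0, -1.0, 0.5], [1.5, -0.2, -2.0], [-1.0, 0.8, 1.2], [0.3, 2.2, -0.4]]
y_pred = to_labels(scores)

print(exact_match(y_true, y_pred))         # 0.5
print(hamming_loss(y_true, y_pred))        # 2 wrong labels out of 12
print(f1_micro(y_true, y_pred))            # 0.875
print(accuracy_per_label(y_true, y_pred))  # [1.0, 0.75, 0.75]
```

The toy data shows why exact match can be low while the other metrics look good: two single-label mistakes spoil two of four instances (exact match 0.5), yet only 2 of 12 label assignments are wrong.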
Compared with the baselines in Table 3, the proposed multichannel CNN model achieves better results in terms of Hamming loss (0.303, i.e. 30.3%), F1 micro score (82.0%), and accuracy per label (81.5%, 91.2%, 71.5%). Its exact-match accuracy (53.8%), however, is lower than that of the baselines (54.1% to 62.8%).

5 Conclusion

In this paper, we proposed a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments. A pre-trained model was used to generate word vectors. Then, the proposed model was evaluated with multilabel classification metrics. The results showed that the proposed multichannel CNN model achieves better results than the problem transformation approaches in terms of Hamming loss, F1 micro score, and accuracy per label. In future work, we will study trends in drugs and medications across different age groups using patient and caregiver reviews.

Acknowledgement

We thank the Department of Information Science and Technology, Anna University, Chennai for the facility provided during this work.

References

[1] Ronen Feldman. Techniques and applications for sentiment analysis. Communications of the ACM, 56(4):82-89, 2013. https://doi.org/10.1145/2436256.2436274
[2] Bing Liu. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press, 2020. https://www.cs.uic.edu/~liub/FBS/sentiment-analysis-tutorial-2012.pdf
[3] Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM), 3(3):1-13, 2007. https://doi.org/10.4018/978-1-59904-951-9.ch006
[4] Sadri Alija, Edmond Beqiri, Alaa Sahl Gaafar, Alaa Khalaf Hamoud. Predicting Students Performance Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection. Informatica, 47(1), 2022.
https://doi.org/10.31449/inf.v47i1.4519
[5] Read J, Pfahringer B, Holmes G, and Frank E. Classifier chains for multi-label classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 254-269, 2009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_17
[6] Tsoumakas G, Katakis I, and Vlahavas I. Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23(7):1079-1089, 2010. https://doi.org/10.1109/TKDE.2010.164
[7] LeCun Y, Bengio Y, and Hinton G. Deep learning. Nature, 521(7553):436-444, 2015. https://doi.org/10.1038/nature14539
[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. Cambridge: MIT press, 1(2), 2016. http://www.deeplearningbook.org
[9] Patel R, Tanwani S, and Patidar C. Relation Extraction Between Medical Entities Using Deep Learning Approach. Informatica, 45(3), 2021. https://doi.org/10.31449/inf.v45i3.3056
[10] Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532-1543, 2014. https://doi.org/10.3115/v1/d14-1162
[11] Baumel T, Nassour-Kassis J, Cohen R, Elhadad M, and Elhadad N. Multi-label classification of patient notes a case study on ICD code assignment. 2017. https://arxiv.org/abs/1709.09587
[12] Wang Y, Sohn S, and Liu S et al. A clinical text classification paradigm using weak supervision and deep representation. BMC medical informatics and decision making, 19(1):1, 2019. https://doi.org/10.1186/s12911-018-0723-6
[13] Singh G, Thomas J, Marshall IJ, Shawe-Taylor J, and Wallace BC. Structured multi-label biomedical text tagging via attentive neural tree decoding. 2018. https://arxiv.org/abs/1810.01468
[14] Leslie Citrome.
A review of aripiprazole in the treatment of patients with schizophrenia or bipolar I disorder. Neuropsychiatric Disease and Treatment, 2(4):427, 2006. https://doi.org/10.2147/nedt.2006.2.4.427
[15] Rios A and Kavuluru R. Convolutional neural networks for biomedical text classification: application in indexing biomedical articles. In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, 258-267, 2015. https://doi.org/10.1145/2808719.2808746
[16] Gargiulo F, Silvestri S, and Ciampi M. Deep Convolution Neural Network for Extreme Multi-label Text Classification. In HEALTHINF, 641-650, 2018.
[17] Kolesov A, Kamyshenkov D, Litovchenko M, Smekalova E, Golovizin A, and Zhavoronkov A. On multilabel classification methods of incompletely labeled biomedical text data. Computational and mathematical methods in medicine, 2014. https://doi.org/10.1155/2014/781807
[18] Parwez MA and Abulaish M. Multi-label classification of microblogging texts using convolution neural network. IEEE Access, 7:68678-68691, 2019. https://doi.org/10.1109/ACCESS.2019.2919494
[19] Ashok Kumar J, Abirami S, and Tina Esther Trueman. Abilify Oral user reviews. IEEE Dataport, 2020. https://dx.doi.org/10.21227/p1jp-2m84
[20] Kumar JA, Abirami S, and Trueman TE. Multilabel Aspect-Based Sentiment Classification for Abilify Drug User Review. In 2019 11th International Conference on Advanced Computing (ICoAC), IEEE, 376-380, 2019. https://doi.org/10.1109/ICoAC48765.2019.246871
[21] Baadel S, Thabtah F, Lu J, and Harguem S. OMCOKE: A Machine Learning Outlier-based Overlapping Clustering Technique for Multi-Label Data Analysis. Informatica, 46(4), 2022. https://doi.org/10.31449/inf.v46i4.3476
[22] Yoon Kim. Convolutional neural networks for sentence classification. 2014. https://arxiv.org/abs/1510.03820
[23] Oo SH, Hung ND, and Theeramunkong T.
Justifying convolutional neural network with argumentation for explainability. Informatica, 46(9), 2023. https://doi.org/10.31449/inf.v46i9.4359
[24] Burkhardt S and Kramer S. Online multi-label dependency topic models for text classification. Machine Learning, 107(5):859-886, 2018. https://doi.org/10.1007/s10994-017-5689-6