https://doi.org/10.31449/inf.v45i2.3465 Informatica 45 (2021) 267–276 267
Value-Based Retweet Prediction on Twitter
Surbhi Kakar, Deepali Dhaka and Monica Mehrotra
Computer Science, Jamia Millia Islamia University
E-mails: kakar.surbhi3@gmail.com, deepali.dhaka@gmail.com, mmehrotra@jmi.ac.in
Keywords: retweet prediction, twitter, value system, emotions, sentiments, topic-specific emotion, retweeting, social networks, retweet behavior, topic
Received: March 7, 2021
Retweeting is an online activity on the Twitter social network through which opinions and ideas are shared from one person to another. Predicting a user's retweet decision has been an interesting and challenging task over the past decade. Past studies have shown that emotions, sentiments and topic-specific emotions can influence a user's retweet decision. However, an individual's value system can also be an important factor in predicting that decision. Hence, in this work, we study retweet prediction as a function of value systems. We also present an experimental comparison with the features used in previous studies. Experimental results using different machine learning algorithms show that value systems achieve higher performance in predicting a user's retweet decision than emotions, sentiments and topic-specific emotions.
Summary (Povzetek): The problem of user responses on Twitter is analyzed using machine learning methods.
1 Introduction
Social networks are platforms where people meet each other virtually. Such platforms allow people to share their ideas, thoughts and opinions freely, which leads to the diffusion of information within the network [47]. As the content shared by users expresses their feelings, sentiments and values, it can be used to predict user behavior [4]. Twitter is a social network known for microblogging, where users express their interests by posting tweets with the Tweet button.
These tweets can in turn be shared by anyone who feels a connection with the author of the tweet [32], which initiates the retweeting mechanism. Users on Twitter can have a follower-followee relationship. If a user A is inspired by another user B or finds B's interests similar to their own, A can choose to follow B. In that case, A is a follower of B and B is a followee of A; B may or may not follow A back. A retweet is a tweet that is re-shared by a user, and a tweet prefixed with the symbol RT represents a retweet. Research on retweeting mainly addresses three problems:
1. Whether a tweet will be retweeted by a user. This problem can be restated as: given a tweet, will a particular user retweet it or not? Studies in this area explore features that affect a user's retweet behavior and then build retweet prediction models on them [33][51][1][47].
2. Finding the users who will retweet a tweet. This problem focuses on identifying which users will retweet a given target tweet [26][23].
3. Factors that affect the retweet frequency of a tweet. Much research has examined why some tweets are retweeted more often than others [35][42][5][32][17].
This work addresses the first problem, which we call the retweet prediction problem: predicting the retweet decisions of a user. It is a challenging problem because tweets are inherently noisy and complex. Information diffuses through the retweet mechanism, so understanding retweet behavior and the virality of tweets can help identify influential people who spread information quickly. This insight is useful in applications such as viral marketing and emergency response.
Predicting which tweets a user will retweet can also support recommendations that give the user a personalized experience. The most popular approach to retweet prediction starts by building a user profile [8][10][24][25][28], which can be extracted from the user's tweet/retweet data. Some factors, such as URLs and hashtags, can be taken directly from the user's timeline to build the profile. However, some information is latent in the content shared by the user, and in such cases topic extraction is very useful. The topic of a tweet has been found to be a promising factor in capturing user interests [8][10][28][15][46]. In addition, emotions and sentiments can be employed for this task. An emotion represents a person's mental state, whereas a sentiment can be viewed as an opinion towards a person or an object. The content a user writes expresses how he/she feels, which makes it a good representation of their emotions and sentiments. Several theories have been proposed to classify human emotions; in our work we use the well-accepted theory of [37]. According to this theory, emotions can be classified into eight basic types: Anger, Joy, Surprise, Anticipation, Sadness, Disgust, Fear and Trust. Sentiments, on the other hand, can be categorized as either positive or negative. We use the NRC word-emotion lexicon [31], a well-accepted lexicon for this purpose, to label emotions and sentiments in the content of a tweet. The impact of emotions and sentiments on retweet prediction has been studied in various works [17][21][32][36][11]. [11] compared the combined effect of emotion and topic as separate features with a topic-specific emotion model for predicting a user's retweet decision.
They proposed that not just the topic, but also the emotions and sentiments a user expresses on that topic correlate with the user's decision. As future work, they proposed studying value systems for the retweet prediction task, which inspired our work on value systems. An individual's value system denotes the beliefs the person holds in life, which are learnt from their environment, including family and school. According to [41][7], value systems fall into the following classes:
• Self-Transcendence: This value system represents the values of benevolence and universalism. Its core beliefs are wisdom, peace, spirituality and the welfare of the general public.
• Self-Enhancement: People in this category are more interested in their own enhancement and growth. They are also inclined towards power and authority.
• Conservation: People with this value system are more traditional and believe in cultural values and religion. They tend to conform to the rules of society and are concerned about family security as well as national security.
• Openness to Change: This value system represents people who are adventurous, daring, independent and self-directed.
• Hedonism: People with this value system tend to engage in pleasure-seeking activities.
Past studies [2][48][39][22][29] have shown value systems to be an important factor influencing user decisions, from shaping leadership styles to influencing voting preferences. Hence, our work explores the impact of value systems on the retweet decisions of a user. The contributions of this paper are as follows:
• Proposing a novel value-based model that uses value-system features to predict retweet behavior.
• Proposing a feature extraction methodology for value-related features.
• Comparing emotion, sentiment, topic-specific emotion and value-based models.
• Demonstrating experimentally that value-based models outperform the emotion, sentiment and topic-specific emotion models.
The remainder of the paper is structured as follows. Section 2 outlines previous studies in this field. Section 3 summarizes the data collection approach and the data statistics. Section 4 presents the methodology and the feature generation process. Section 5 describes the experiments, Section 6 discusses the results, and Section 7 concludes the paper.
2 Related works
This section reviews past work on retweet prediction, discussing features such as emotions, sentiments and value systems that are potential predictors for modeling a user's retweet decisions. Retweet prediction can be approached either as a classification problem or as a recommendation problem; our work approaches it as classification. Earlier research on retweeting mainly studied the factors affecting the retweet mechanism. [4] studied the reasons for and conventional styles of retweeting. [42] showed the impact of URLs, hashtags and the numbers of followers and followees on the retweet frequency of a tweet. [17][36] showed that emotions and sentiments also affect the virality of a tweet, and [28] identified the topic of interest as a potential factor. More recent studies focus on predicting retweet behavior. [47] used a factor graph model and concluded that the time of the tweet, user information and tweet content can be effective predictors of a user's retweet behavior. [33] used temporal information and conditional random fields to study retweeting activity. Other research has also exploited the temporal information of tweets [14][51].
The topical information of a tweet has also been studied to gauge the influence of topic on a user's retweet decision. [50] used a factor graph model that considers user attributes, topic information and instantaneity to study retweet behavior. [10] captured the user's short-term interests by ranking the top three topics as the hot topics the user is interested in at that point in time. [8] used a collaborative recommendation algorithm with topic as a feature to capture user interests. [11] studied the impact of emotions specific to a topic, emphasizing that the same person can have different emotions for different topics, and showed that the topic-specific emotion feature correlates with a user's retweeting behavior. Other researchers have also shown the importance of topic as a factor in the retweet mechanism [15][46][28][27]. Author information has also been shown to influence a user's retweet behavior. Recent authors view retweet prediction as a recommendation problem and use matrix factorization techniques for it [45][44][18][19][51]. Sentiments and emotions also have a large impact on user posting and re-posting behavior. Emotions represent an individual's state of mind at a specific point in time. According to [37], they can be categorized into eight basic types: anger, joy, sadness, disgust, trust, anticipation, surprise and fear. Sentiments can be viewed as people's positive or negative opinions about an event, person or object. Several studies have shown that sentiments and emotions help predict user decisions [17][32][36]. [36] demonstrated that the intensity of emotions expressed in a tweet is directly proportional to its retweet frequency. [21] used emotions and user-related features to predict retweet behavior.
They concluded that tweets reflecting sadness and anger are the most likely to be retweeted. Several tools and techniques have been proposed to detect emotions in tweet content, including parsers, tree taggers and lexicon-based techniques [40][21][34][38]. In our work, we use lexicon-based methods to label emotions, sentiments and value systems in tweet content. An individual learns personal values from their environment from childhood onwards, and their value system can be viewed as the core beliefs they hold towards someone or something. [41] classified value systems into five types: self-transcendence, conservation, openness to change, hedonism and self-enhancement. Several works show that the content a user shares reflects their value system [7][12][43][16][9]. [43] used the text of speeches to infer the speakers' values. Another work used human annotations and machine learning to identify values in text [16]. Some authors built word maps reflecting conservative and liberal traits [9]. [7] also confirmed, by analyzing words related to specific value system categories, that tweet content can be used to label a user's value system. Several studies show that value systems help shape people's personal decisions. [2] showed that an individual's personal values can shape their style of leading teams. Another work concluded that value systems can affect the travel decisions of young adults [48]. Several other works confirm that value systems can influence an individual's voting decisions, foreign policy orientations and health decisions [22][39][29].
Hence, our work uses value systems to explore their impact on a user's retweet decisions and compares them with the previous state-of-the-art features, viz. emotions, sentiments and topic-specific emotions. We use the valueDict lexicon [20] to label value systems in user-written content, as it is one of the first lexicons proposed for this purpose. The lexicon contains words associated with each value system category. It was created from a set of seed users: for each of these users, their value system and the words associated with each value system category were inferred from the content they wrote in their profile descriptions and tweets. The coverage of the lexicon was then expanded by generating similar words using word2vec embeddings. The lexicon was validated on an additional set of users, taking their descriptions as the ground-truth label for their value system. Table 1 summarizes the findings of past studies on sentiments, emotions and topic-specific emotions, together with the other user and tweet features used.
Table 1: Summary of Related Works.
3 Data collection
The data for this research was collected through the Twint API [49][3]. We manually selected a set of 126 seed users based on their average daily activity: these users posted a tweet or retweet 4-5 times per day on average. For each of these users, their latest 700 tweets/retweets were collected and used to construct the user-interest profile. Next, a list of 60 followees was fetched for each user through the Twint API. To create the target dataset, we needed positive and negative samples for each user. We considered all retweets of a user within a given time period as positive samples. To create the negative samples, we fetched the list of retweet authors for each seed user.
A retweet author is an author whose tweets a seed user has retweeted. If a retweet author was also a followee of the seed user, we collected the author's latest 700 tweets within the same time window as the user's retweets. Both sets were merged to form a list of interesting and non-interesting tweets for each user, and this process was repeated for all seed users. The final dataset comprises 17,180 users with 215,312 tweets/retweets. Table 2 summarizes the data statistics.
Table 2: Data Statistics.
4 Methodology
To prepare the data for the retweet prediction problem, we built a list of interesting and non-interesting tweets for each seed user. The interesting tweets were those the user found interesting and retweeted. The non-interesting tweets were tweets of the user's followees that the seed user did not retweet. This gave us positive and negative samples for each user. The methodology resulted in a total of 215,312 tweets/retweets from 17,180 users.
4.1 Feature generation
4.1.1 Value systems
An individual's value system is a potential predictor of the decisions they are likely to take in life. Past studies have shown the significance of content-based analysis of value systems. Hence, our work uses tweet content to determine a user's value system. We label users with their value system using the valueDict lexicon [20]. The authors of that study proposed this lexicon, which contains words related to each value system category, and used it to study the prominent value systems in developing and developed regions of the world.
The following strategy is used to calculate the value system of a user in our study. The value system of a user is represented by $w = 5$ dimensions, namely self-transcendence, self-enhancement, conservation, hedonism and openness to change:
$$V = \{v_1, v_2, \ldots, v_w\}$$
Let $U = \{U_1, U_2, \ldots, U_n\}$ be the set of users, and let $Tw_i = \{Tw_{i1}, Tw_{i2}, \ldots, Tw_{iz_i}\}$ be the set of tweets of the $i$-th user, where $1 \le i \le n$ and $1 \le j \le z_i$. Let $n_{ijk}$ be the number of hits found in the lexicon for value system $v_k \in V$ in tweet $Tw_{ij}$. The score of $v_k$ for user $i$ is then:
$$S_{ik} = \sum_{j=1}^{z_i} n_{ijk}$$
The overall value-system score of user $i$ is the maximum over the $w$ categories:
$$S_i = \max_{1 \le k \le w} S_{ik}$$
and the user is labeled with the value system $v_k$ that attains this maximum. The value system of a single tweet is calculated in the same way.
4.1.2 Value similarity score
This feature captures how similar a target tweet is to the user's past interests. If a target tweet for user $u$ has value system $v_j$, its similarity score is:
$$\text{Similarity score} = \frac{Total_j}{X}$$
where $Total_j$ is the number of tweets/retweets in $u$'s profile that reflect value system $v_j$, and $X$ is the total number of tweets/retweets posted by $u$.
4.1.3 Emotions and sentiments
Human emotions can be represented by eight basic emotions, namely anger, disgust, sadness, trust, joy, surprise, fear and anticipation [37]. Our work uses this theory of emotions. Sentiments can be viewed as people's opinions on certain objects or events, and can be classified as positive or negative. We use the NRC word-emotion lexicon [31] to determine the emotion and sentiment scores of a tweet. Our methodology for extracting emotions and sentiments is inspired by [11].
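The value-system labeling of Section 4.1.1 and the value similarity score of Section 4.1.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny lexicon is a hypothetical stand-in for valueDict [20], tokenization is a simple lowercase whitespace split, and a tweet or user with no lexicon hits is left unlabeled (`None`), a case the method does not specify.

```python
from collections import Counter

# The five value-system categories (w = 5).
VALUE_SYSTEMS = ["self-transcendence", "self-enhancement", "conservation",
                 "hedonism", "openness to change"]


def value_scores(tweets, lexicon):
    """S_ik: lexicon hits for each value system v_k, summed over a user's tweets.

    `lexicon` is a hypothetical stand-in for valueDict: it maps a word to the
    value-system categories it is associated with.
    """
    scores = Counter({v: 0 for v in VALUE_SYSTEMS})
    for tweet in tweets:
        for token in tweet.lower().split():  # assumed tokenization
            for value in lexicon.get(token, ()):
                scores[value] += 1  # each hit adds 1 to n_ijk
    return scores


def label_value_system(tweets, lexicon):
    """Return the v_k whose score equals S_i = max_k S_ik (None if no hits)."""
    scores = value_scores(tweets, lexicon)
    best = max(VALUE_SYSTEMS, key=lambda v: scores[v])  # ties: first in list
    return best if scores[best] > 0 else None


def value_similarity(target_value, profile_labels):
    """Total_j / X: share of the user's X profile tweets labeled with v_j."""
    if not profile_labels:
        return 0.0
    total_j = sum(1 for label in profile_labels if label == target_value)
    return total_j / len(profile_labels)


# Usage with a hypothetical toy lexicon and profile.
lexicon = {"peace": ("self-transcendence",), "power": ("self-enhancement",),
           "party": ("hedonism",), "tradition": ("conservation",)}
profile = ["peace and wisdom for all", "world peace now", "power matters", "lunch"]
labels = [label_value_system([t], lexicon) for t in profile]  # per-tweet labels
user_label = label_value_system(profile, lexicon)
similarity = value_similarity("self-transcendence", labels)
```

Here `user_label` is "self-transcendence" (two hits against one for self-enhancement), and a target tweet labeled self-transcendence gets a similarity score of 2/4 = 0.5, because X counts all four profile tweets, including the unlabeled one.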
For simplicity, we treat emotions and sentiments together and compute a single score for them; we therefore sometimes refer to this combined score as the emotional score of a tweet. Let emotions and sentiments be represented by a 10-dimensional vector for tweet $Tw_i$ of user $i$:
$$E = \{ES_{i1}, ES_{i2}, \ldots, ES_{i10}\}$$
Let $n_{ki}$ be the number of hits found in the NRC lexicon for emotion dimension $ES_k \in E$. The raw emotion/sentiment score for $ES_k$ is then:
$$S_{ik} = n_{ki}$$
To give more weight to dominant emotions, we compute, for tweet $Tw_i$, the fraction of matching lexicon words that belong to each dimension and multiply it by that dimension's number of matching words. The resulting emotional vector for tweet $Tw_i$ has the form $\{S_{i1}, S_{i2}, \ldots, S_{i10}\}$.
To simplify further, we convert the scores in this vector to binary values using a threshold, since past studies have shown that a tweet may reflect more than one emotion. We take the threshold to be the mean of the emotional scores in the vector: a score greater than the threshold is marked 1, otherwise 0. For the two sentiment dimensions, however, we keep only the larger value, marking the tweet as positive or negative depending on which sentiment score is higher.
4.1.4 Topic-specific emotion
Given the topic of a tweet and the emotional states it reflects, this feature captures its similarity with the user's emotional states on that topic. To create this feature, we first extracted the topic of each tweet using LDA with Gibbs sampling [13]. We then used conditional probabilities to extract topic-specific emotions for the target tweet, following the method of [11]. A target tweet for a user is thus represented by a vector of conditional probabilities over all emotion dimensions, given a topic $T_i$ the user is interested in:
$$\{P(ES_1 \mid T_i), P(ES_2 \mid T_i), \ldots, P(ES_{10} \mid T_i)\}$$
These conditional probabilities are defined as:
$$P(ES_j \mid T_i) = \frac{P(ES_j, T_i)}{P(T_i)}$$
where $P(ES_j, T_i)$ is the probability that emotion dimension $ES_j$ and topic $T_i$ occur together in the user's profile, and $P(T_i)$ is the probability that the user's tweets/retweets reflect topic $T_i$. Mathematically:
$$P(ES_j, T_i) = \frac{Total_{ij}}{X}, \qquad P(T_i) = \frac{Total_i}{X}$$
where $Total_{ij}$ is the number of tweets/retweets in the user's profile in which emotion $ES_j$ and topic $T_i$ co-occur, $X$ is the total number of tweets/retweets posted by the user, and $Total_i$ is the number of the user's tweets/retweets on topic $T_i$.
4.1.5 Conventional features
URLs and Hashtags
URLs and hashtags have been important factors in determining a user's retweet decision [46][42][1]. In our work, we checked whether the URLs and hashtags in the target tweet match those in the user's interest profile; if so, the feature is set to 1, otherwise 0.
User Interest Vector
Text similarity is a well-known feature for retweet prediction [46][42][26][1]. To compute the text similarity between the target tweet and the user's past tweets/retweets, we build a user interest vector and an interest vector for the target tweet using the word2vec algorithm [30]. Cosine similarity is then used to compute the text similarity between the two vectors.
5 Experiment
We performed separate experiments to evaluate value-based, emotion/sentiment-based and topic-specific emotion-based models for retweet prediction. Conventional features were used together with each of these models. The target tweet/retweet dataset was divided into a training set and a test set with a test ratio of 0.3. Each model was trained on the training set and evaluated on the test set.
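The emotion features of Section 4.1.3 and the topic-specific emotion feature of Section 4.1.4 can be sketched in Python as below. This is a hedged sketch, not the authors' code, and several details are our assumptions: the lexicon is a hypothetical toy stand-in for the NRC word-emotion lexicon [31]; the weighting step is read as $(n_k / N) \cdot n_k$, where $N$ is the tweet's total number of lexicon hits; the binarization threshold is the mean of the eight emotion scores; a sentiment tie leaves both sentiments at 0; and topics are assumed to be already assigned (for example by an LDA model), since topic extraction is not shown. Because $X$ appears in both $P(ES_j, T_i)$ and $P(T_i)$, it cancels, so $P(ES_j \mid T_i)$ reduces to $Total_{ij} / Total_i$, which is what the last function computes.

```python
from collections import Counter

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]
SENTIMENTS = ["positive", "negative"]
DIMENSIONS = EMOTIONS + SENTIMENTS  # ES_1 .. ES_10


def emotion_scores(tweet, lexicon):
    """Weighted score per dimension: (n_k / N) * n_k (assumed reading).

    n_k is the number of lexicon hits for dimension k; N is the total number
    of hits in the tweet. `lexicon` maps a word to the dimensions it evokes.
    """
    hits = Counter()
    for token in tweet.lower().split():
        for dim in lexicon.get(token, ()):
            hits[dim] += 1
    total = sum(hits.values())
    if total == 0:
        return {d: 0.0 for d in DIMENSIONS}
    return {d: (hits[d] / total) * hits[d] for d in DIMENSIONS}


def binarize(scores):
    """Emotions: 1 if above the mean emotion score. Sentiments: 1 for the larger."""
    threshold = sum(scores[e] for e in EMOTIONS) / len(EMOTIONS)
    vector = {e: int(scores[e] > threshold) for e in EMOTIONS}
    pos, neg = scores["positive"], scores["negative"]
    vector["positive"] = int(pos > neg)  # a tie leaves both sentiments at 0
    vector["negative"] = int(neg > pos)
    return vector


def topic_emotion_probs(profile, topic):
    """P(ES_j | T_i) = Total_ij / Total_i over one user's profile.

    `profile` is a list of (topic, binary_vector) pairs, one per tweet/retweet.
    """
    on_topic = [vec for t, vec in profile if t == topic]
    if not on_topic:
        return {d: 0.0 for d in DIMENSIONS}
    return {d: sum(vec[d] for vec in on_topic) / len(on_topic) for d in DIMENSIONS}


# Usage with a hypothetical toy lexicon and pre-assigned topics.
lexicon = {"furious": ("anger", "negative"),
           "hate": ("anger", "disgust", "negative"),
           "happy": ("joy", "positive"),
           "hope": ("anticipation", "joy", "positive", "trust")}
tweet_vector = binarize(emotion_scores("happy happy hope", lexicon))
profile = [("politics", binarize(emotion_scores("hate furious", lexicon))),
           ("politics", binarize(emotion_scores("happy", lexicon))),
           ("sports", binarize(emotion_scores("happy hope", lexicon)))]
politics_probs = topic_emotion_probs(profile, "politics")
```

For the example tweet, only joy exceeds the mean threshold (anticipation and trust each have a single weakly weighted hit), and the tweet is marked positive. For the politics topic, anger, disgust and joy each get probability 0.5, because each appears in one of the user's two politics tweets.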
All models were run with four classifiers: Random Forest, Logistic Regression, XGB and GBT. We used 10-fold cross-validation to tune the models' parameters and to avoid overfitting. The experiments were implemented in Python 3 using the PyCharm editor on a machine with a 2.2 GHz 6-core Intel Core i7 processor and 16 GB of memory.
5.1 Data checks and preprocessing
To prepare the modeling data, we applied several data checks and preprocessing steps, including skewness checks, handling of null values and encoding of categorical features. As the target label was highly imbalanced, we used SMOTE oversampling [6] to balance the class labels.
5.2 Modeling
We built the following models to compare the value-based model with previously used models for retweet behavior prediction, namely the emotion/sentiment-based model and the topic-specific emotion-based model. We also compared our work with a baseline model proposed in previous studies [15][11], called the user-interest model.
5.2.1 Value-based Model (VM)
This model explores the impact of users' value systems on their retweet decisions. It uses value-system features, namely the value system of the target tweet and the similarity score between the target value system and the value systems in the user interest profile.
5.2.2 Emotion-based Model (EM)
This model captures the effect of users' emotions and sentiments on their retweet behavior. It uses the 10-dimensional emotion and sentiment score extracted as described in Section 4.1.3.
5.2.3 Topic-Specific Emotion Model (TSM)
The topic-specific model investigates the effect of topic-specific emotions on a user's retweeting decision, since different users can express different emotions for a given topic. It uses the 10-dimensional vector of conditional probabilities to predict the user's retweet decision.
Each probability is the conditional probability of an emotion dimension given a specific topic, i.e., how likely the user is to express that emotion on that topic.
5.2.4 User Interest Model (UIM)
This model serves as a baseline and captures the text similarity between the user interest vector and the target tweet. The vectors are created using the word2vec algorithm, and cosine similarity is used to measure the similarity between the user interest vector and the target tweet vector. To evaluate our retweet prediction models, we used accuracy, defined as the ratio of the number of correctly classified instances to the total number of instances.
Figure 1: Model Accuracies for a) Value based Model b) Emotion based model c) Topic-specific emotion-based model d) User Interest Model.
6 Results and discussion
All the above models were first evaluated on accuracy. Figure 1 shows the accuracies of the value-based model and the previously used models for predicting users' retweet decisions. The accuracies are calculated with four classifiers: Random Forest (RF), Logistic Regression (LR), XGB (Extreme Gradient Boosting trees) and GBT (Gradient Boosting Trees). The four panels of the figure show the accuracies of the value-based, emotion-based, topic-specific emotion-based and user interest models, respectively. The value-based model (VM model) uses the value system and the value similarity score between the target tweet and the user profile. This model performs comparably across all classifiers, with XGB performing slightly better than the others. The emotion-based model (EM model) uses only the 10-dimensional emotion vector as a feature for prediction.
As seen in Figure 1b, this model also performs comparably across all classifiers, with a slight improvement with the Random Forest classifier. Figure 1c shows the accuracy of the topic-specific emotion-based model (TSM model) for retweet prediction. This model uses the topic-specific emotion feature, i.e., the conditional probability of an emotion given the topic of the target tweet. For this model, logistic regression performs best among the classifiers. The user-interest model (UIM model) uses the cosine similarity between the user interest vector and the target tweet vector as a feature. Its accuracy varies with the classifier: it performs worst with logistic regression but improves considerably with the other classifiers.
Table 3: A comparison of various models on the basis of accuracy.
Table 4: A comparison of various models based on precision, recall and F1 score.
Tables 3 and 4 compare the performance of the value-based model with previously used retweet prediction models. Table 3 compares the models on accuracy, with each model run using four classifiers, viz. Random Forest, XGB, GBT and Logistic Regression. To recap, the UIM model uses the cosine similarity between the user interest vector and the target vector, the TSM model uses the conditional probability of an emotion given the topic of the target tweet, and the EM model uses only the 10-dimensional emotion vector.
The VM model, on the other hand, uses the value system and the value similarity score between the target tweet and the user profile. As can be seen, the UIM model performs worst of all the models. This is consistent with our expectation, as previous studies use this model as a baseline [15][11]. The TSM model achieves higher accuracy than the EM model with every classifier except Random Forest, where the EM model is more accurate. This suggests that the joint effect of topic and emotions is comparable, as a feature, to emotions alone for predicting retweet behavior. Compared with the TSM and EM models, the VM model achieves higher accuracy with all classifiers. This indicates that an individual's value system can be a useful predictor of their retweet decisions. We also believe that using the word2vec model to generate similar words for the valueDict lexicon captures underlying contextual information in the content, which, when used to label users' value systems, contributes to the higher performance on the retweet prediction task. Accuracy is a good metric when the target classes are balanced. For imbalanced classes, however, the test set should also be evaluated with other metrics such as precision, recall and F1 score. Precision is the ratio of instances correctly classified as positive to all instances classified as positive. Recall is the ratio of correctly identified positives to all instances that are actually positive. Precision can be written as:
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
and recall as:
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
The F1 score is the harmonic mean of precision and recall:
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
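The four evaluation metrics used in this section can be computed directly from true and predicted labels, as in the minimal Python sketch below (standard library only; the label lists are made-up illustrations, not data from this study). It also shows why accuracy alone can mislead on imbalanced data: a classifier that never predicts a retweet still gets a reasonable accuracy but zero precision, recall and F1.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = retweeted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)  # harmonic mean of P and R
    return accuracy, precision, recall, f1


# Illustrative labels: 3 retweets (1) and 5 non-retweets (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
model = classification_metrics(y_true, [1, 1, 0, 1, 0, 0, 0, 0])
never_retweet = classification_metrics(y_true, [0] * 8)
```

Here `model` gives an accuracy of 0.75 with precision, recall and F1 all equal to 2/3, while `never_retweet` still reaches an accuracy of 0.625 but has precision, recall and F1 of 0.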
The F1 score is important because a model with very high precision but very low recall is not considered useful; F1 provides a means of judging both metrics together. We therefore compare the models on precision, recall and F1 score on the test set in Table 4, which contrasts our proposed value-based model with the previous state-of-the-art models for this classification task. These metrics show the same pattern across models as accuracy. The precision of the VM model is higher than that of the TSM, EM and UIM models with all classifiers, which again supports the use of value systems as a feature for retweet prediction. Comparing the TSM and EM models on precision, the TSM model is more precise with all classifiers except Random Forest. This suggests that both features are useful predictors and that neither is clearly superior to the other. In terms of recall, almost all models perform comparably, with the VM model slightly better than the others. To consider both measures jointly, we use the F1 score, the harmonic mean of precision and recall. The results show that the VM model has a higher F1 score than all the other models. As expected, the UIM baseline performs worse on all metrics. These results confirm the importance of value systems as a predictor of retweet behavior, in addition to the state-of-the-art features used previously: emotions, sentiments and topic-specific emotions. This work can be applied to retweet prediction applications including viral marketing, emergency response and tweet recommendation.
Value systems of an individual can also be used in practice to identify spammers.

7 Conclusion

Predicting the retweet decisions of a user is a challenging problem. Past studies have shown that a user's retweet behavior correlates with factors such as emotions, sentiments and topic-specific emotions. Past studies have also shown value systems to be an important predictor of user decisions; however, their impact had not yet been explored in the domain of retweet prediction. Hence, the objective of this work was to explore the impact of value systems on users' retweet decisions. Being a latent attribute of a user, value systems have the potential to offer good predictive power in deciphering retweet behavior. We presented a value-based model and described the methodology for extracting value-related features. We also compared our model with previous state-of-the-art models. Through different experiments, our work shows that value systems are indeed an important factor in predicting users' retweeting decisions. Future work includes studying and comparing other state-of-the-art models with value-based models.

References

[1] Abel, F., Gao, Q., Houben, G. J., Tao, K. 2011. Analyzing user modeling on twitter for personalized news recommendations. In International Conference on User Modeling, Adaptation, and Personalization, pages 1–12. Springer. https://doi.org/10.1007/978-3-642-22362-4_1
[2] Ali, S., Katoma, V., Tyobeka, E. 2015. Identification of key values and behaviours influencing leadership orientation in Southern Africa. Journal of Emerging Trends in Educational Research and Policy Studies, 6(1):6–12.
[3] Bonsón, E., Perea, D., Bednárová, M. 2019. Twitter as a tool for citizen engagement: An empirical study of the Andalusian municipalities. Government Information Quarterly, 36(3):480–489. https://doi.org/10.1016/j.giq.2019.03.001
[4] Boyd, D., Golder, S., Lotan, G. 2010.
Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. In 2010 43rd Hawaii International Conference on System Sciences, pages 1–10. IEEE. https://doi.org/10.1109/hicss.2010.412
[5] Can, E. F., Oktay, H., Manmatha, R. 2013. Predicting retweet count using visual cues. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 1481–1484. https://doi.org/10.1145/2505515.2507824
[6] Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321–357. https://doi.org/10.1613/jair.953
[7] Chen, J., Hsieh, G., Mahmud, J. U., Nichols, J. 2014. Understanding individuals’ personal values from social media word use. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, pages 405–414. ACM. https://doi.org/10.1145/2531602.2531608
[8] Chen, K., Chen, T., Zheng, G., Jin, O., Yao, E., Yu, Y. 2012. Collaborative personalized tweet recommendation. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 661–670. ACM. https://doi.org/10.1145/2348283.2348372
[9] Dehghani, M., Gratch, J., Sachdeva, S., Sagae, K. 2011. Analyzing conservative and liberal blogs related to the construction of the ‘Ground Zero Mosque’. Proceedings of the Annual Meeting of the Cognitive Science Society, 33.
[10] Deng, Z., Yan, M., Sang, J., Xu, C. 2015. Twitter is faster: personalized time-aware video recommendation from Twitter to YouTube. ACM Trans Multimed Comput Commun Appl (TOMM), 11(2):31. https://doi.org/10.1145/2637285
[11] Firdaus, S. N., Ding, C., Sadeghian, A. 2019. Topic specific emotion detection for retweet prediction. International Journal of Machine Learning and Cybernetics, 10(8):2071–2083. https://doi.org/10.1007/s13042-018-0798-5
[12] Fleischmann, K. R., Oard, D. W., Cheng, A.-S., Wang, P., Ishita, E. 2009. Automatic classification of human values: Applying computational thinking to information ethics. Proceedings of the American Society for Information Science and Technology, 46(1):1–4. https://doi.org/10.1002/meet.2009.1450460345
[13] Griffiths, T. L., Steyvers, M. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Supplement 1):5228–5235. https://doi.org/10.1073/pnas.0307752101
[14] Hong, L., Dan, O., Davison, B. D. 2011. Predicting popular messages in twitter. Proceedings of the 20th International Conference Companion on World Wide Web, pages 57–58. https://doi.org/10.1145/1963192.1963222
[15] Huang, D., Zhou, J., Mu, D., Yang, F. 2014. Retweet behavior prediction in twitter. 2014 IEEE Seventh International Symposium on Computational Intelligence and Design (ISCID), 2:30–33. https://doi.org/10.1109/iscid.2014.187
[16] Ishita, E., Oard, D. W., Fleischmann, K. R., Cheng, A.-S., Templeton, T. C. 2010. Investigating multi-label classification for human values. Proceedings of the American Society for Information Science and Technology, 47(1):1–4. https://doi.org/10.1002/meet.14504701116
[17] Jenders, M., Kasneci, G., Naumann, F. 2013. Analyzing and predicting viral tweets. Proceedings of the 22nd International Conference on World Wide Web, pages 657–664. ACM. https://doi.org/10.1145/2487788.2488017
[18] Jiang, B., Lu, Z., Li, N., Wu, J., Jiang, Z. 2018. Retweet prediction using social-aware probabilistic matrix factorization. International Conference on Computational Science, pages 316–327. https://doi.org/10.1007/978-3-319-93698-7_24
[19] Jiang, B., Yi, F., Wu, J., Lu, Z. 2019. Retweet prediction using context-aware coupled matrix-tensor factorization. International Conference on Knowledge Science, Engineering and Management, pages 185–196. https://doi.org/10.1007/978-3-030-29551-6_17
[20] Kakar, S., Dhaka, D., Mehrotra, M. 2020.
Value-Based Behavioral Analysis of Users Using Twitter. In Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems, Springer, volume 145. https://doi.org/10.1007/978-981-15-7345-3_23
[21] Kanavos, A., Perikos, I., Vikatos, P., Hatzilygeroudis, I., Makris, C., Tsakalidis, A. 2014. Modeling retweet diffusion using emotional content. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pages 101–110. Springer. https://doi.org/10.1007/978-3-662-44654-6_10
[22] Kaufmann, E. 2016. It’s NOT the economy, stupid: Brexit as a story of personal values. British Politics and Policy at LSE.
[23] Lee, K., Mahmud, J., Chen, J., Zhou, M., Nichols, J. 2015. Who will retweet this? Detecting strangers from twitter to retweet information. ACM Transactions on Intelligent Systems and Technology (TIST), 6(3):1–25. https://doi.org/10.1145/2700466
[24] Lee, W. J., Oh, K. J., Lim, C. G., Choi, H. J. 2014. User profile extraction from twitter for personalized news recommendation. 16th International Conference on Advanced Communication Technology, pages 779–783. https://doi.org/10.1109/icact.2014.6779068
[25] Lu, C., Lam, W., Zhang, Y. 2012. Twitter user modeling and tweets recommendation based on Wikipedia concept graph. Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence.
[26] Luo, Z., Osborne, M., Tang, J., Wang, T. 2013. Who will retweet me? Finding retweeters in Twitter. Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 869–872. https://doi.org/10.1145/2484028.2484158
[27] Ma, R., Hu, X., Zhang, Q., Huang, X., Jiang, Y. G. 2019. Hot topic-aware retweet prediction with masked self-attentive model. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 525–534.
https://doi.org/10.1145/3331184.3331236
[28] Macskassy, S. A., Michelson, M. 2011. Why do people retweet? Anti-homophily wins the day! In 5th International AAAI Conference on Weblogs and Social Media, pages 209–216.
[29] Mazzi, M. A., Rimondini, M., van der Zee, E., Boerma, W., Zimmermann, C., Bensing, J. 2018. Which patient and doctor behaviours make a medical consultation more effective from a patient point of view. Results from a European multicentre study in 31 countries. Patient Education and Counseling, 101(10):1795–1803. https://doi.org/10.1016/j.pec.2018.05.019
[30] Mikolov, T., Chen, K., Corrado, G., Dean, J. 2013. Efficient estimation of word representations in vector space.
[31] Mohammad, S. M., Turney, P. D. 2013. Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3):436–465. https://doi.org/10.1111/j.1467-8640.2012.00460.x
[32] Naveed, N., Gottron, T., Kunegis, J., Alhadi, A. C. 2011. Bad news travel fast: A content-based analysis of interestingness on twitter. In Proceedings of the 3rd International Web Science Conference, pages 1–7. ACM. https://doi.org/10.1145/2527031.2527052
[33] Peng, H. K., Zhu, J., Piao, D., Yan, R., Zhang, Y. 2011. Retweet modeling using conditional random fields. 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), pages 336–343. https://doi.org/10.1109/icdmw.2011.146
[34] Perikos, I., Hatzilygeroudis, I. 2013. Recognizing emotion presence in natural language sentences. In International Conference on Engineering Applications of Neural Networks, pages 30–39. Springer. https://doi.org/10.1007/978-3-642-41016-1_4
[35] Petrovic, S., Osborne, M., Lavrenko, V. 2011. RT to win! Predicting message propagation in twitter. In Proceedings of the International AAAI Conference on Web and Social Media, 5(1).
[36] Pfitzner, R., Garas, A., Schweitzer, F. 2012. Emotional divergence influences information spreading in Twitter.
Sixth International AAAI Conference on Weblogs and Social Media, 12.
[37] Plutchik, R. 2001. The Nature of Emotions. American Scientist, 89(4):344–344.
[38] Rao, Y., Li, Q., Wenyin, L., Wu, Q., Quan, X. 2014. Affective topic model for social emotion detection. Neural Networks, 58:29–37. https://doi.org/10.1016/j.neunet.2014.05.007
[39] Rathbun, B. C., Kertzer, J. D., Reifler, J., Goren, P., Scotto, T. J. 2016. Taking Foreign Policy Personally: Personal Values and Foreign Policy Attitudes. International Studies Quarterly, 60(1):124–137. https://doi.org/10.1093/isq/sqv012
[40] Roberts, K., Roach, M. A., Johnson, J., Guthrie, J., Harabagiu, A. M. 2012. Empatweet: annotating and detecting emotions on Twitter. LREC 12, 12:3806–3813.
[41] Schwartz, S. H. 1994. Are there universal aspects in the structure and contents of human values? Journal of Social Issues, 50(4):19–45. https://doi.org/10.1111/j.1540-4560.1994.tb01196.x
[42] Suh, B., Hong, L., Pirolli, P., Chi, E. H. 2010. Want to be retweeted? Large scale analytics on factors impacting retweet in twitter network. In IEEE Second International Conference on Social Computing. https://doi.org/10.1109/socialcom.2010.33
[43] Templeton, T. C., Fleischmann, K. R., Boyd-Graber, J. 2011. Simulating audiences: Automating analysis of values, attitudes, and sentiment. 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pages 734–737. https://doi.org/10.1109/passat/socialcom.2011.238
[44] Wang, Q., Li, L., Wang, D. D., Zeng 2017. Incorporating message embedding into co-factor matrix factorization for retweeting prediction. International Joint Conference on Neural Networks (IJCNN), pages 1265–1272. https://doi.org/10.1109/ijcnn.2017.7965998
[45] Wang, W., Zuo, Y., Wang 2015. A multidimensional nonnegative matrix factorization model for retweeting behavior prediction. Mathematical Problems in Engineering.
https://doi.org/10.1155/2015/936397
[46] Xu, Z., Yang, Q. 2012. Analyzing user retweet behavior on twitter. Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining, pages 46–50. https://doi.org/10.1109/asonam.2012.18
[47] Yang, Z. 2010. Understanding retweeting behaviors in social networks. CIKM, pages 1633–1636. https://doi.org/10.1145/1871437.1871691
[48] Ye, S., Soutar, G. N., Sneddon, J. N., Lee, J. A. 2017. Personal values and the theory of planned behaviour: A study of values and holiday trade-offs in young adults. Tourism Management, 62:107–109. https://doi.org/10.1016/j.tourman.2016.12.023
[49] Zacharias, C. 2017. Twint-twitter intelligence tool.
[50] Zhang, J., Tang, J., Li, J., Liu, Y., Xing, C. 2015a. Who influenced you? Predicting retweet via social influence locality. ACM Trans. Knowl. Disc. Data (TKDD), 9(3):25. https://doi.org/10.1145/2700398
[51] Zhang, K., Yun, X., Liang, J., Zhang, X. Y., Li, C., Tian, B. 2016. Retweeting behavior prediction using probabilistic matrix factorization. IEEE Symposium on Computers and Communication (ISCC). https://doi.org/10.1109/iscc.2016.7543897
[52] Zhang, Q., Gong, Y., Guo, Y., Huang, X. 2015b. Retweet behavior prediction using hierarchical Dirichlet process. Twenty-Ninth AAAI Conference on Artificial Intelligence.