https://doi.org/10.31449/inf.v45i2.3465 Informatica 45 (2021) 267–276 267 
Value-Based Retweet Prediction on Twitter 
Surbhi Kakar, Deepali Dhaka and Monica Mehrotra 
Computer Science, Jamia Millia Islamia University 
E-mails: kakar.surbhi3@gmail.com, deepali.dhaka@gmail.com, mmehrotra@jmi.ac.in 
Keywords: retweet prediction, twitter, value system, emotions, sentiments, topic- specific emotion, retweeting, social 
networks, retweet behavior, topic  
Received: March 7, 2021 
Retweeting is an online activity on the Twitter social network that spreads
opinions and ideas from one person to another. Predicting the retweet decision has been an interesting and
challenging task for the past decade. Past studies have shown that emotions, sentiments and
topic-specific emotions can influence the retweet decision of the user. However, the value system of an individual
can also be an important aspect in predicting the decision of the user. Hence, through our work,
we propose to study retweet prediction as a function of value systems. Our work also presents an
experimental comparative study with the features used in previous studies. The experimental results using
different machine learning algorithms show that value systems achieve higher performance in
predicting the retweet decision of the user than emotions, sentiments and topic-specific emotions.
Summary: The problem of user response on Twitter is analyzed with machine learning methods.
 
1 Introduction 
Social networks are a platform where people meet each 
other virtually. Such platforms allow people to share their
ideas, thoughts and opinions with each other freely and lead
to diffusion of information within the network [47]. As the
content shared by the users is an expression of their 
feelings, sentiments and values, this content can be used 
to predict the user behavior [4]. 
Twitter is a social network famous for micro blogging 
where users express their interests by using the Tweet 
button. These tweets can further be shared by anyone who 
feels or experiences a connection with the author of the 
tweet[32], thus initiating the retweeting mechanism. 
Users on Twitter can have a follower-followee
relationship between them. If a user A is inspired by
another user B or finds B's interests similar to their own, A
can opt to follow B. In such a case, A is said
to be a follower of B. The user B may or may not follow
A in return. In the scenario where B does not follow
A, B is said to be a followee of A.
A retweet is a tweet that is re-shared by a user. A tweet 
prefixed with a symbol RT represents a retweet.  
Research around retweeting mainly addresses three 
research problems: 
1. Whether a tweet will be retweeted by a user 
This problem can be redefined as: Given a tweet, 
whether a user will retweet the tweet or not. 
Studies in this area focus on exploring features
that impact a user's retweet behavior, followed by
building retweet prediction models
[33][51][1][47].
2. Finding users who will retweet a tweet 
This research problem focuses on finding which 
users will retweet a given target tweet. [26][23]. 
3. Factors that affect the retweet frequency of the tweet 
A lot of research has been done around finding why 
a specific tweet is retweeted more in comparison to 
other tweets. [35][42][5][32][17]. 
This work focuses on the first research
problem, which we call the retweet prediction problem.
Retweet prediction can be defined as a problem of 
predicting the retweet decisions of a user. This has been a 
very challenging problem as tweets innately are noisy and 
complex. It is through the retweet mechanism that the 
diffusion of information takes place. Understanding 
retweet behaviors and the virality of tweets on social 
media can help us identify influential people who can 
spread the information at a faster pace. This insight is 
useful in applications such as viral marketing and 
emergency response. Predicting which tweets will be 
retweeted by a user can also help in providing 
recommendations to a user to create a personalized 
experience for them.  
The most popular approach for retweet prediction 
starts with building a user profile[8][10][24][25][28]. The 
profile of the user can be extracted from their 
tweet/retweet data. Several factors, such as URLs and hashtags,
can be taken directly from the timeline of the user to build
their profile. However, certain information is
latent/hidden in the content shared by the user. In such
cases, topic extraction can be very useful.
The topic of the tweet has been found to be a 
promising factor in capturing interests of the user 
[8][10][28][15][46]. In addition to these factors, emotions 
and sentiments can also be employed for this task.  
Emotion represents the mental state of a human being 
whereas sentiments can be viewed as an opinion towards 
a person or an object. The content written by a user is an 
expression of how he/she feels, making it a good 
representative of their emotions and sentiments. Several 
theories have been proposed to classify human emotions 
but for our work, we use the well-accepted theory by [37]. 
According to this theory, emotions can be classified into 
eight basic types: Anger, Joy, Surprise, Anticipation, 
Sadness, Disgust, Fear and Trust. Sentiments, on the other 
hand can be categorized into either positive or negative. 
For our work, we use the NRC word-emotion lexicon[31] 
to label emotions and sentiments in the content of the 
tweet as it is a well-accepted lexicon for labeling emotions 
and sentiments. The impact of emotions and sentiments on
retweet prediction has been studied in several
works [17][21][32][36][11].
[11], in their work, compared the conjunctive effect of
using emotion and topic with a topic-specific emotion
model in predicting the retweet decision of the user. They
proposed that not just the topic, but also the emotions and
sentiments expressed by a user on a topic correlate
with the user's decision. In their future work, they proposed
to study value systems for the retweet
prediction task. This formed the inspiration for our work
on value systems.
A value system of an individual denotes the beliefs a 
user carries in their life. This can be learnt from their 
environment including places like family and school. As 
per [41][7], value systems can fall into the following 
classes:  
• Self-Transcendence: This type of value system 
represents values of benevolence and universalism. 
The core beliefs in this category are those of 
wisdom, peace, spirituality, and welfare of general 
public. 
• Self-Enhancement: This is the category where people 
are more interested in their own enhancement and 
growth. They are also inclined towards power and 
authority. 
• Conservation: People who carry this type of value 
system are more traditional and believe in cultural 
values and religion. They tend to conform to the 
rules of the society and are concerned about their 
family security as well as national security. 
• Openness to Change: This value system represents 
people who are adventurous and daring, someone 
who is independent and self-directed. 
• Hedonism: People carrying this type of value system 
tend to be involved in pleasure seeking activities. 
Value systems have been shown as an important factor in 
influencing user decisions as per past 
studies[2][48][39][22][29] ranging from shaping 
leadership styles to influencing voting preferences of a 
user.  
Hence, our work proposes to explore the impact of 
using value systems on retweet decisions of a user.  
Overall, the contributions of this paper can be stated 
as below: 
• Proposing a novel value-based model which uses 
value system related features to predict retweet 
behavior. 
• Proposing feature extraction methodology for value 
related features. 
• Comparing emotion, sentiment, topic-specific 
emotion and value-based models. 
• Experimental results demonstrate the higher 
performance of value-based models as compared to 
emotion, sentiment and topic-specific emotion 
models used. 
The remainder of the paper is structured as follows. Section
2 outlines the previous studies performed in this field.
Section 3 summarizes the statistics and the approach of the
data collection. Section 4 presents the methodology used
and the process of feature generation. Section 5 discusses
the experiments performed, followed by Sections 6 and 7,
which discuss the results and summarize the conclusions of
our work, respectively.
2 Related works 
This section reviews the past work done in the context of 
retweet prediction discussing various features like 
emotions, sentiments and value systems which are 
potential predictors for modeling retweet decisions of a 
user. Retweet Prediction can be approached as either a 
classification problem or a recommendation problem. Our 
work would be using classification to approach the 
problem of retweet prediction. 
Earlier research in retweet prediction mainly
studied the factors affecting the retweet mechanism. [4]
studied the reasons and the conventional styles of
retweeting. [42] demonstrated the impact of URLs,
hashtags, number of followers and followees on the
retweet frequency of the tweet. It was shown by [17][36]
that emotions and sentiments also affect the virality of the 
tweet. The topic of interest was seen as a potential factor 
by [28]. 
Recent studies are focused around predicting retweet 
behavior. [47] used a factor graph model and concluded 
that time of the tweet, user information and the content of 
the tweet can be effective predictors in predicting retweet 
behavior of the user. [33] used the temporal information 
to study the retweeting activity. They used conditional 
random fields for their work. Other research also exploited 
the temporal information of the tweet [14][51]. 
The topical information of the tweet was also studied 
to gauge the influence of topic on the retweet decision of 
the user. [50] used a factor graph model considering user 
attributes, topic information and instantaneity to study the 
retweet behavior. [10] captured short term interests of the 
user by ranking top three topics as the hot topics that the 
user is interested in, at that point in time. [8] used a 
collaborative-based recommendation algorithm 
considering topic as a feature to capture user interests. [11] 
studied the impact of emotions specific to a topic, 
emphasizing that the same person can have different 
emotions for different topics and showed that topic-
specific emotion feature correlates with the retweeting 
behavior of the user. 
Other researchers also showed the importance of topic 
as a factor in retweet mechanism[15][46][28][27]. Author 
information has also been shown to have influence on 
retweet behavior of the user. 
Recent authors view the retweet prediction as a 
recommendation problem and use matrix factorization 
techniques for the same [45][44][18][19][51]. 
Sentiments and emotions also have a big impact on
user posting/re-posting behavior. Emotions represent the
state of mind of an individual at a specific point in time.
As per [37], they can be categorized into eight basic types:
anger, joy, sadness, disgust, trust, anticipation, surprise
and fear. Sentiments can be viewed as the positive or negative
opinions of people about an event/person/object. Several
studies have shown that sentiments and emotions
have an impact on predicting the decisions of the user
[17][32][36]. [36], in their work, demonstrated that the
intensity of emotions expressed in a tweet is directly
proportional to its retweet frequency. [21] used emotions
and user-related features to predict the retweet behavior.
They concluded that sadness and anger
are the most dominant emotions in tweets that are retweeted.
Several tools and techniques have been proposed to 
detect emotions from the content of the tweet. 
These include tools based on parsers, tree taggers and 
lexicon-based techniques to label emotions in text 
[40][21][34][38]. For our work, we employ the use of 
lexicon-based methods to label emotions, sentiments and 
value systems in the content of the tweet. 
An individual learns their personal values from their 
environment since childhood. The value system of an 
individual can be viewed as the core beliefs held by them 
towards someone or something. 
[41] classified value systems into five types of self-
transcendence, conservation, openness to change, 
hedonism, self-enhancement. 
Several works show that the content shared reflects 
the value system of the user [7][12][43][16][9]. 
[43] used the text of the speech to infer values of the 
user. Another work used human annotations and machine-
learning to identify values in text [16]. Some authors built 
a word map for the people reflecting traits of being 
conservative and liberal [9]. [7] also confirmed that the 
content of the tweet can have potential influence for 
labeling the value system of the user by analyzing the 
words related to a specific value system category. Several
studies show that value systems can help shape the personal
decisions of people. [2] showed that personal values of an
individual can shape their style of leading teams. Another 
work studied value systems and concluded that they can 
impact the travel decisions of young adults [48]. Several 
other works confirm that value systems have a potential 
influence on voting decisions, foreign policy orientations 
and health decisions of an individual [22][39][29]. 
Hence, our work attempts to use value systems to
explore their impact on the retweet decisions of the user,
comparing them with previously used state-of-the-art models, viz.
emotions, sentiments and topic-specific emotions. For our
work, we will be using the valueDict lexicon [20] for
labeling value systems in the content written by users as it 
is one of the first lexicons to be proposed for the purpose 
of labeling value systems. This lexicon contains words 
associated with each of the value system categories. The 
lexicon was created by taking a set of seed users. For these 
users, their value system and the associated words for each 
value system category were inferred by investigating the 
content written by them in their descriptions and the 
tweets. The strength of the lexicon was increased by 
generating synonyms using word2vec embeddings. A 
validation of the lexicon was applied on an additional set 
of users taking their descriptions as the ground truth label 
for their value system. Table 1 summarizes the findings of 
past studies around sentiments, emotions and topic-
specific emotions in addition to other user and tweet 
features used. 
 
Table 1: Summary of Related Works. 
3 Data collection 
The data for this research was collected through the Twint 
API [49][3]. For this study, we selected a set of 126 seed 
users manually based upon the average activity of the user 
per day. These users posted a
tweet/retweet on average 4-5 times per day. For these
users, their latest 700 tweets/retweets were collected
and used to construct the user-interest profile.
As a next step, a list of 60 followees was fetched
for each of these users through the Twint API.
To create the target dataset, we needed to create 
positive and negative samples for each user. We 
considered all the retweets of the user within a given time 
as the positive samples for the target dataset. 
To create the negative samples, we fetched the list of 
retweet authors for each of the seed users.  
A retweet author is the author whose tweets, a seed 
user has retweeted. If these retweet authors were also a 
followee of the seed user, we collected the latest 700 tweets
of the author within the same timeline as the user's retweets.
Both datasets were merged to form a list of
interesting and non-interesting tweets for each user. This
process was repeated for all the seed users. The final
dataset amounted to 17,180 users with 215,312
tweets/retweets. Table 2 presents a summary of the data
statistics used.
 
Table 2: Data Statistics. 
4 Methodology 
To prepare the data for the retweet prediction problem, we
prepared a list of interesting and non-interesting tweets
for each seed user. The interesting tweets were the
tweets that the user found interesting and retweeted.
To collect the non-interesting tweets, we
collected tweets of their followees that the seed user did
not retweet. This enabled us to create positive and negative
samples for each user. The methodology resulted in
a total of 215,312 tweets/retweets from 17,180 users.
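The labeling of interesting and non-interesting tweets described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name `build_samples` and the use of plain tweet identifiers are assumptions made for the example, and the time-window filtering of followee tweets is omitted.

```python
def build_samples(retweeted_ids, followee_tweet_ids):
    """Return (tweet_id, label) pairs for one seed user.

    label 1 = the user retweeted the tweet (interesting),
    label 0 = a followee's tweet the user did not retweet (non-interesting).
    Both argument names are hypothetical; they are not taken from the paper.
    """
    retweeted = set(retweeted_ids)
    # Positive samples: every retweet of the seed user.
    samples = [(tweet_id, 1) for tweet_id in sorted(retweeted)]
    seen = set()
    # Negative samples: followee tweets that were never retweeted.
    for tweet_id in followee_tweet_ids:
        if tweet_id not in retweeted and tweet_id not in seen:
            samples.append((tweet_id, 0))
            seen.add(tweet_id)
    return samples
```

Repeating this for every seed user and concatenating the results would give a target dataset of the kind described in this section.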
4.1 Feature generation 
4.1.1 Value systems 
The value system of an individual is a potential predictor of
the decisions they are likely to take in their life. Past
studies have shown the significance of content-based
analysis of value systems. Hence, our work uses the content
of the tweet to determine the value system of the user. We
use the valueDict lexicon to label users with their
value system [20]. The authors of that study proposed this
lexicon, which contains words related to each of the value
system categories. They then used it to study the
prominent value systems in developing and developed
regions of the world.
The following strategy is used to calculate the value
system for a user in our study. The value system of a user
can be represented by $w$ dimensions (where $w = 5$),
namely: self-transcendence, self-enhancement,
conservation, hedonism and openness to change:
$$V = \{V_1, V_2, \ldots, V_w\}$$
Let $U = \{U_1, U_2, \ldots, U_n\}$ be the set of users
and let $Tw_i = \{Tw_{i1}, Tw_{i2}, \ldots, Tw_{iz_i}\}$ be the set of
tweets for the $i$-th user, where
$$1 \le i \le n, \qquad 1 \le j \le z_i$$
Then let $n_{ijk}$ be the number of hits found in the lexicon
for each $V_k$, where $V_k \in V$, for a tweet $Tw_{ij}$.
The score of $V_k$ for a user $i$ can then be calculated as:
$$S_{ik} = \sum_{j=1}^{z_i} n_{ijk}$$
Hence, the total score of the value system for a user $i$ can
be represented as:
$$S = \max_{k} \{S_{ik}\}$$
The user is labeled with the value system whose score equals
S, i.e., the category with the highest score. The value system
of a tweet is calculated similarly.
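The scoring procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: `VALUE_LEXICON` is a toy stand-in with invented entries and not the actual valueDict lexicon, the tokenizer is a simple regular expression, and ties are resolved by taking the first category with the maximum score, which the text does not specify.

```python
import re

# Toy stand-in for the valueDict lexicon; these entries are hypothetical.
VALUE_LEXICON = {
    "self-transcendence": {"peace", "wisdom", "welfare"},
    "self-enhancement": {"power", "success", "authority"},
    "conservation": {"tradition", "family", "security"},
    "openness to change": {"adventure", "freedom", "daring"},
    "hedonism": {"pleasure", "fun", "enjoy"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def value_scores(tweets, lexicon=VALUE_LEXICON):
    """Sum the lexicon hits per value category over all tweets (S_ik)."""
    scores = {value: 0 for value in lexicon}
    for tweet in tweets:
        for token in tokenize(tweet):
            for value, words in lexicon.items():
                if token in words:
                    scores[value] += 1
    return scores


def label_value_system(tweets, lexicon=VALUE_LEXICON):
    """Return the category with the highest score, or None if nothing matched."""
    scores = value_scores(tweets, lexicon)
    if max(scores.values()) == 0:
        return None
    return max(scores, key=scores.get)
```

Passing a single-element list labels an individual tweet in the same way as a user.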
4.1.2 Value similarity score 
This feature intends to capture how similar a target tweet
is to the user's past interests.
Suppose a target tweet for a user $u$ has a value system
$V_j$. The similarity score for this tweet can then be calculated
as:
$$\text{Similarity score} = \frac{Total_j}{X}$$
where $Total_j$ is the total number of tweets/retweets in
$u$'s profile reflecting value system $V_j$, and $X$ is the total
number of tweets/retweets posted by $u$.
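The value similarity score can be sketched as below. It assumes that each tweet/retweet in the user's profile has already been labeled with a value system; the function name and the choice to return 0.0 for an empty profile are assumptions made for the example.

```python
def value_similarity(target_value, profile_values):
    """Fraction of profile tweets whose value system matches the target tweet.

    profile_values holds one value-system label per tweet/retweet in the
    user's profile (a hypothetical representation).
    """
    if not profile_values:
        return 0.0  # assumption: an empty profile gives no similarity
    total_j = sum(1 for value in profile_values if value == target_value)
    return total_j / len(profile_values)
```

For example, a target tweet labeled "hedonism" against a profile in which three of four tweets carry that label yields a score of 0.75.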
4.1.3 Emotions and sentiments 
Human emotions can be represented by eight basic 
emotions namely, anger, disgust, sadness, trust, joy, 
surprise, fear and anticipation [37]. Our work uses this 
theory of emotions proposed by the author. 
Sentiments can be viewed as the opinions of people
on certain objects/events. They can be classified into positive
and negative sentiments.
For our work, we use the NRC word-emotion lexicon 
[31], to determine the emotion and sentiment score of the 
tweet. Our methodology of extracting emotions and 
sentiments is inspired by [11]. 
For simplicity, we treat emotions and sentiments
together and calculate a single score for them; therefore,
we may sometimes refer to this combined score as the
emotional score of the tweet. Let emotions and sentiments
be represented by a 10-dimensional vector for a tweet $Tw_i$
for a user $i$:
$$E = \{ES_{i1}, ES_{i2}, \ldots, ES_{i10}\}$$
Let $n_{ki}$ be the number of hits found in the NRC lexicon
for emotion dimension $ES_k$,
where $ES_k \in E$. The emotion/sentiment score for
$ES_k$ can then be determined by:
$$S_{ik} = n_{ki}$$
To give more weight to the emotions that are
dominating, we calculate the fraction of the matching
words found in the lexicon for a tweet $Tw_i$ and multiply it
by the number of matching words found in the lexicon.
The resultant emotional vector for tweet $Tw_i$ is of the
form:
$$\{S_{ik_1}, S_{ik_2}, \ldots, S_{ik_{10}}\}$$
To further simplify, we convert the scores in this
vector to binary scores based on a threshold, as past studies
have shown that a tweet may reflect more than one
emotion. We take the threshold to be the mean of the
emotional scores in the vector. If a score is greater than the
threshold, we mark it as 1, otherwise as 0. For the
sentiments, however, we keep only the larger of the two sentiment
values, depending on whether the tweet reflects a higher positive
or a higher negative value.
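The binarization step can be sketched as follows. The sketch takes already weighted per-dimension scores as input and makes two assumptions that the text leaves open: the threshold is the mean over the eight emotion scores only, and when the positive and negative scores are equal neither sentiment is set.

```python
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]


def binarize_emotion_vector(scores):
    """Convert weighted emotion/sentiment scores into a 10-dimensional binary vector.

    scores maps dimension names to numeric scores; missing names count as 0.
    """
    emotion_scores = [scores.get(name, 0) for name in EMOTIONS]
    # Assumption: the threshold is the mean of the eight emotion scores.
    threshold = sum(emotion_scores) / len(emotion_scores)
    binary = {name: int(scores.get(name, 0) > threshold) for name in EMOTIONS}
    positive = scores.get("positive", 0)
    negative = scores.get("negative", 0)
    # Only the larger of the two sentiments is kept; a tie sets neither.
    binary["positive"] = int(positive > negative)
    binary["negative"] = int(negative > positive)
    return binary
```

Because the threshold is a mean, a tweet with several strong emotions can have more than one emotion dimension set to 1.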
4.1.4 Topic-specific emotion 
This feature, given the topic of a tweet reflecting certain 
emotional states, captures its similarity with the user’s 
emotional states on this topic. 
To create this feature, we first extracted the topic of a
tweet using LDA with Gibbs sampling [13]. We then
used conditional probabilities to extract topic-specific
emotions for the target tweet using the method suggested
by [11].
A target tweet for a user can in such a case be
represented by a vector of conditional
probabilities,
$$\{P(ES_1 \mid T_i), P(ES_2 \mid T_i), \ldots, P(ES_{10} \mid T_i)\},$$
for all emotion dimensions given a specific topic the user
is interested in.
These conditional probabilities can be defined as:
$$P(ES_j \mid T_i) = \frac{P(ES_j, T_i)}{P(T_i)}$$
where $P(ES_j, T_i)$ is the probability of emotion
dimension $ES_j$ and topic $T_i$ occurring together in the user's
profile, and
$P(T_i)$ is the probability of the user's tweets/retweets
reflecting topic $T_i$.
Mathematically, these can be written as:
$$P(ES_j, T_i) = \frac{Total_{ij}}{X}$$
$$P(T_i) = \frac{Total_i}{X}$$
where $Total_{ij}$ is the total number of tweets/retweets
in which emotion $ES_j$ and topic $T_i$ co-occur in the user's profile,
$X$ is the total number of tweets/retweets posted by the user
and $Total_i$ is the total number of tweets/retweets of the user
on topic $T_i$.
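The conditional probabilities above can be computed from a labeled user profile as in the following sketch. The profile representation (one topic label and one set of emotion dimensions per tweet) and the function name are assumptions made for illustration; topic extraction itself is not shown.

```python
def topic_emotion_probabilities(profile, topic, dimensions):
    """Return P(ES_j | T_i) for every dimension ES_j and the given topic T_i.

    profile is a list of (topic, emotions) pairs, one per tweet/retweet in the
    user's profile, where emotions is the set of dimensions the tweet reflects.
    """
    x = len(profile)                                    # X
    total_i = sum(1 for t, _ in profile if t == topic)  # Total_i
    if x == 0 or total_i == 0:
        # Assumption: a topic absent from the profile gives probability 0.
        return {dim: 0.0 for dim in dimensions}
    p_topic = total_i / x                               # P(T_i)
    probabilities = {}
    for dim in dimensions:
        total_ij = sum(1 for t, emotions in profile
                       if t == topic and dim in emotions)  # Total_ij
        p_joint = total_ij / x                          # P(ES_j, T_i)
        probabilities[dim] = p_joint / p_topic
    return probabilities
```

Since $X$ cancels, each entry reduces to $Total_{ij} / Total_i$, i.e. the share of the user's tweets on the topic that reflect the given emotion.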
4.1.5 Conventional features 
URLs and Hashtags
URLs and hashtags have been an important factor in
determining the retweet decision of the user [46][42][1].
For our work, we checked whether the URLs and hashtags in the
target tweet are similar to those in the user's profile. If so, we assign
a score of 1, otherwise 0. The URL and hashtag interests were
taken from the user interest profile.
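A binary URL/hashtag feature of this kind can be sketched as below. The text does not define when a URL or hashtag counts as similar, so the sketch assumes a case-insensitive exact match on hashtags and a match on URL domains; all function names are hypothetical.

```python
import re
from urllib.parse import urlparse


def extract_hashtags(text):
    """Return the lowercase hashtags found in a tweet."""
    return {tag.lower() for tag in re.findall(r"#\w+", text)}


def extract_domains(text):
    """Return the lowercase domains of the URLs found in a tweet."""
    urls = re.findall(r"https?://\S+", text)
    return {urlparse(url).netloc.lower() for url in urls}


def overlap_feature(target_items, profile_items):
    """Return 1 if the target tweet shares any item with the profile, else 0."""
    return int(bool(set(target_items) & set(profile_items)))
```

The profile sets would be built once per user by applying the two extractors to every tweet/retweet in the user interest profile.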
User Interest Vector 
Text similarity is a well-known feature for the task of
retweet prediction [46][42][26][1]. To compute the text
similarity between the target tweet and the past
tweets/retweets of the user, we create a user interest vector
and an interest vector for the target tweet using the word2vec
algorithm [30]. Cosine similarity is then used to
calculate the text similarity between the two vectors.
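The similarity computation can be sketched as follows. The sketch assumes that word vectors are already available as a plain dictionary (a stand-in for a trained word2vec model) and that a text is represented by the mean of its word vectors; the aggregation method is an assumption, as the text does not state how the interest vectors are formed.

```python
import math


def mean_vector(tokens, embeddings, dim):
    """Average the embeddings of the known tokens; zeros if none are known."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The user interest vector would be the mean vector over the tokens of all past tweets/retweets, and the feature is its cosine similarity with the mean vector of the target tweet.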
5 Experiment 
We performed separate experiments to evaluate value-
based, emotion/sentiment based and topic-specific 
emotion-based models for the task of retweet prediction.  
Conventional features were used in conjunction with these 
models. 
To perform the experiment, the target tweet/retweet 
dataset was divided into training and test set with a test 
ratio of 0.3. Each model was trained on the train set and 
evaluated on the test set. All the models were run using 
four different classifiers: Random Forest, Logistic 
Regression, XGB and GBT. A 10-fold cross-validation 
was performed to get optimal parameter values for the 
models in order to avoid overfitting. 
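The evaluation protocol, a hold-out split with a test ratio of 0.3 followed by 10-fold cross-validation on the training portion, can be sketched with index lists as below. This is a library-free illustration only; the function names, the fixed seed and the way folds are formed are assumptions, and in practice a library routine would normally be used.

```python
import random


def train_test_split_indices(n, test_ratio=0.3, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_ratio)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation
```

Each candidate parameter setting would be scored on the ten validation folds, and the best setting would then be refit on the full training set before the final evaluation on the held-out test set.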
These experiments were implemented in Python 3
using the PyCharm editor on a machine with a 2.2 GHz
6-core Intel Core i7 processor and 16 GB of memory.
5.1 Data checks and preprocessing 
To prepare the modeling data, several data checks and 
preprocessing techniques were applied including 
skewness checks, handling null values and encoding the 
categorical features. As the target label was highly
imbalanced, we used SMOTE sampling to balance
the class labels [6].
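The core idea of SMOTE, generating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified, library-free illustration and not the implementation used in the experiments; the function name and parameters are assumptions, and a library implementation such as the one in imbalanced-learn would normally be preferred.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("at least two minority samples are required")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [point for point in minority if point is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic
```

Oversampling would be applied to the training data only, so that the test set keeps the original class distribution; whether this was done in the experiments is not stated.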
5.2 Modeling 
For our work, we built the following models to compare
the value-based model with previously used models for
retweet behavior prediction, namely the
emotion/sentiment-based model and the topic-specific
emotion-based model. We also compared our work with one
of the baseline models proposed in previous studies [15][11].
This baseline model is referred to as the user-interest model.
5.2.1 Value-based Model (VM) 
This model explores the impact of using user’s value 
systems on their retweet decisions. The model uses 
features based on the value systems viz target value 
system and the similarity score between the target value 
system and value system in the user interest profile.  
5.2.2 Emotion-based Model (EM) 
This model intends to capture the effect of user’s emotions 
and sentiments on their retweet behavior. The model uses 
the 10-dimensional emotion and sentiment score extracted 
by the process described in the Feature generation section. 
5.2.3 Topic-Specific Emotion Model (TSM) 
The topic-specific model was built to investigate the effect 
of topic specific emotions on user’s retweeting decision, 
as different users can express different emotions for a 
specific topic. It uses the 10-dimensional conditional 
probabilities score to predict the retweet decision of the 
user. The probabilities are calculated using conditional 
probability of an emotion dimension, given a specific 
topic. This tells us how likely the user is to express an 
emotion given a specific topic. 
5.2.4 User Interest Model (UIM) 
This model is used as a baseline model and intends to 
explore the text similarity between the user interest vector 
and the target tweet. The vectors are created using the 
word2vec algorithm. Cosine Similarity is used to infer the 
similarity between user interest vector and target tweet 
vector. 
To evaluate our retweet prediction
models, we used the accuracy metric, defined
as the ratio of the number of correctly classified instances to
the total number of instances.
 
 
Figure 1: Model Accuracies for a) Value based Model b) 
Emotion based model c) Topic-specific emotion-based 
model d) User Interest Model. 
6 Results and discussion 
All the above models were initially evaluated on the 
accuracy metric. 
Figure 1 shows the accuracies of value-based models 
along with previously used models for the task of 
predicting retweet decision of users. The accuracies are 
calculated using classifiers namely, Random Forest (RF), 
Logistic Regression (LR), XGB (Extreme gradient 
boosting trees), GBT (Gradient boosting trees). The figure 
shows 4 sub-parts demonstrating the accuracies of value-
based model, emotion-based model, topic-specific 
emotion-based model and user interest model 
respectively. 
The value-based model (VM model) uses value 
system and the value similarity score between target tweet 
and user profile. This model has a comparable 
performance across all the classifiers used, with XGB 
performing slightly better than others. 
The emotion-based model (EM model) simply uses
the 10-dimensional emotion vector as a feature for the
prediction. As seen in Figure 1b), this model also has
comparable performance across all classifiers, with a
slight improvement with the Random Forest classifier.
Figure 1c) shows the accuracy of the topic-specific
emotion-based model (TSM model) for the task of retweet
prediction. It uses the topic-specific emotion feature for
predicting retweet behavior through the conditional
probability of an emotion given a topic in the target tweet.
For this model, logistic regression
performs best in comparison with the other
classifiers.
The user-interest model (UIM model) uses the cosine 
similarity between the user interest vector and the target 
vector as a feature. The accuracy of this model varies with 
the type of classifier used. We can see that when using 
logistic regression classifier, this model performs the 
worst but shows a great improvement when tested with 
other classifiers. 
 
Table 3: A comparison of various models on the basis of 
accuracy. 
 
Table 4: A comparison of various models based on 
precision, recall and F1 score. 
Tables 3 and 4 show the comparative performance
of the value-based model and previously used retweet
prediction models. Table 3 compares the
different models based on accuracy. We used four
classifiers, viz. Random Forest, XGB, GBT and Logistic
Regression, for each of the models compared.
 The user-interest model (UIM model) uses the cosine 
similarity between the user interest vector and the target 
vector as a feature. The topic-specific emotion-based 
model (TSM model) uses the topic-specific emotion 
feature for predicting retweet behavior. It uses the 
conditional probability of an emotion given a topic in the 
target tweet.  Emotion-based model (EM model) simply 
uses the 10-dimensional emotion vector as a feature for 
the prediction. Value based model (VM model) on the 
other hand, uses value system and the value similarity 
score between target tweet and user profile.  
As can be seen, the UIM model achieves the worst
performance compared to the other models. This
conforms to our expectation, as previous studies use
this model as a baseline [15][11]. The TSM model shows
better performance than the EM model with most
classifiers; the EM model, however, has higher accuracy
with the Random Forest classifier. This indicates that the mutual
effect of topic and emotions can be treated as a feature comparable
to the use of emotions alone for predicting retweet
behaviors.
Comparing the VM model to the TSM and EM models, the VM
model has improved accuracy across all classifiers.
This indicates that the value system of an
individual can be a potential predictor of their
retweet decision. Also, we believe that the use of the
word2vec model to generate similar words for the
valueDict lexicon captures the underlying contextual
information in the content, which, when used to label the
value systems of users, contributes to higher
performance on the retweet prediction task.
Accuracy is a good metric when the distribution of the
target is balanced. However, in the case of imbalanced
classes, it is advisable to also evaluate the test set on other
metrics such as precision, recall and F1 score.
Precision is the ratio of correctly classified positive
instances to all instances classified as positive. Recall is
the ratio of correctly identified positive instances to all
instances that were originally positive.
Precision can also be written as:
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
Recall can be expressed as:
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
F1 score is the harmonic mean of precision and recall.
It is important to look at this metric because a model with
a very high precision and a very low recall is not
considered a useful model. Hence, the F1 score provides a
means to judge both metrics together.
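A minimal sketch of these metrics, computed from confusion-matrix counts, is given below. The function names and all counts are hypothetical and are not taken from the experiments in this paper. The first example shows why accuracy can mislead on imbalanced classes; the second shows how the harmonic mean penalizes a model with high precision but low recall.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all instances that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical imbalanced test set: 10 positives, 90 negatives, and a
# classifier that labels everything negative.
acc = accuracy(tp=0, fp=0, fn=10, tn=90)
_, rec, _ = precision_recall_f1(tp=0, fp=0, fn=10)
print(acc, rec)  # high accuracy, zero recall

# Hypothetical model with high precision but low recall.
p, r, f1 = precision_recall_f1(tp=9, fp=1, fn=81)
print(p, r, f1)
```

In the second example the arithmetic mean of precision and recall would be 0.5, whereas the F1 score stays close to the weaker of the two values.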
We therefore present a comparison based on the
evaluation metrics used for our test set, namely precision,
recall and F1 score, in Table 4.
The table compares our proposed value-based model
with the previous state-of-the-art models used for the
given classification task.
A pattern similar to that for accuracy can be seen in
these metrics when comparing across the different
models. The precision of the VM model is higher than
that of the TSM, EM and UIM models across all
classifiers. This again supports the use of value systems
as a feature for retweet prediction.
Comparing the TSM and EM models in terms of
precision, we see that the TSM model shows a higher
precision with all classifiers except Random Forest. This
again suggests that both features are potential predictors,
rather than one being superior to the other.
Looking at recall, we see that almost all the
models have comparable performance, with the VM model
performing slightly better than the others.
To consider both measures jointly, we use the
harmonic mean of precision and recall, the F1 score.
The results show that the VM model has a higher F1
score than all the other models.
As expected, the UIM baseline has a lower
performance on all the metrics.
These results confirm the importance of value systems
as a potential predictor of retweet behavior, in addition to
the state-of-the-art features previously used: emotions,
sentiments and topic-specific emotions. This work can be
applied wherever retweet prediction is used, including
viral marketing, emergency response and tweet
recommendation. The value systems of an individual can
also be used in practice to identify spammers.
7 Conclusion 
Predicting the retweet decisions of a user is a challenging
problem. Past studies have shown that the retweet behavior
of a user correlates with factors such as emotions,
sentiments and topic-specific emotions. Value systems
have also been shown in past studies to be an important
predictor of user decisions; however, their impact had not
yet been explored in the domain of retweet prediction.
Hence, our objective in this work was to explore the
impact of value systems on the retweet decisions of the
user. Value systems, being a latent attribute of a user, have
the potential to offer good predictive power in deciphering
the retweet behavior of the user. We presented a
value-based model and explained the methodologies used
to extract value-related features. We also compared our
model with previous state-of-the-art models. Through
different experiments, our work shows that value systems
are indeed an important factor in predicting the retweeting
decisions of the user. Future work includes studying and
comparing other state-of-the-art models with value-based
models.
References 
[1] Abel, F., Gao, Q., Houben, G. J., Tao, K. 2011. 
Analyzing user modeling on twitter for personalized 
news recommendations. In international conference 
on user modeling, adaptation, and personalization, 
pages 1–12. Springer. 
https://doi.org/10.1007/978-3-642-22362-4_1 
[2] Ali, S., Katoma, V., Tyobeka, E. 2015. Identification 
of key values and behaviours influencing leadership 
orientation in Southern Africa. Journal of Emerging 
Trends in Educational Research and Policy Studies, 
6(1):6–12. 
[3] Bonsón, E., Perea, D., Bednárová, M. 2019.
Twitter as a tool for citizen engagement: An
empirical study of the Andalusian municipalities.
Government Information Quarterly, 36(3):480–489. 
https://doi.org/10.1016/j.giq.2019.03.001 
[4] Boyd, D., Golder, S., Lotan, G. 2010. Tweet, tweet,
retweet: Conversational aspects of retweeting on 
twitter. In 2010 43rd Hawaii International 
Conference on System Sciences, pages 1–10. IEEE. 
https://doi.org/10.1109/hicss.2010.412 
[5] Can, E. F., Oktay, H., Manmatha, R. 2013.
Predicting retweet count using visual cues.
In Proceedings of the 22nd ACM international
conference on information & knowledge
management, pages 1481–1484.
https://doi.org/10.1145/2505515.2507824 
[6] Chawla, N. V., Bowyer, K. W., Hall, L. O., 
Kegelmeyer, W. P. 2002. SMOTE: Synthetic
Minority Over-sampling Technique. Journal of
Artificial Intelligence Research, 16:321–357. 
https://doi.org/10.1613/jair.953 
[7] Chen, J., Hsieh, G., Mahmud, J. U., Nichols, J. 2014. 
Understanding individuals’ personal values from 
social media word use. In Proceedings of the 17th 
ACM conference on Computer supported 
cooperative work & social computing, pages
405–414. ACM.
https://doi.org/10.1145/2531602.2531608 
[8] Chen, K., Chen, T., Zheng, G., Jin, O., Yao, E., Yu, Y.
2012. Collaborative personalized tweet
recommendation. Proceedings of the 35th 
international ACM SIGIR conference on Research 
and development in information retrieval 2012, 
ACM, pages 661–670. 
https://doi.org/10.1145/2348283.2348372 
[9] Dehghani, M., Gratch, J., Sachdeva, S., Sagae, K. 
2011. Analyzing conservative and liberal blogs 
related to the construction of the ‘Ground Zero 
Mosque’. Proceedings of the Annual Meeting of the 
Cognitive Science Society, 33.  
[10] Deng, Z., Yan, M., Sang, J., Xu, C. 2015. Twitter is 
faster: personalized time-aware video 
recommendation from Twitter to YouTube. ACM 
Trans Multimed Comput Commun Appl (TOMM), 
11(2):31–31. https://doi.org/10.1145/2637285 
[11] Firdaus, S. N., Ding, C., Sadeghian, A. 2019. Topic 
specific emotion detection for retweet prediction. 
International Journal of Machine Learning and 
Cybernetics, 10(8):2071–2083. 
https://doi.org/10.1007/s13042-018-0798-5 
[12] Fleischmann, K. R., Oard, D. W., Cheng, A.-S.,
Wang, P., Ishita, E. 2009. Automatic classification 
of human values: Applying computational thinking 
to information ethics. Proceedings of the American 
Society for Information Science and Technology, 
46(1):1–4. 
https://doi.org/10.1002/meet.2009.1450460345 
[13] Griffiths, T. L., Steyvers, M. 2004. Finding scientific
topics. Proceedings of the National Academy of 
Sciences, 101(Supplement 1):5228–5235. 
https://doi.org/10.1073/pnas.0307752101 
[14] Hong, L., Dan, O., Davison, B. D. 2011. Predicting
popular messages in twitter. Proceedings of the 20th 
international conference companion on World wide 
web, pages 57–58. 
https://doi.org/10.1145/1963192.1963222 
[15] Huang, D., Zhou, J., Mu, D., Yang, F. 2014. Retweet 
behavior prediction in twitter. 2014 IEEE Seventh 
international symposium computational intelligence 
and design (ISCID), 2:30–33. 
https://doi.org/10.1109/iscid.2014.187 
[16] Ishita, E., Oard, D. W., Fleischmann, K. R., Cheng,
A.-S., Templeton, T. C. 2010. Investigating
multi-label classification for human values. Proceedings of
the American Society for Information Science and 
Technology, 47(1):1–4. 
https://doi.org/10.1002/meet.14504701116 
[17] Jenders, M., Kasneci, G., Naumann, F. 2013.
Analyzing and predicting viral tweets. Proceedings
of the 22nd international conference on world wide
web 2013, ACM, pages 657–664. 
https://doi.org/10.1145/2487788.2488017 
[18] Jiang, B., Lu, Z., Li, N., Wu, J., Jiang, Z. 2018. 
Retweet prediction using social-aware probabilistic 
matrix factorization. International Conference on 
Computational Science, pages 316–327. 
https://doi.org/10.1007/978-3-319-93698-7_24 
[19] Jiang, B., Yi, F., Wu, J., Lu, Z. 2019. Retweet 
prediction using context-aware coupled
matrix-tensor factorization. International Conference on
Knowledge Science, Engineering and Management,
pages 185–196.
https://doi.org/10.1007/978-3-030-29551-6_17 
[20] Kakar, S., Dhaka, D., Mehrotra, M. 2020. Value-
Based Behavioral Analysis of Users Using Twitter. 
In Inventive Communication and Computational 
Technologies, Lecture Notes in Networks and 
Systems, Springer, volume 145. 
https://doi.org/10.1007/978-981-15-7345-3_23 
[21] Kanavos, A., Perikos, I., Vikatos, P., 
Hatzilygeroudis, I., Makris, C., Tsakalidis, A. 2014. 
Modeling retweet diffusion using emotional content. 
In IFIP International conference on artificial 
intelligence applications and innovations, pages 
101–110. Springer. 
https://doi.org/10.1007/978-3-662-44654-6_10 
[22] Kaufmann, E. 2016. It’s NOT the economy, stupid: 
Brexit as a story of personal values. British Politics 
and Policy at LSE. 
[23] Lee, K., Mahmud, J., Chen, J., Zhou, M., Nichols,
J. 2015. Who will retweet this? detecting strangers
from twitter to retweet information. ACM
Transactions on Intelligent Systems and Technology
(TIST), 6(3):1–25. https://doi.org/10.1145/2700466
[24] Lee, W. J., Oh, K. J., Lim, C. G., Choi, H. J. 2014. 
User profile extraction from twitter for personalized 
news recommendation. 16th International 
conference on advanced communication technology, 
pages 779–783. 
https://doi.org/10.1109/icact.2014.6779068 
[25] Lu, C., Lam, W., Zhang, Y. 2012. Twitter user 
modeling and tweets recommendation based on 
Wikipedia concept graph. Workshops at the Twenty-
Sixth AAAI conference on artificial intelligence. 
[26] Luo, Z., Osborne, M., Tang, J., Wang, T. 2013. Who 
will retweet me? Finding retweeters in Twitter. 
Proceedings of the 36th international ACM SIGIR 
conference on Research and development in 
information retrieval, pages 869–872. 
https://doi.org/10.1145/2484028.2484158 
[27] Ma, R., Hu, X., Zhang, Q., Huang, X., Jiang, Y. G. 
2019. Hot topic-aware retweet prediction with 
masked self-attentive model. Proceedings of the 
42nd International ACM SIGIR Conference on 
Research and Development in Information Retrieval, 
pages 525–534. 
https://doi.org/10.1145/3331184.3331236 
[28] Macskassy, S. A., Michelson, M. 2011. Why do
people retweet? Anti-homophily wins the day! In 5th 
international AAAI conference on weblogs and 
social media, pages 209–216. 
[29] Mazzi, M. A., Rimondini, M., van der Zee, E., 
Boerma, W., Zimmermann, C., Bensing, J. 2018. 
Which patient and doctor behaviours make a
medical consultation more effective from a patient 
point of view. Results from a European multicentre 
study in 31 countries. Patient Education and 
Counseling, 101(10):1795-1803. 
https://doi.org/10.1016/j.pec.2018.05.019 
[30] Mikolov, T., Chen, K., Corrado, G., Dean, J. 2013. 
Efficient estimation of word representations in 
vector space. 
[31] Mohammad, S. M., Turney, P. D. 2013.
Crowdsourcing a word-emotion
association lexicon. Computational
Intelligence, 29(3):436–465. 
https://doi.org/10.1111/j.1467-8640.2012.00460.x 
[32] Naveed, N., Gottron, T., Kunegis, J., Alhadi, A. C. 
2011. Bad news travel fast: A content-based analysis 
of interestingness on twitter. In Proceedings of the 
3rd international web science conference, pages 1–7. 
ACM. https://doi.org/10.1145/2527031.2527052 
[33] Peng, H. K., Zhu, J., Piao, D., Yan, R., Zhang, Y. 
2011. Retweet modeling using conditional random 
fields. 2011 IEEE 11th International conference on 
data mining workshops (ICDMW), pages 336–343. 
https://doi.org/10.1109/icdmw.2011.146 
[34] Perikos, I., Hatzilygeroudis, I. 2013. Recognizing
emotion presence in natural language sentences. In 
International conference on engineering applications 
of neural networks 2013, pages 30–39. Springer. 
https://doi.org/10.1007/978-3-642-41016-1_4 
[35] Petrovic, S., Osborne, M., Lavrenko, V. 2011.
Rt to win! predicting message propagation in
twitter. In Proceedings of the International AAAI
Conference on Web and Social Media, 5(1).
[36] Pfitzner, R., Garas, A., Schweitzer, F. 2012. 
Emotional divergence influences information 
spreading in Twitter. Sixth international AAAI 
conference on weblogs and social media, 12. 
[37] Plutchik, R. 2001. The Nature of Emotions. 
American Scientist, 89(4):344–344.  
[38] Rao, Y., Li, Q., Wenyin, L., Wu, Q., Quan, X. 2014. 
Affective topic model for social emotion detection. 
Neural Networks, 58:29–37. 
https://doi.org/10.1016/j.neunet.2014.05.007 
[39] Rathbun, B. C., Kertzer, J. D., Reifler, J., Goren, P., 
Scotto, T. J. 2016. Taking Foreign Policy Personally: 
Personal Values and Foreign Policy Attitudes. 
International Studies Quarterly, 60(1):124–137. 
https://doi.org/10.1093/isq/sqv012 
[40] Roberts, K., Roach, M. A., Johnson, J., Guthrie, J., 
Harabagiu, A. M. 2012. Empatweet: annotating and 
detecting emotions on Twitter. LREC 12, 12:3806–
3813. 
[41] Schwartz, S. H. 1994. Are there universal aspects
in the structure and contents of human values?
Journal of social issues, 50(4):19–45.
https://doi.org/10.1111/j.1540-4560.1994.tb01196.x 
[42] Suh, B., Hong, L., Pirolli, P., Chi, E. H. 2010. Want 
to be retweeted? large scale analytics on factors 
impacting retweet in twitter network. In IEEE
Second International Conference on
Social Computing.
https://doi.org/10.1109/socialcom.2010.33 
[43] Templeton, T. C., Fleischmann, K. R., Boyd-Graber, 
J. 2011. Simulating audiences: Automating analysis 
of values, attitudes, and sentiment. 2011 IEEE Third 
International Conference on Privacy, Security, Risk 
and Trust and 2011 IEEE Third International 
Conference on Social Computing, pages 734–737. 
https://doi.org/10.1109/passat/socialcom.2011.238 
[44] Wang, C., Li, Q., Wang, L., Zeng, D. D. 2017.
Incorporating message embedding into co-factor 
matrix factorization for retweeting prediction. 
International Joint Conference on Neural Networks 
(IJCNN), pages 1265–1272. 
https://doi.org/10.1109/ijcnn.2017.7965998 
[45] Wang, M., Zuo, W., Wang, Y. 2015. A multidimensional
nonnegative matrix factorization model for 
retweeting behavior prediction. Mathematical 
Problems in Engineering. 
https://doi.org/10.1155/2015/936397 
[46] Xu, Z., Yang, Q. 2012. Analyzing user retweet
behavior on twitter. Proceedings of the 2012 
international conference on advances in social 
networks analysis and mining, pages 46–50. 
https://doi.org/10.1109/asonam.2012.18 
[47] Yang, Z. 2010. Understanding retweeting behaviors 
in social networks. CIKM, pages 1633–1636. 
https://doi.org/10.1145/1871437.1871691 
[48] Ye, S., Soutar, G. N., Sneddon, J. N., Lee, J. A. 2017. 
Personal values and the theory of planned behaviour: 
A study of values and holiday trade-offs in young 
adults. Tourism Management, 62:107–109. 
https://doi.org/10.1016/j.tourman.2016.12.023 
[49] Zacharias, C. 2017. Twint-twitter intelligence tool. 
[50] Zhang, J., Tang, J., Li, J., Liu, Y., Xing, C. 2015a. 
Who influenced you? predicting retweet via social 
influence locality. ACM Trans. Knowl. Disc. Data 
(TKDD), 9(3):25–25. 
https://doi.org/10.1145/2700398 
[51] Zhang, K., Yun, X., Liang, J., Zhang, X. Y., Li, C., 
Tian, B. 2016. Retweeting behavior prediction using 
probabilistic matrix factorization. IEEE Symposium 
on Computers and Communication (ISCC). 
https://doi.org/10.1109/iscc.2016.7543897 
[52] Zhang, Q., Gong, Y., Guo, Y., Huang, X. 2015b. 
Retweet behavior prediction using hierarchical 
dirichlet process. Twenty-Ninth AAAI Conference 
on Artificial Intelligence.