Emotion analysis in socially 
unacceptable discourse
Jasmin FRANZA 
Faculty of Arts, University of Ljubljana
Bojan EVKOSKI 
Jožef Stefan International Postgraduate School; Jožef Stefan Institute
Darja FIŠER
Faculty of Arts, University of Ljubljana; Jožef Stefan Institute;  
Institute of Contemporary History
Texts often express the writer’s emotional state, and it has been shown that
emotion information has potential for hate speech detection and analysis. In this 
work, we present a methodology for quantitative analysis of emotion in text. 
We define a simple, yet effective metric for an overall emotional charge of text 
based on the NRC Emotion Lexicon and Plutchik’s eight basic emotions. Using 
this methodology, we investigate the emotional charge of content with socially 
unacceptable discourse (SUD), as a distinct and potentially harmful type of 
text which is spreading on social media. We experiment with the proposed 
method on a corpus of Facebook comments, resulting in four datasets in two 
languages, namely English and Slovene, and two discussion topics, namely 
LGBT+ rights and the European migrant crisis. We reveal that SUD content is 
significantly more emotional than non-SUD content. Moreover, we show 
differences in the expression of emotions depending on the language, topic, and 
target of the comments. Finally, to underpin the findings of the quantitative 
Franza, J., Evkoski, B., Fišer, D.: Emotion analysis in socially unacceptable 
discourse. Slovenščina 2.0, 10(1): 1–22. 
1.01 Izvirni znanstveni članek / Original Scientific Article
DOI: https://doi.org/10.4312/slo2.0.2022.1.1-22
https://creativecommons.org/licenses/by-sa/4.0/
Slovenščina 2.0, 2022 (1) | Razprave
investigation of emotions, we perform a qualitative analysis of the corpus, ex-
ploring in more detail the most frequent emotional words of each emotion, for 
all four datasets. The qualitative analysis shows that the source of emotions 
in SUD texts heavily depends on the topic of discussion, with substantial over-
laps between languages.
Keywords: emotions, socially unacceptable discourse (SUD), hate speech, so-
cial media, corpora
1  Introduction
Emotions are a key component of human behaviour and communication. 
Most often, we use language to manifest, transmit and explain emotions. 
Meanwhile, the continuously increasing popularity of social media 
produces unprecedented amounts of user-generated content from people 
all around the world and in all languages. Oftentimes, this content (posts, 
comments, descriptions etc.) includes words that reveal the range of 
emotions the author tries to convey, while also evoking specific emotions 
in the reader. The social media era has also brought very open 
outbursts of socially unacceptable discourse (SUD), such as hateful, 
discriminatory, offensive or threatening speech. This has given rise to the 
need to analyse SUD communication practices in order to better 
understand and effectively tackle them. Here we turn to the field of 
Emotion Recognition (ER), which aims to recognize and categorize 
verbalized emotions in texts. By doing so, we hope to understand what 
characterizes SUD content from the viewpoint of emotions.
In this paper, we introduce a novel, yet simple method to analyse 
emotions in text by utilizing the NRC Emotion Lexicon (Mohammad and 
Turney, 2010). We define a metric, which we name Emotional Charge (EC), 
that calculates the overall emotion intensity of a comment. We apply our 
approach to social media content to examine how emotions depend on the 
language, the topic and, most importantly, whether the content is SUD.
For that purpose, we focus on emotions expressed in socially 
unacceptable Facebook comments in Slovene and English on the topics of 
the European migrant crisis (hereinafter referred to as Migrants) and 
LGBT+ rights (hereinafter referred to as LGBT+) from the FRENK dataset 
(Ljubešić et al., 2021), as it is a unique, carefully annotated 
multilingual dataset of SUD content which covers two topics. We perform a 
quantitative analysis for both languages and topics, taking into account 
the degree of emotional charge in each comment and the representation 
of individual categories of emotions by using the NRC Emotion Lexicon, 
which labels words with the eight basic emotions by Plutchik (1980). 
To complement the quantitative approach, a qualitative analysis of the 
emotional words in SUD comments is added, enabling a more thorough 
understanding of the emotional charge findings.
The two main research questions covered in this paper are:
• Does SUD content differ from non-SUD content in the expression of 
emotions?
• Does the emotional footprint of SUD comments differ depending 
on the topic and target they address?
The paper is organised as follows. Section 2 gives an overview of 
the background and related work; Section 3 focuses on the descrip-
tion of the dataset used and describes the methods for calculating the 
emotional charge of comments. Subsequently, Section 4 presents the 
analysis on the emotional landscape of SUD, both from a statistical 
point of view and from a qualitative angle giving a deeper look at the 
emotional lexicon connected to SUD. Finally, Section 5 concludes the 
paper with a discussion and ideas for future work.
2  Background and related work
In the past decade, there has been an increase in research in the field 
of automatic detection of emotions in user-generated content (Alm 
et al., 2005; Al-Saqqa et al., 2018). However, although SUD has been 
intensively analysed in various disciplines and methodological frameworks, 
approaches to SUD via emotion analysis have so far received little
attention (Gitari et al., 2015; Martins et al., 2018). This article presents 
an approach to comprehensively analyse SUD with the help of emotion 
lexica, as Markov et al. (2021) showed that emotion-based features 
provide useful cues for its automatic detection. In this section, we pre-
sent the theoretical underpinnings for the analytical part of our study.
2.1  Emotions
In psychology, there is no general consensus on the definition of 
emotions and their number. Research mostly focuses on two approaches to 
the representation of emotions, namely the category model and the 
dimensional model (Scherer, 2005). In the category model, emotions are 
presented as sets of different basic emotional states (e.g., joy, anger), 
where basic emotions are understood as those that appear in very early 
childhood development and whose expression and recognition are culturally 
independent. In the dimensional approach, emotions are presented 
in a space in which each emotion occupies a position along emotion 
dimensions (e.g., value dimension: positive-negative axis, strength 
dimension: high-low axis; Russell, 1980). The categorical approach is more widespread in 
computational linguistics than the dimensional one (Aman and Szpako-
wicz, 2007; Ghazi, 2016) because it is more intuitive and easier to apply, 
especially in computational models, which is why we adopt it in the study 
presented in this paper. We use the categorization into 8 basic emotions 
according to Plutchik (1980), namely joy, sadness, anger, fear, trust, 
disgust, surprise and anticipation, as they represent the basic and proto-
typical emotions, with the combination of which we can build more com-
plex ones, e.g., love, awe, contempt. This model is also called Plutchik’s 
wheel of emotions, as each fundamental emotion also has its opposite 
emotion (e.g., joy – sadness, fear – anger; Plutchik, 2001; see Figure 1). 
Their organisation is based on the physiological purpose of each.
Martins et al. (2018) show that the most critical emotions to iden-
tify hate speech are the negative ones – anger, disgust, fear and sad-
ness as they occur in 2/3 of hate speech texts, while they claim surprise 
can be interpreted as a neutral emotion in hate speech. On the other 
hand, anticipation, joy and trust can be classified in the positive emo-
tions group.
2.2  Emotion Recognition from Text
Emotion recognition from text can be divided into two groups: the ear-
lier approaches, based on lexical datasets (Mohammad and Turney, 
2010), and the later ones, based on annotated training corpora (Aman 
and Szpakowicz, 2007; Canales et al., 2019). In the corpus approach, 
machine learning methods based on pre-annotated texts are employed 
to develop models for annotating new texts, while the lexical approach 
for identifying emotions in texts uses an external set of vocabulary with 
emotion tags. Due to its greater universality and adaptability to different 
domains and genres, we follow the lexical paradigm. Additionally, 
previous research confirms the adequacy of the lexical approach. 
Mohammad and Yang (2011) have successfully used the NRC Emotion 
Lexicon to identify predominant emotions in love letters, hate emails, 
and suicide notes. They were mainly interested in the difference 
between the linguistic expression of emotions in men and women. A 
similar approach with the help of the SentiWordNet lexical database, 
which contains strings of synonyms with assigned sentiment tags, was 
successfully used by Denecke (2008) on machine-translated texts to 
predict sentiment.
Figure 1: Plutchik’s wheel of emotions. It shows 8 basic emotions: joy, trust, fear, 
surprise, sadness, anticipation, anger, and disgust. 
2.3  Socially Unacceptable Discourse (SUD)
Hate speech is a widespread phenomenon that attracts many re-
searchers from diverse areas. However, the term is usually used in a 
very narrow, legally defined sense in the literature, which is why we 
adopt the term Socially Unacceptable Discourse (SUD), comprising 
all forms of hateful, discriminatory, offensive, violent or threatening 
speech (Fišer et al., 2017). A significant part of contemporary SUD 
research takes place within critical discourse analysis in combination 
with corpus linguistics (cf. Brindle, 2016; Knoblock, 2017), intending 
to identify and analyse SUD and its evolution. Assimakopoulos et al. 
(2017) present several European research projects on SUD, in which 
the analysis of online content is predominant. They point out that EU 
legislation alone is not enough to solve the spread of online hate and 
improve its understanding, as SUD can manifest itself in many subtle 
ways, such as stereotyping and categorization, patriotism, metaphori-
cal expression, sarcasm, allusions etc., which makes comprehensive 
linguistic approaches extremely important for better awareness of the 
issue. A deeper understanding of SUD would primarily enable better 
prevention and identification.
In Slovenia, the most valuable resource of SUD data is the manually 
annotated FRENK corpus of Facebook comments (Ljubešić et al., 2019; 
Ljubešić et al., 2021; see Section 3). Vehovar et al. (2020) show that 
about half of all the comments appearing in the FRENK dataset were 
identified as SUD: the share is significantly higher for the topic of Mi-
grants (58%) than for the topic of LGBT+ (48%). The dataset was also 
analysed from the linguistic point of view, revealing that SUD has a different 
lexical footprint (Franza and Fišer, 2019) and that SUD comments 
are less standard than non-SUD comments and contain fewer 
emoticons/emojis and less punctuation (Pahor de Maiti et al., 2019).
3  Dataset and methodology
In this section, we describe in detail how the FRENK dataset, which is 
also used in this paper, was constructed and how it was processed for 
the purposes of this analysis. Next, we present the NRC Emotion Lexi-
con and the emotion labels it contains. Based on these, the emotional 
charge of each comment in the FRENK dataset is calculated, which is 
presented in the final subsection.
3.1  FRENK Corpus
The FRENK corpus (Ljubešić et al., 2019; Ljubešić et al., 2021)¹ was 
collected from Facebook pages of three mainstream news media outlets 
for each examined language, including Slovene and English. It covers 
two topics, the EU migrant crisis and LGBT+ rights, and was 
enriched with manual annotations of the comments (Ljubešić et al., 
2019). The Slovene part of the corpus contains 30 posts with 6545 
comments for Migrants, and 93 posts with 4571 comments for LGBT+. 
The English part of the corpus consists of 16 posts with 5855 comments 
for Migrants, and 14 posts with 5906 comments for LGBT+. Additionally, 
comments were annotated for the type of SUD they contain 
(acceptable, background-violence, background-offence, other-threat, 
other-offence and unacceptable), as well as for the target of the 
comment (Migrants, members of the LGBT+ community, persons related 
to Migrants or LGBT+, journalists or media, fellow commenter, other).
The dataset is linguistically processed with the CLASSLA pipeline 
for Slovene (Ljubešić, 2019, 2020) and Stanza for English (Qi et 
al., 2020) on the levels of tokenization and sentence splitting, PoS-tag-
ging and lemmatization. Therefore, we were able to annotate the lem-
matized English and Slovene datasets with the NRC Emotion Lexicon 
for the corresponding language, which resulted in a bilingual and com-
parable emotion-labelled dataset of SUD Facebook comments that we 
analyse in the remainder of this paper.
3.2  Emotion Annotation
To identify emotions, we used the NRC Emotion Lexicon. The lexicon 
contains all words from Roget’s Thesaurus that appear more than 
120,000 times in Google’s n-gram corpus, resulting in 14,200 entries. 
¹ The FRENK corpus, besides its Slovene and English parts, was also created for Croatian, 
French and Dutch: http://nl.ijs.si/frenk/english. The Dutch version was created within the 
LiLaH project: https://lilah.eu. 
Each word in the lexicon has a label for its polarity (positive, nega-
tive) and for Plutchik’s 8 basic emotions (anger, anticipation, disgust, 
fear, joy, sadness, surprise, trust). It was annotated manually using 
the crowdsourcing platform Amazon Mechanical Turk. The lexicon was 
originally created for English, and was later also automatically translated 
into 105 languages, including Slovene, with the help of Google Trans-
late (2017). We have performed manual post-editing of the machine-
translated lexicon (Daelemans et al., 2020). Examples of the translated 
lexicon along with the emotion labels can be found in Table 1.
Table 1: Examples of emotion annotation in the NRC Emotion Lexicon

| English | abandoned | happiness | wise | ghost | refugee |
|---|---|---|---|---|---|
| Slovene translation | opuščen, zapuščen, prekinjen, zavržen | sreča, veselje | moder | duh, prikazen | begunec |
| Anger | Yes | No | No | No | No |
| Anticipation | No | Yes | No | No | No |
| Disgust | No | No | No | No | No |
| Fear | Yes | No | No | Yes | No |
| Joy | No | Yes | No | No | No |
| Sadness | Yes | No | No | No | Yes |
| Surprise | No | No | No | No | No |
| Trust | No | Yes | No | No | No |

Note. For each English entry, there is a manually post-edited machine translation in 
Slovene and annotations for each of the 8 basic emotions.
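As a minimal sketch of how entries such as those in Table 1 could be held in memory, the snippet below maps each English word to the set of emotions it is labelled with. The names `LEXICON`, `emotions_of` and `is_emotional` are illustrative assumptions and not part of the NRC distribution; only the five entries from Table 1 are included.

```python
# Plutchik's eight basic emotions, as used in the NRC Emotion Lexicon.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical in-memory form of the five English entries shown in Table 1.
LEXICON = {
    "abandoned": {"anger", "fear", "sadness"},
    "happiness": {"anticipation", "joy", "trust"},
    "wise": set(),          # present in the lexicon, but without emotion labels
    "ghost": {"fear"},
    "refugee": {"sadness"},
}

def emotions_of(lemma):
    """Return the emotion labels of a lemma (empty set if none or unknown)."""
    return LEXICON.get(lemma.lower(), set())

def is_emotional(lemma):
    """A lemma counts as emotional if it carries at least one emotion label."""
    return bool(emotions_of(lemma))

print(sorted(emotions_of("abandoned")))
```

A set per word reflects the fact that one entry can carry several emotions at once, while an entry such as wise is covered by the lexicon without being emotional.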
3.3  Lexicon Limitations
The lexical approach is an efficient method to tackle emotion recogni-
tion from text (cf. 2.2). It is essential to work with datasets that are 
carefully prepared and verified to have reliable results. Our approach 
in this paper tests this method and achieves interesting outcomes. 
Nonetheless, it is important to also state the limitations of this specific 
emotion lexicon, the NRC emotion lexicon (Mohammad and Turney, 
2010). We identified two main issues, namely the presence of biases 
and questionable emotion labelling. 
Our work focuses on SUD, and it is important to point out that the 
lexicon has non-neutral annotations for the two topics we are deal-
ing with, which can be linked to the lack of control and documentation 
about who the annotators were in the first place as the lexicon is the re-
sult of crowdsourcing. For example, immigrant is annotated with fear, 
fugitive with fear, anger, disgust, sadness and trust, and lesbian with 
disgust and sadness. These labels arguably reflect prejudices, which is 
problematic given that our work aims to counter biases. Moreover, some 
labels appear to be ambiguous. For example, 
nurture is annotated with anger, anticipation, disgust, fear, joy 
and trust, which suggests contradictory emotions together and does 
not give an insightful perspective of the word.
There have been many attempts to create an emotion lexicon, but 
the NRC emotion lexicon attracted the most attention due to its avail-
ability, size, and its choice of Plutchik’s expressive eight-class emotion 
model (Zad et al., 2021). This is also the reason why we decided to use 
it, but we will take into account the potentially problematic labels in our 
interpretation of the results and we will complement the analysis with 
a qualitative study to check for potentially problematic consequences 
of using the lexicon. There have also been several attempts to improve 
the NRC emotion lexicon (cf. Zad et al., 2021), which should be further 
explored in the future.
3.4  Lexicon Coverage
Table 2 shows statistics regarding the NRC lexicon coverage of our dataset, 
for each of the subsets. Lexicon coverage has been calculated as 
the percentage of unique emotionally eligible words that are found in the 
lexicon; note that not all words found in the lexicon are labelled with 
emotion tags. The English language subsets contain around 5000 to 9000 
unique words, with an NRC lexicon coverage of around 20%. Meanwhile, the 
Slovene subsets, although of similar size, contain more unique words, 
around 6000 to 12,000, depending on the dataset. As expected, since the 
Slovene NRC lexicon is a machine translation of the English lexicon and 
the Slovene subsets generally contain more unique words, the NRC lexicon 
coverage of the Slovene dataset is somewhat lower (around 16%), with small 
differences depending on the subset. Manual examinations have shown that there is a small 
number of false positives and a higher number of false negatives, imply-
ing that the lexicon should be further improved. A random (subjective) 
sample evaluation of 100 English and 100 Slovene comments on the 
performance of the lexicon revealed the following:
• English NRC lexicon: Precision – 0.96; Recall – 0.65
• Slovene NRC lexicon: Precision – 0.91; Recall – 0.64
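For readers less familiar with these measures, the sketch below shows how precision and recall follow from counts of true positives, false positives and false negatives. The counts in the example are purely hypothetical and are not the counts behind the figures reported above; the function name is likewise an assumption.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 8 words correctly flagged as emotional, 2 flagged
# although not emotional in context, 4 emotional words missed by the lexicon.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f}")
```

High precision with low recall, as reported above, means that the words the lexicon flags are mostly correct, while many emotional words go unflagged.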
The low recall for both English and Slovene shows that the lexicon 
fails to recognize a large portion of the emotional words present in the 
comments, which is expected as we focus on a very specific kind of 
discourse on social media with specific characteristics on a very narrow 
topic. The low recall can also be explained by false negatives, i.e., 
emotionally eligible words that are emotional but not covered by 
the lexicon, for example shootings and frightened. Moreover, SUD 
comments exhibit a tendency towards nonstandard features (Pahor 
de Maiti et al., 2019), which hampers the recognition of emotional words, 
for example strelat instead of streljati (Eng. to shoot). Additionally, the 
evaluation indicates a lower precision of the Slovene lexicon, which could 
be explained by a higher number of false positives (words that are not 
emotional in context but are included in the lexicon), mainly due to 
polysemy and non-canonically spelled words. For example, the Slovene 
lexicon contains the adjective sam (Eng. alone), but in the comments it is 
used as an adverb (meaning just), which should not count as an emotional word.
Table 2: Statistics regarding the NRC Lexicon coverage of our dataset, divided per topic, 
language and SUD/non-SUD comments

| Language | Topic | SUD/non-SUD | Comments | Unique words | Emotionally eligible words (nouns, verbs, adjectives and adverbs) | Lexicon coverage |
|---|---|---|---|---|---|---|
| English | Migrants | Non-SUD | 2964 | 8401 | 5291 | 1046 (20%) |
| English | Migrants | SUD | 2867 | 9323 | 6818 | 1323 (19%) |
| English | LGBT+ | Non-SUD | 1777 | 8514 | 5374 | 1124 (21%) |
| English | LGBT+ | SUD | 4080 | 5622 | 4297 | 977 (23%) |
| Slovene | Migrants | Non-SUD | 2646 | 8401 | 5889 | 863 (15%) |
| Slovene | Migrants | SUD | 3795 | 12486 | 10020 | 1325 (13%) |
| Slovene | LGBT+ | Non-SUD | 1855 | 6199 | 4745 | 878 (19%) |
| Slovene | LGBT+ | SUD | 2606 | 10108 | 8392 | 1329 (16%) |
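The coverage figures in Table 2 can be illustrated with the following minimal sketch, which counts the unique emotionally eligible lemmas of a subset that also appear in the lexicon. The function name and the toy inputs are assumptions made for illustration; the last lines only re-derive the rounded percentage of the first table row from its two counts.

```python
def lexicon_coverage(eligible_lemmas, lexicon_words):
    """Return (number of covered unique lemmas, share of unique lemmas covered)."""
    unique = set(eligible_lemmas)
    if not unique:
        return 0, 0.0
    covered = unique & set(lexicon_words)
    return len(covered), len(covered) / len(unique)

# Toy example: 4 unique eligible lemmas, 2 of which are lexicon entries.
toy_lemmas = ["refugee", "ghost", "border", "ghost", "cross"]
toy_lexicon = {"refugee", "ghost", "wise"}
n_covered, share = lexicon_coverage(toy_lemmas, toy_lexicon)
print(n_covered, share)

# Arithmetic check of the first row of Table 2: 1046 of 5291 eligible words.
first_row_percent = round(100 * 1046 / 5291)
print(first_row_percent)
```

Working on unique lemmas rather than on tokens matches the description of coverage as a percentage of unique emotionally eligible words.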
3.5  Calculating Emotional Charge
The final stage is to use the lemmatized comments and the lexicon to 
calculate a metric that captures the overall emotion intensity. We 
introduce this metric in order to be able to compare comments not just on 
the level of a specific emotion, but also with a single measure that 
includes all emotions, which allows us to answer the questions posed in the Introduction.
We define Emotional Charge (EC) as follows: let W be the list of all 
nouns, verbs, adjectives and adverbs in one comment (the emotionally 
eligible word classes). Then, let W_E be the list of all words in 
W which are labelled as emotional by the emotion lexicon. We define 
the emotional charge EC of a comment as follows:

$$EC = \frac{|W_E|}{|W|}$$

To put it simply, Emotional Charge (EC) calculates the proportion of 
emotional words labelled by the lexicon in the total number of 
emotionally eligible words. Note that W and W_E are defined as lists and not 
as sets, thus an emotional word being present twice in a comment is 
also counted twice in the total score.
Using the emotional charge of each comment, we were able to get 
a sampling distribution of emotional charge for the desired group of 
comments (e.g., SUD vs. non-SUD, Slovene vs. English, Migrants vs. 
LGBT+). Figure 2 shows an example of the procedure for calculating 
the emotional charge.
Taking into account only word classes that can carry emotion 
(nouns, verbs, adjectives and adverbs) as well as using the emotional 
charge formula, which normalizes for comment length, makes the 
emotional charge scores more robust. Yet, this way of calculating emotional 
charge introduces many “non-emotional” and “highly” emotional short 
comments, where the emotional charge is 0 or 1 respectively, based 
on only a few words. Thus, we made a pragmatic decision to exclude 
comments with fewer than three words from the rest of our analysis.
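The definition of emotional charge can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the study: tokens are assumed to be (lemma, part-of-speech) pairs with Universal POS tag names, the three-word threshold is assumed to apply to all tokens of a comment, and comments without eligible words are assumed to be skipped. The function and variable names are hypothetical.

```python
# Assumed tag names: Universal POS tags for the four eligible word classes.
ELIGIBLE_POS = {"NOUN", "VERB", "ADJ", "ADV"}

def emotional_charge(tokens, emotional_lemmas, min_words=3):
    """Return EC = |W_E| / |W| for one comment, or None if it is excluded.

    tokens: list of (lemma, pos) pairs for the whole comment.
    emotional_lemmas: set of lemmas carrying at least one emotion label.
    """
    # Assumption: the three-word threshold is applied to all tokens.
    if len(tokens) < min_words:
        return None
    # W is a list, so repeated words are counted every time they occur.
    w = [lemma for lemma, pos in tokens if pos in ELIGIBLE_POS]
    if not w:  # assumption: EC is undefined without eligible words
        return None
    w_e = [lemma for lemma in w if lemma in emotional_lemmas]
    return len(w_e) / len(w)

comment = [("they", "PRON"), ("hate", "VERB"), ("the", "DET"), ("war", "NOUN"),
           ("and", "CCONJ"), ("hate", "VERB"), ("border", "NOUN")]
ec = emotional_charge(comment, {"hate", "war"})
print(ec)  # 3 emotional words out of 4 eligible ones
```

Because W and W_E are lists, the repeated lemma hate contributes twice to both the numerator and the denominator in this example.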
Figure 2: Example of an emotionally annotated sentence. 
All nouns, verbs, adjectives and adverbs are emotionally 
eligible words. Only some of them are contained in the NRC 
Emotion Lexicon. The ones in the Lexicon are counted in the 
emotional charge of the sentence.
4  Results
In this section, we present our research findings using the emotional 
charge of SUD comments in both Slovene and English.
4.1  Emotional Charge Analysis
SUD is more emotional than non-SUD content. Here, we check whether 
emotion annotation is informative for differentiating between SUD and 
non-SUD comments by comparing their emotional charge. Once we calculated 
the distribution of emotional charge for each of the groups, we applied the 
Kolmogorov-Smirnov two-sample test (Pratt and Gibbons, 1981), which 
showed a statistically significant difference between SUD and non-SUD 
across all four combinations of language and topic (Migrants English: 
p=3×10⁻⁷, d=0.18; LGBT+ English: p=3×10⁻⁸, d=0.23; Migrants Slovene: 
p=1×10⁻¹⁰, d=0.18; and LGBT+ Slovene: p=3×10⁻⁴, d=0.12). The effect size d 
according to Cohen’s formula (Cohen, 1988) is considered small to medium 
(depending on the combination). Figure 3 shows the distribution mean and 
deviation of all four combinations. Thus, we conclude that a specific 
analysis of SUD content could indeed be informative, as the data showed 
that these comments are significantly more emotionally charged than 
non-SUD comments.
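To make the comparison concrete, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest distance between two empirical distribution functions) and Cohen's d with a pooled standard deviation, using only the Python standard library. It is a sketch under assumptions: the text does not state which software or which variant of Cohen's d was used, the sample values are invented, and no p-value is computed here (a statistics library routine such as `scipy.stats.ks_2samp` would provide one).

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, variance

def ks_statistic(a, b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in points)

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

# Invented emotional charge values for two small groups of comments.
sud_ec = [0.2, 0.4, 0.5, 0.6]
non_sud_ec = [0.0, 0.1, 0.2, 0.3]
ks = ks_statistic(sud_ec, non_sud_ec)
d = cohens_d(sud_ec, non_sud_ec)
print(ks, round(d, 2))
```

The KS statistic is sensitive to any difference in the shape of the two distributions, while Cohen's d summarises only the difference in means relative to the spread, which is why the two are reported together.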
Figure 3: Comparison of emotional charge between languages and topics of SUD 
comments and non-SUD comments. The figure shows distributions (rectangles) and variance 
(lines). SUD comments are significantly more emotionally charged than non-SUD ones.
Topics differ in emotional charge – LGBT+ evokes more emotions than 
the Migrants topic. After confirming a higher emotional charge in SUD 
comments, we analysed whether one topic attracts more emotional 
charge than the other. Figure 4 shows, side-by-side, the distributions 
of the sets we compare (LGBT+ vs. Migrants). The Kolmogorov-Smirnov 
test suggests that there is a statistically significant difference in the 
emotional charge between Migrants and LGBT+: in both English (p=3×10⁻⁹) 
and Slovene (p=2×10⁻¹¹) comments, the LGBT+ topic carries a higher emotional 
charge. According to Cohen’s coefficient, in English, the effect size is 
medium (d=0.209) while in Slovene it is small (d=0.183).
Figure 4: Difference in SUD emotional charge between the LGBT+ and Migrants topics in 
English and Slovene. The figure shows distributions (rectangles) and variance (lines); 
LGBT+ comments are more emotionally charged.
Comments are more emotional when targeted at Migrants/LGBT+. 
One piece of metadata in the FRENK dataset is the target of 
the comment, or in other words, who the comment is directed at. We 
compared “the commenter” and “the target – migrant/LGBT+ person”, 
which are the two most frequent targets in the FRENK dataset (see 
Section 3.1). The comments targeted at migrants/LGBT+ persons are 
explicitly aimed at migrants or members of the LGBT+ community, while the 
others are targeted at another commenter in the discussion thread. As 
shown in Figure 5, for both topics and languages, comments targeted 
at migrants or LGBT+ persons are generally more emotional than comments 
targeted at interlocutors (fellow commenters in the discussion thread). 
Figure 5: Comparison of emotional charge for different targets of the SUD comment, 
namely Commenter or Target (LGBT+ persons/migrants), between Slovene and English. 
The figure shows distributions (rectangles) and variance (lines).
Different topics provoke different emotions. In order to extract the 
data for a specific emotion, we calculated how much a specific emotion 
contributes to the total emotional charge. Then, by having the emo-
tional charge distribution of each particular emotion, we were able to 
compare their manifestation for the two different topics: Migrants and 
LGBT+. On average, we observed that in English comments users man-
ifest more disgust and joy for the topic of LGBT+, and more sadness, 
fear and surprise for Migrants (Figure 6). In Slovene, users manifest 
significantly more anticipation and joy for LGBT+, while for the topic of 
Migrants, they manifest more anger and fear (Figure 6). It is interesting 
to observe that in both languages the LGBT+ topic evokes more 
joy, while the Migrants topic evokes more fear. It is also quite evident 
that emotions are not evenly represented in the four subset combinations, 
with trust and fear being the most dominant emotions with more than 
15% of the total emotional spectrum, while surprise is the least 
present, taking less than 5% of the emotional spectrum.
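One possible reading of "how much a specific emotion contributes to the total emotional charge" is sketched below: every emotion label of every emotional word in a comment is counted, and each emotion's count is divided by the total number of labels, so that the shares sum to 1. This interpretation, the function name and the toy lexicon are assumptions; the exact computation behind Figure 6 is not spelled out in the text.

```python
from collections import Counter

def emotion_shares(lemmas, lexicon):
    """Share of each emotion among all emotion labels found in a comment.

    lemmas: list of emotionally eligible lemmas of one comment.
    lexicon: dict mapping a lemma to its set of emotion labels.
    """
    counts = Counter()
    for lemma in lemmas:
        # A word with several labels contributes once to each of them.
        counts.update(lexicon.get(lemma, ()))
    total = sum(counts.values())
    return {emotion: c / total for emotion, c in counts.items()} if total else {}

# Toy lexicon for illustration only.
toy_lexicon = {
    "war": {"fear"},
    "hate": {"anger", "disgust", "fear", "sadness"},
}
shares = emotion_shares(["war", "hate", "border"], toy_lexicon)
print(sorted(shares.items()))
```

Averaging such per-comment shares over all comments of a subset would give one bar per emotion, comparable across topics and languages.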
Figure 6: Distribution of emotions in English and Slovene SUD comments for the topics 
of LGBT+ and Migrants. The figure shows averages (bars) and their confidence intervals 
(95%) for each emotion present in the NRC Lexicon.
4.2  Emotional Words Analysis
In order to better understand the above quantitative analysis and have 
a closer look at the investigated data, we performed a qualitative anal-
ysis of the emotional words in the corpus. 
Table 3 shows the three most frequent emotional words for each 
language, topic and emotion with the purpose of understanding which 
words are most commonly connected to which emotions. Some dif-
ferences and similarities among topics and languages are observed. 
For the LGBT+ topic, the English commenters seem to be more reli-
gion-oriented, frequently using words such as God, disgusting and sin. 
Meanwhile, the Slovene commenters take the discussion to a more 
family-oriented field, using words such as mother, child and nurture. This could be 
due to the referendum in Slovenia on legalising same-sex marriage 
that took place in the same period as the data was harvested and 
heavily influenced the discussions under Facebook posts by the 
Slovene media on this topic at the time, where opponents framed their 
arguments around the notion of the traditional family unit, expressing 
concern about children’s rights and adoptions by same-sex couples. 
Interestingly enough, the roles are reversed for the Migrants topic, as now 
it is the Slovene commenters who are more concerned with religion, 
using words such as religion and God, possibly showing fear of a 
different religion. On the other hand, the English commenters seem to feel 
more physically threatened by the migrants, using words such as fight, 
kill and idiot. This could be due to the unprecedented migrant wave 
through the Balkan route that took place in the same period as the data 
was harvested and heavily influenced the discussions under 
Facebook posts by the Slovene media on this topic at the time, as Slovenia 
had never before experienced anywhere near this rate of migrant 
influx, while the topic has been present in the UK political and public 
debates for many decades.
Individual emotions across languages exhibit mostly similar con-
cepts, yet there are some differences. For example, for the Migrants 
topic, both English and Slovene commenters use similar words to ex-
press emotions, in particular, both groups express disgust with the 
word terrorist and show fear with immigrant/fugitive. On the other 
hand, English commenters express anger for the LGBT+ topic in differ-
ent terms than the Slovene ones. The English commenters use words 
such as disgusting, hate and sin, taking the attitude that being LGBT+ is 
sinful and repulsive, whereas the Slovene ones show hostility towards 
the target and, once again, concern regarding children with expres-
sions such as violence, nurture and against.
Expectedly, in both languages, word usage varies depending on 
the topic. For example, English commenters express sadness with the 
word problem for both the Migrants and the LGBT+ topic, suggesting 
they perceive both topics as an issue. Yet, Slovene commenters show 
disgust differently for the two topics, exposing what bothers them most 
for each. As expected, for the Migrants topic words such as fugitive, ter-
rorist and back occur frequently, while for the LGBT+ topic gay, garbage 
and nurture are recurring.
Table 3: Top three most frequent emotional words in SUD comments for each emotion, 
divided per language and topic (Slovene words have their translation after the dash)

| Emotion | ENG Migrants | ENG LGBT+ | SLO Migrants | SLO LGBT+ |
|---|---|---|---|---|
| anger | fight (4.45%); hate (3.12%); money (2.96%) | disgusting (4.22%); hate (3.95%); sin (2.98%) | begunec – fugitive (11.84%); proti – versus (3.73%); terorist – terrorist (2.73%) | proti – against (5.29%); nasilje – violence (2.82%); vzgajati – nurture (2.79%) |
| anticipation | child (7.7%); good (4.71%); time (4.57%) | God (15.78%); marriage (6.56%); sex (6.25%) | otrok – child (7.16%); vera – religion (6.08%); svet – world (4.60%) | otrok – child (25.28%); zakon – marriage (6.06%); svet – world (3.14%) |
| disgust | hate (4.32%); idiot (3.92%); terrorist (3.27%) | disgusting (4.45%); sick (4.23%); hate (4.16%) | begunec – fugitive (14.31%); nazaj – back (7.20%); terorist – terrorist (3.30%) | peder – gay (5.11%); smeti – garbage (3.52%); vzgajati – nurture (3.31%) |
| fear | problem (4.80%); immigrant (4.56%); war (3.99%) | God (12.24%); disgusting (2.99%); hate (2.80%) | begunec – fugitive (9.75%); vojna – war (4.07%); bog – God (3.00%) | nasilje – violence (2.64%); vzgajati – nurture (2.61%); bog – God (2.30%) |
| joy | child (8.72%); good (5.34%); money (3.82%) | God (15.15%); love (8.85%); marriage (6.30%) | otrok – child (9.01%); vera – religion (7.65%); bog – God (4.50%) | otrok – child (28.10%); zakon – marriage (7.00%); mama – mother (3.17%) |
| sadness | problem (6.60%); kill (4.63%); leave (4.20%) | sick (4.36%); hate (4.29%); problem (3.23%) | sam – alone (11.57%); begunec – fugitive (10.76%); brez – without (3.47%) | sam – alone (7.73%); peder – gay (3.88%); mama – mother (3.30%) |
| surprise | good (8.68%); leave (8.67%); money (6.37%) | good (9.6%); Trump (6.72%); marry (6.23%) | dober – good (8.38%); terorist – terrorist (6.51%); lep – beautiful (5.76%) | dober – good (6.59%); dobro – cool (4.73%); lep – beautiful (4.65%) |
| trust | good (3.16%); show (3.13%); religion (3.11%) | God (12.14%); marriage (5.05%); sex (4.81%) | begunec – fugitive (8.20%); vera – religion (4.27%); svet – council (3.23%) | zakon – marriage (5.60%); pravica – right (4.29%); svet – world (2.90%) |
Note. Percentages show the frequency of the word relative to all the words
of the specific emotion. E.g., the word fight accounts for 4.45% of all anger
words in the ENG Migrants dataset.
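
The shares reported in Table 3 could be computed along the following lines. This is a minimal sketch rather than the procedure actually used in the study: the function name top_emotion_words, the toy lexicon and the toy token list are all hypothetical, and the input is assumed to be already lemmatised and lower-cased. A real run would substitute an actual emotion lexicon (word mapped to its associated emotions) for the toy dictionary.

```python
from collections import Counter, defaultdict


def top_emotion_words(tokens, lexicon, k=3):
    """Return, per emotion, the k most frequent words and their share (in %)
    of all word occurrences associated with that emotion."""
    counts = defaultdict(Counter)
    for tok in tokens:
        # A word may be associated with several emotions; count it for each.
        for emotion in lexicon.get(tok, ()):
            counts[emotion][tok] += 1

    result = {}
    for emotion, counter in counts.items():
        total = sum(counter.values())
        result[emotion] = [
            (word, round(100 * n / total, 2))
            for word, n in counter.most_common(k)
        ]
    return result


# Hypothetical toy data, NOT taken from the NRC Emotion Lexicon or FRENK.
toy_lexicon = {
    "fight": {"anger"},
    "hate": {"anger", "disgust"},
    "terrorist": {"disgust", "fear"},
}
toy_tokens = ["fight", "hate", "fight", "terrorist", "people", "hate", "fight"]

shares = top_emotion_words(toy_tokens, toy_lexicon)
for emotion in sorted(shares):
    print(emotion, shares[emotion])
```

In the toy data, fight occurs three times and hate twice, so fight accounts for 60% of the five anger-word occurrences, which is the same kind of figure as the 4.45% reported for fight in the ENG Migrants column.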
5  Conclusions
In this paper, we have presented a quantitative analysis of emotions
in SUD comments in order to gain insight into the emotional footprint
of this type of discourse. Applying the NRC Emotion Lexicon, we
developed a novel metric, the Emotional Charge of a comment, to
analyse SUD. We applied this simple, yet effective methodology to the
FRENK dataset, the most relevant multilingual SUD dataset that also
contains Slovene data; it comprises Facebook comments on posts related
to the LGBT+ and the Migrants topics. We showed that SUD comments are
more emotional than non-SUD comments. We also showed how emotions
differ depending on the topic. For example, according to the emotion
lexicon, the LGBT+ topic evokes more joy, while the Migrants topic
evokes more fear. When comparing the emotional charge of SUD comments
depending on their target, we observed that comments are more
emotional when a user targets the group (LGBT+ or Migrants) than when
they target a fellow commenter they are arguing with. Furthermore, we
performed a qualitative analysis of the emotional words, which showed
some trends in their usage depending on the topic and language.
Slovene commenters on LGBT+ posts are very much concerned with
children's wellbeing, while the English ones tend to manifest their
opposition and disgust. For the Migrants topic, there is a common
tendency in both languages to express the same emotion with similar
words (e.g., disgust – terrorist; fear – fugitive/immigrant).
An original contribution of this study is its demonstration of the
methodological potential of the lexical approach for identifying
emotions in SUD, which has not yet been used in the Slovene context.
The research presented in this paper complements the international
literature in this domain through its use of richly annotated corpora,
emotion lexica and quantitative measures, while also adding a
qualitative analysis.
The metric for measuring emotional intensity that we have proposed in
this paper has proved useful and insightful in our research. Yet its
simplicity risks oversimplifying the highly complex problem of
expressing emotions on social media, which transcends linguistic
expression and is both highly context-dependent and culturally
nuanced; this is a common shortcoming of lexicon-based approaches.
This is why we propose to experiment in future work with context-aware
models and metrics that can better take into account the complexity of
this type of communication. We also stress that quantitative and
automated approaches should always be complemented by in-depth
qualitative sociolinguistic analysis, which will not only critically
evaluate the quantitative treatment of such complex and sensitive
phenomena but also ensure that all relevant aspects of the
communicative reality are considered before interpreting the results,
drawing conclusions and making policy recommendations.
Acknowledgments
The work described in this paper was funded by the Slovenian Re-
search Agency within the national research project »Resources, meth-
ods, and tools for the understanding, identification, and classification 
of various forms of socially unacceptable discourse in the information 
society« (J7-8280, 2017–2019), the Slovenian-Flemish bilateral basic 
research project »Linguistic landscape of hate speech on social me-
dia« (N06-0099, 2019–2023), the national research programme »Slo-
vene Language – Basic, Contrastive, and Applied Studies« (P6-0215) 
and the national research programme »Digital Humanities: Resources, 
Tools and Methods« (P6-0436).
References
Alm, C., Roth, D., & Sproat, R. (2005). Emotions from Text: Machine Learning for 
Text-based Emotion Prediction. Proceedings of the Human Language Tech-
nology Conference and Conference on Empirical Methods in Natural Lan-
guage Processing, October 2005, Vancouver, Canada (pp. 579–586). As-
sociation for Computational Linguistics. doi:10.3115/1220575.1220648
Al-Saqqa, S., Abdel-Nabi, H., & Awajan, A. (2018). A survey of textual emo-
tion detection. 8th International Conference on Computer Science and 
Information Technology (CSIT), July 2018 (pp. 136–142). doi: 10.1109/
CSIT.2018.8486405
Aman, S., & Szpakowicz, S. (2007). Identifying Expressions of Emotion in Text. 
In V. Matoušek & P. Mautner (Eds.), Text, Speech and Dialogue, SD 2007. 
Lecture Notes in Computer Science (Vol. 4629) (pp. 196–205). Berlin, Hei-
delberg: Springer. 
Assimakopoulos, S., Baider, F. H., & Millar, S. (2017). Online Hate Speech in the 
European Union. A Discourse-Analytic Perspective. Cham: Springer Inter-
national Publishing.
Brindle, A. (2016). The Language of Hate. A Corpus Linguistic Analysis of White 
Supremacist Language. London and New York: Routledge.
Canales, L., Daelemans, W., Boldrini, E., & Martinez-Barco, P. (2019). EmoLa-
bel: Semi-Automatic Methodology for Emotion Annotation of Social Media 
20
Slovenščina 2.0, 2022 (1) | Razprave
Text. IEEE Transactions on Affective Computing. Retrieved from https://
ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8758380 
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Rout-
ledge.
Daelemans, W., Fišer, D., Franza, J., Kranjčić, D., Lemmens, J., Ljubešić, N., 
Markov, I., & Popič, D. (2020). The LiLaH Emotion Lexicon of Croatian, 
Dutch and Slovene. Slovenian language resource repository CLARIN.SI. 
https://www.clarin.si/repository/xmlui/handle/11356/1318 
Denecke, K. (2008). Using SentiWordNet for Multilingual Sentiment Analysis. 
Proceedings of the 24th International Conference on Data Engineering, 
7–12 April 2008, Cancun, Mexico (pp. 507–512).
Fišer, D., Ljubešić, N., & Erjavec, T. (2017). Legal framework, dataset and an-
notation schema for socially unacceptable online discourse practices in 
Slovene. Proceedings of the 1st Workshop on Abusive Language Online, 
ACL 2017, Vancouver, Canada (pp. 46–51). Association for Computation-
al Linguistics. doi: 10.18653/v1/W17-3007
Franza, J., & Fišer, D. (2019). The lexical inventory of Slovene socially unac-
ceptable discourse on Facebook. Proceedings of the 7th Conference on 
Computer-Mediated Communication (CMC) and Social Media Corpora, 
CMC-Corpora 2019, Cergy-Pontoise, France. Retrieved from https://hal.
archives-ouvertes.fr/hal-02292616/document#page=50 
Ghazi, D. (2016). Identifying Expressions of Emotions and Their Stimuli in Text. 
PhD dissertation. Canada: University of Ottawa.
Gitari, N. D., Zuping, Z., Hanyurwimfura, D., & Long, J. (2015). A Lexicon-based 
Approach for Hate Speech Detection. International Journal of Multimedia 
and Ubiquitous Engineering (Vol. 10, No.4) (pp. 215–230).
Knoblock, N. (2017). Xenophobic Trumpeters: A corpus-assisted discourse 
study of Donald Trump’s Facebook conversations. In A. Musolff (Ed.), 
Journal of Language Aggression and Conflict (Vol. 5, No.7) (pp. 295–322). 
Amsterdam/Philadelphia: John Benjamins Publishing Company.
Ljubešić, N. (2019). The CLASSLA-StanfordNLP model for morphosyntactic an-
notation of standard Slovenian. Ljubljana: Slovenian language resource 
repository CLARIN.SI. Retrieved from http://hdl.handle.net/11356/1251 
Ljubešić, N. (2020). The CLASSLA-StanfordNLP model for lemmatisation of 
standard Slovenian 1.1, Slovenian language resource repository CLARIN.
SI. http://hdl.handle.net/11356/1286 
Ljubešić, N., Fišer, D., & Erjavec, T. (2019). The FRENK datasets of Socially Un-
acceptable Discourse in Slovene and English. International Conference on 
21
Emotion analysis in socially unacceptable discourse
Text, Speech, and Dialogue. Springer, Cham. doi: 10.1007/978-3-030-
27947-9_9
Ljubešić, N., Fišer, D., Erjavec, T., & Šulc, A. (2021). Offensive language data-
set of Croatian, English and Slovenian comments FRENK 1.1. Ljubljana: 
Slovenian language resource repository CLARIN.SI. Retrieved from http://
hdl.handle.net/11356/1462 
Markov, I., Ljubešić, N., Fišer, D., & Daelemans, W. (2021). Exploring Stylo-
metric and Emotion-Based Features for Multilingual Cross-Domain Hate 
Speech Detection. Proceedings of the Eleventh Workshop on Computa-
tional Approaches to Subjectivity, Sentiment and Social Media Analysis 
(pp. 149–159). Association for Computational Linguistics. Retrieved from 
https://aclanthology.org/2021.wassa-1.16/ 
Martins, R., Gomes, M., Almeida, J. J., Novais, P., & Henriques, P. (2018). Hate 
Speech Classification in Social Media Using Emotional Analysis. 7th Bra-
zilian Conference on Intelligent Systems (BRACIS), 22–25 October 2018, 
Sao Paulo, Brazil (pp. 61–66). doi: 10.1109/BRACIS.2018.00019
Mohammad, S., & Yang T. (2011). Tracking Sentiment in Mail: How Genders 
Differ on Emotional Axes. Proceedings of the 2nd Workshop on Computa-
tional Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011) 
(pp. 70–79). Portland, Oregon: Association for Computational Linguistics.
Mohammad, S., & Turney, P. D. (2010). Emotions Evoked by Common Words 
and Phrases: Using Mechanical Turk to Create an Emotion Lexicon. Pro-
ceedings of the NAACL HLT 2010 Workshop on Computational Approaches 
to Analysis and Generation of Emotion in Text, June 2010, Los Angeles, 
California (pp. 26–34).
Pahor de Maiti, K., Fišer, D., & Ljubešić, N. (2019). How haters write: analy-
sis of nonstandard language in online hate speech. Proceedings of the 
7th Conference on Computer-Mediated Communication (CMC) and Social 
Media Corpora, CMC-Corpora, 9–10 September 2019, Cergy-Pontoise, 
France. Retrieved from https://hal.archives-ouvertes.fr/hal-02292616/
document#page=44
Peng Q., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A 
Python Natural Language Processing Toolkit for Many Human Languages. 
Retrieved from https://arxiv.org/abs/2003.07082 
Plutchik, R. (1980). Emotion: Theory, research and experience, 1. Academic 
Press.
Plutchik, R. (2001). The Nature of Emotions: Human Emotions Have Deep 
Evolutionary Roots, a Fact That May Explain Their Complexity and Provide 
Tools for Clinical Practice. American Scientist 89(4), 344–350.
22
Slovenščina 2.0, 2022 (1) | Razprave
Pratt, J. W., & Gibbons, J. D. (1981). Kolmogorov-Smirnov two-sample tests. 
Concepts of nonparametric theory. Springer, New York, NY. 318–344.
Russell, J. (1980). A circumplex model of affect. Journal of Personality and 
Social Psychology, 39(6), 1161–1178. doi: 10.1037/h0077714
Scherer, K. R. (2005). What are emotions? And how can they be measured? Social 
Science Information, 44(4), 695–729. doi: 10.1177/05390184050582 
Vehovar, V., Povž, B., Fišer, D., Ljubešić, N., Šulc, A., & Jontes, D. (2020). 
Družbeno nesprejemljivi diskurz na Facebookovih straneh novičarskih 
portalov. Teorija in Praksa, 57(2), 622–645. 
Zad, S., Jimenez, J., & Finlayson, M. A. (2021). Hell Hath No Fury? Correcting 
Bias in the NRC Emotion Lexicon. Proceedings of the 5th Workshop on On-
line Abuse and Harms, 6 August 2021, Bangkok, Thailand (pp. 102–111). 
Retrieved from https://aclanthology.org/2021.woah-1.pdf 
Analiza čustev v družbeno nesprejemljivem diskurzu (Emotion analysis in socially unacceptable discourse)
Texts often express the author's emotional state, and it has been shown
that information about emotions has potential for the detection and
analysis of hate speech. In this paper, we present a quantitative
methodology for analysing emotions in text. Based on the NRC Emotion
Lexicon and Plutchik's model of eight basic emotions, we defined a
simple yet effective method for detecting the emotional charge of a
text. Using this methodology, we investigated the emotional charge of
texts labelled as socially unacceptable discourse (SUD), which
represents a distinct and potentially harmful type of text that is
nowadays spreading rapidly on social media. We applied the emotional
charge method to a corpus of Facebook comments. The comparison and
analysis were carried out on four datasets in two languages, namely
English and Slovene, and on two topics, the rights of the LGBT+
community and the European migrant crisis. We found that the content
of SUD comments is substantially more emotional than that of comments
without SUD. In addition, we showed differences in the expression of
emotions depending on the language, topic and target of the comments.
We supported the findings of the quantitative emotion analysis with a
qualitative analysis of the corpus, in which we examined the most
frequent emotionally charged words associated with each emotion in all
four datasets. We found that emotionally charged words in SUD differ
substantially by topic, while there is considerable overlap between
languages.
Keywords: emotions, socially unacceptable discourse (SUD), hate speech,
social media, corpora