https://doi.org/10.31449/inf.v47i3.4758 Informatica 47 (2023) 335–348

Complaints with Target Scope Identification on Social Media

Kazuhiro Ito 1, Taichi Murayama 2, Shuntaro Yada 1, Shoko Wakamiya 1 and Eiji Aramaki 1
1 Nara Institute of Science and Technology, Nara, Japan
2 SANKEN, Osaka University, Osaka, Japan
E-mail: ito.kazuhiro.ih4@is.naist.jp, s-yada@is.naist.jp, wakamiya@is.naist.jp, aramaki@is.naist.jp, taichi@sanken.osaka-u.ac.jp

Keywords: complaint, dataset, Twitter, social media, annotation

Received: March 22, 2023

A complaint is uttered when reality fails to meet one's expectations. Research on complaints, which contributes to our understanding of basic human behavior, has been conducted in the fields of psychology, linguistics, and marketing. Although several approaches have been applied to the study of complaints, no study has yet focused on the target scope of complaints. Examining the target scope of complaints is crucial because the functions of complaints, such as the evocation of emotion, the use of grammar, and the intention, differ depending on the target scope. We first construct and release a complaint dataset of 6,418 tweets by annotating Japanese texts collected from Twitter with labels of the target scope. Our dataset is available at https://github.com/sociocom/JaGUCHI. We then benchmark the annotated dataset with several machine learning baselines and obtain a best performance of a 90.4 F1 score in detecting whether a text is a complaint or not, and a micro-F1 score of 72.2 in identifying the target scope label. Finally, we conduct case studies using our model to demonstrate that identifying the target scope of complaints is useful for sociological analysis.

Povzetek (in English): The research focuses on the analysis of complaints in 6,418 tweets with several machine learning methods.

1 Introduction

A complaint is "a basic speech act used to express a negative disagreement between reality and expectations for a state, product, organization, or event" [23, pp. 195–208]. The analysis of complaints is not only linguistically [30] and psychologically [1, 18] interesting but also beneficial for marketing [17].

Understanding why people are dissatisfied can help improve their well-being through analysis of the situations they complain about. The methods required to deal with complaints vary greatly depending on whether the target scope of a complaint is the complainer him/herself, other people, or the environment; in the workplace, for example, the appropriate improvement differs depending on whether employees complain about their own skills or about their work environment. This categorization of the target scope aligns with James' three psychological categories for the Self as the object of reference [13]: the spiritual Self, the social Self, and the material Self, respectively.

In the field of natural language processing (NLP), there are studies on determining whether a text is a complaint or not [26, 9, 14] and on identifying its severity [15], but no study has yet identified the target scope of complaints, that is, the object toward which or whom a complaint is directed. Our study is an attempt to apply a computational approach focusing on the target scope of complaints on social media. (This paper is an extended version of our study [12] presented at the 11th International Symposium on Information and Communication Technology, SoICT 2022.) More specifically, we emphasize the importance of identifying whether a complaint is aimed at the complainer him/herself, at an individual, at a group, or at the surrounding environment.
This paper introduces a novel Japanese complaint dataset collected from Twitter that includes labels indicating the target scope of complaints (the dataset is available at https://github.com/sociocom/JaGUCHI). We then investigate the validity of our dataset using two classification tasks: a binary classification task (binary task) that identifies whether a text is a complaint or not, and a multiclass classification task (multiclass task) that identifies the target scope of complaints. Furthermore, we apply our target scope classification model to three case studies, COVID-19, office work, and the 2011 off the Pacific coast of Tohoku earthquake (hereafter, the Tohoku earthquake), aiming to analyze social phenomena. Our contributions are as follows:

– We constructed a dataset of complaints extracted from Twitter labeled with the target scope of complaints.
– We conducted experiments on identifying the target scope of complaints and achieved an F1 score of 90.4 in detecting whether a text is a complaint or not, and a micro-F1 score of 72.2 in identifying the target scope label.
– We conducted three case studies to demonstrate the usefulness of identifying the target scope of complaints for sociological analysis.

Table 1: Counts and examples of complaint tweets per target scope label in our dataset.

SELF (468 tweets): しかしたぶん全部顔とか行動に出ちゃってるから最低なのは自分なんだよね向こうには落ち度はないし勝手に苛ついてるだけだしね (Maybe I'm the one who's the worst because it's all showing on my face and in my actions. It's not the other person's fault, I'm just irritated by myself.)
IND (3,866 tweets): わたしが居ないとミルクしまってある場所すらわかんないのかよ (You do not even know where the milk is stored without me?)
GRP (648 tweets): 価値観の違いかもしれないけど物買うのは3千円でもしぶるのにギャンブルに平気で金突っ込むひとの気持ちがわからない (Maybe it's a difference in values, but I do not understand people who are reluctant to spend even 3,000 yen to buy something, but do not mind excessively spending money on gambling.)
ENV (1,436 tweets): 保育士の給料上がらないかな~手取り15~18じゃやってけないよな (...) 政治家の給料とかより保育士に回してほしいわ、切実に (I wonder if childcare workers' salaries will go up. I cannot make it on 15 to 18 take-home pay. (...) I'd really like to see more money spent on childcare workers than on politicians' salaries.)
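As a concrete starting point for readers who want to work with the released data, the following is a minimal sketch of loading it; the file name and the column names ("text", "label") are assumptions rather than a documented layout, so the repository's README should be checked first.

```python
# A minimal sketch of loading the released JaGUCHI data, assuming the
# repository provides a CSV file; path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("JaGUCHI/dataset.csv")  # hypothetical path
print(df["label"].value_counts())        # expected labels: SELF, IND, GRP, ENV
```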
2 Related work

In pragmatics, a complaint is defined as "a basic speech act used to express a negative disagreement between reality and expectations for a state, product, organization, or event" [23, pp. 195–208]. What distinguishes complaints from mere negative sentiment polarity is that complaints tend to include expressions of breaches of the speaker's expectations [26] and to include reasons or explanations [31].

Dataset construction has been actively pursued to analyze the substance of complaints. A previous study collected complaints about food products sent to governmental institutions and built an automatic classification model according to the nature of the complaint [9]. The classification classes were set up taking into account the use of customer support, the type of economic activity involved, the priority of treatment, and whether the complaint falls under the responsibility of the authority. Another study created a complaint dataset with labels for service categories (e.g., food, cars, electronics) collected from reply posts to company accounts on Twitter [26]. Yet another study constructed a complaint dataset with four labels [15]: (1) No explicit reproach: there is no explicit mention of the cause and the complaint is not offensive; (2) Disapproval: expresses explicit negative emotions such as dissatisfaction, annoyance, dislike, and disapproval; (3) Accusation: asserts that someone did something reprehensible; and (4) Blame: assumes the complainee is responsible for the undesirable result. These four categories follow standard definitions in pragmatics [29]. Fang et al. [7] assigned the intensity of complaints as a continuous value using the best-worst scaling method [20] via crowdsourcing. Another corpus, based on data accumulated by the Fuman Kaitori Center, collects Japanese complaints about products and services [22]; it includes labels for the target of a complaint, such as product or service names, which differs in granularity from our study.

As mentioned above, although several studies have constructed complaint datasets, none of them is labeled with the target scope toward which complaints are directed.

3 Dataset

3.1 Collection

We constructed a Japanese complaint dataset using Twitter. We collected 64,313 tweets including "#愚痴 (/gu-chi/)" (a hashtag of the Japanese term for complaints) posted from March 26, 2006 to September 30, 2021 using the Twitter API (https://developer.twitter.com/). We excluded URLs, duplicates, and retweets, and kept only tweets with a relatively low possibility of being posted by a bot. Specifically, we kept only tweets whose posting application was Twitter for iPad, Twitter for iPhone, Twitter Web App, Twitter Web Client, or Keitai Web. All hashtags were removed from the text, and tweets with fewer than 30 characters were excluded. We then extracted tweets for each month through stratified sampling and finally obtained 7,573 tweets, which is of similar size to datasets recently released for NLP on social media [16, 24, 5, 3, 21].
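The following is a minimal sketch of this filtering pipeline, assuming tweets have already been fetched via the Twitter API as dicts with "text", "source", "retweeted", and ISO-formatted "created_at" fields; the field names and the per-month sample size are illustrative, not the authors' actual code.

```python
# A minimal sketch of the filtering in Section 3.1; all names are assumptions.
import random
import re

ALLOWED_SOURCES = {  # posting applications kept to reduce likely bot tweets
    "Twitter for iPad", "Twitter for iPhone", "Twitter Web App",
    "Twitter Web Client", "Keitai Web",
}

def clean(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # remove URLs
    text = re.sub(r"#\S+", "", text)          # remove all hashtags
    return text.strip()

def filter_tweets(tweets):
    seen, kept = set(), []
    for tw in tweets:
        if tw.get("retweeted") or tw["source"] not in ALLOWED_SOURCES:
            continue
        text = clean(tw["text"])
        if len(text) < 30 or text in seen:    # length filter and deduplication
            continue
        seen.add(text)
        kept.append({"text": text, "month": tw["created_at"][:7]})  # "YYYY-MM"
    return kept

def stratified_monthly_sample(tweets, per_month=40, seed=0):
    # stratified sampling: draw up to `per_month` tweets from each month
    random.seed(seed)
    by_month = {}
    for tw in tweets:
        by_month.setdefault(tw["month"], []).append(tw)
    return [tw for _, tws in sorted(by_month.items())
            for tw in random.sample(tws, min(per_month, len(tws)))]
```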
3.2 Annotation

We annotated the 7,573 tweets with target scope labels. The tweets were divided into three sets (2,524, 2,524, and 2,525 tweets), and three trained external annotators each annotated one set.

First stage: whether the tweet is a complaint or not is identified. Because most of the tweets are complaints owing to the inclusion of "#愚痴", we remove tweets identified as non-complaints. Following Olshtain's definition [23, pp. 195–208], we identified tweets expressing a negative disagreement between the tweeter's expectations and reality as complaints. Examples of non-complaint tweets removed by this process are shown below.

"If a company is violating the Labor Standards Act, gathering evidence is critical to remedy the situation."

"It's easy to complain, so I'm going to shift my thinking to the positive and creative."

"I came home exhausted again today. But I saw Mt. Fuji for a bit on the train on the way home, and it kind of loosened me up. I thought I was going to cry."

Second stage: we identify the target scope of complaints. We assigned one of four labels: SELF, IND, GRP, and ENV. Although our labels broadly follow James' theory of the Self [13], we separate IND (individual) and GRP (group) because we believe that the nature of a complaint differs depending on whether the target is an individual or a group: complaints against individuals are associated with abuse, whereas complaints against groups are associated with hate speech. When the target scope could not be determined uniquely or was unclear, the tweet was removed from the dataset. The definitions and examples of the labels are given below.

SELF: the target scope includes the complainer. E.g., "I have said too much again."
IND: the target scope does not include the complainer and is one or several other persons. E.g., "I hate that my boss puts me in charge of his work!"
GRP: the target scope does not include the complainer and is a group. E.g., "I cannot be interested in people who only think about money."
ENV: the target scope is not human. E.g., "It's raining today, so I do not feel like doing anything."

As a result of the annotation, 6,418 of the 7,573 texts were considered complaints. Among the complaint tweets, the number of tweets per target scope label is 468 for SELF, 3,866 for IND, 648 for GRP, and 1,436 for ENV. The agreement ratio (kappa coefficient) between the annotators and an evaluator was 0.798 for the binary identification and 0.728 for the four-label classification; both values lie in the upper part of the substantial agreement range [2]. Figure 1 presents the confusion matrix of human agreement on the four classes, normalized over the actual values (rows). Examples of text and the number of tweets for each target scope label are shown in Table 1.

Figure 1: Confusion matrix of annotator agreement on the four target scopes of complaints.

3.3 Data analysis

We analyzed the contents of the dataset to gain linguistic insight into this task and the data, examining the number of characters, the emotions, and the topics. The results of each analysis are shown below.

3.3.1 Number of characters

The average number of characters in the entire dataset is 82.0, and the median is 81.0. The label with the most characters is GRP (mean 87.8, median 89.0), and the label with the fewest characters is SELF (mean 76.8, median 74.0). This suggests that while descriptions of other groups tend to be detailed, descriptions of the complainer him/herself tend to be less detailed. The statistics of the number of characters per label are shown in Table 2. Note that tweets with fewer than 30 characters were removed in Section 3.1.

Table 2: Statistics on the number of characters per label. GRP has the highest mean number of characters, and SELF the lowest.

Label | Mean | Median | Std
SELF | 76.8 | 74.0 | 32.2
IND  | 83.2 | 83.0 | 32.4
GRP  | 87.8 | 89.0 | 32.5
ENV  | 77.8 | 74.0 | 33.8
ALL  | 82.0 | 81.0 | 32.8
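The per-label statistics in Table 2 can be reproduced with a simple aggregation; the following is a minimal sketch, assuming the annotated data in a pandas DataFrame with "text" and "label" columns (the toy rows are for illustration only).

```python
# A minimal sketch of the character statistics per label (Table 2).
import pandas as pd

# toy rows; in practice one row per annotated tweet
df = pd.DataFrame({
    "label": ["SELF", "IND", "IND", "ENV"],
    "text":  ["自分が悪い", "ミルクの場所もわからないのか", "また残業", "雨で何もやる気が出ない"],
})
stats = (df.assign(n_chars=df["text"].str.len())  # character count per tweet
           .groupby("label")["n_chars"]
           .agg(["mean", "median", "std"]))
print(stats)
```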
3.3.2 Emotion

We examine the relationship between our dataset and emotions, and the differences in emotions between target scopes. To do so, we used the Japanese Linguistic Inquiry and Word Count (JIWC) emotion dictionary (https://github.com/sociocom/JIWC-Dictionary). This dictionary matches words to seven emotion categories (Joy, Sadness, Anger, Surprise, Trust, Anxiety, and Disgust) based on a translation of Plutchik's emotion wheel [25], obtained from a naturalistic dataset of emotional memories. The score $S_{ij}$ of tweet $i$ for emotion category $j$ is based on the ratio of the number of emotion terms of category $j$ in the tweet, $W_{ij}$, to the total number of terms (tokens) in the tweet, $W^{*}_{i}$:

$S_{ij} = \frac{W_{ij}}{W^{*}_{i}} \log_{2}(W_{ij} + 1)$   (1)

We used this emotion dictionary to calculate the emotion score for each tweet in our dataset and investigated the average score for each emotion per label. The results are shown in Table 3.

Table 3: Results of the emotion analysis using JIWC, showing the average score for each emotion per label. The highest value per emotion is marked with *.

Label | Sadness | Anxiety | Anger | Disgust | Trust | Surprise | Joy
SELF | 0.448* | 0.502* | 0.774 | 0.858 | 0.591* | 0.467 | 0.459
IND  | 0.424 | 0.425 | 0.846 | 0.904 | 0.568 | 0.457 | 0.451
GRP  | 0.407 | 0.431 | 0.861* | 0.954* | 0.564 | 0.477* | 0.444
ENV  | 0.434 | 0.490 | 0.773 | 0.824 | 0.545 | 0.464 | 0.482*
ALL  | 0.426 | 0.445 | 0.826 | 0.888 | 0.564 | 0.461 | 0.458

For SELF, the low value for Anger and the high value for Anxiety are consistent with our intuition: when the complaint targets the complainer him/herself, Anxiety appears to be stronger than Anger. Disgust is higher for GRP than for IND, indicating that feelings of Disgust are stronger toward groups than toward individuals. For Anger, both IND and GRP are high.
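The scoring in Eq. (1) is straightforward to implement; the following is a minimal sketch, assuming the dictionary is available as a mapping from emotion category to a set of words and that tweets are already tokenized (the tokenizer, the dictionary loading, and the toy two-category dictionary are assumptions).

```python
# A minimal sketch of the JIWC-based emotion scoring in Eq. (1).
import math

def emotion_scores(tokens, jiwc):
    """S_ij = (W_ij / W*_i) * log2(W_ij + 1) for each emotion category j."""
    total = len(tokens)  # W*_i: total number of tokens in the tweet
    scores = {}
    for emotion, words in jiwc.items():
        w_ij = sum(1 for t in tokens if t in words)  # emotion-term count W_ij
        scores[emotion] = (w_ij / total) * math.log2(w_ij + 1) if total else 0.0
    return scores

# toy usage with a hypothetical two-category dictionary
jiwc = {"Anger": {"嫌い", "苛つく"}, "Joy": {"嬉しい"}}
print(emotion_scores(["上司", "が", "嫌い", "で", "苛つく"], jiwc))
```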
3.3.3 Topic

To investigate whether the detailed contents of complaints can be extracted from our dataset, we analyzed the tweets' topics using Latent Dirichlet Allocation (LDA), a kind of topic model [4]. The number of topics was set to 8, and LDA was applied only to nouns of two or more Japanese characters. Table 4 shows the top words assigned to each topic. Some of the topics are work-related (Topics 1, 3, 4, and 5), suggesting that work accounts for the majority of complaints posted on Twitter. Among the work-related topics, there is a topic related to mental health (Topic 3), including "mood," "stress," and "hospital," and a topic related to family (Topic 1), including "husband" and "child," indicating several distinct tendencies. Another topic focuses on COVID-19 (Topic 8), which includes "COVID-19" and "mask." Although only recent tweets are relevant to this topic, this suggests that many such complaints were posted intensively.

Table 4: The top 5 words per topic (translated from Japanese). Topics 1, 3, 4, and 5 are work-related; Topic 8 focuses on COVID-19.

Topic 1: husband, child, boss, mood, senior member
Topic 2: child, movie, parents' house, block, article
Topic 3: human, workplace, mood, stress, hospital
Topic 4: company, husband, world, place, mother
Topic 5: really, stupid, vacation, word, company
Topic 6: why, friend, everyday, child, meal
Topic 7: without saying, adult, money, senior member, staff
Topic 8: angry, COVID-19, cry, forbidden word, mask

4 Experiment

4.1 Settings

In this section, we demonstrate the validity of the dataset using two classification tasks: a binary task (2-way) that identifies whether a text is a complaint and a multiclass task (4-way) that classifies the target scope of complaints. These tasks correspond to the first and second stages of annotation, respectively.

We employ two types of machine learning models: Long Short-Term Memory (LSTM) [11] and Bidirectional Encoder Representations from Transformers (BERT) [6]. The BERT model is a fine-tuned version of a model pretrained on the Japanese version of Wikipedia released by Tohoku University (https://github.com/cl-tohoku/bert-japanese).

Before training, the dataset was lowercased and all numbers were replaced with zeros. We split the dataset into training, validation, and test sets (7:1.5:1.5), maintaining the label distribution across the splits.

We set the parameters of the LSTM model as follows: word embedding dimension 10, hidden layer dimension 128, cross-entropy loss, Stochastic Gradient Descent (SGD) as the optimizer, learning rate 0.01, and 100 epochs. We set the parameters of the BERT model as follows: a maximum of 128 tokens per tweet, batch size 32, Adam as the optimizer, learning rate 1.0 × 10^-5, and 10 epochs. These parameters were chosen based on the validation data. For the binary task, we additionally added 6,000 randomly sampled tweets, identified as non-complaints according to our annotation method, to the dataset as negative examples.

4.2 Metrics

For the binary task, we report mean accuracy, macro-F1 score, and ROC AUC, following an existing complaint study [26]. For the multiclass task, we report the micro-F1 score and the macro-F1 score.
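Before turning to the results, the following is a minimal sketch of the BERT fine-tuning setup described in Section 4.1, using the Hugging Face transformers Trainer; the exact checkpoint variant, the toy data, and the simplified dataset class are assumptions rather than the authors' actual training code (Trainer uses AdamW by default, a close stand-in for the Adam setting above).

```python
# A minimal sketch of fine-tuning a Japanese BERT for 4-way classification.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["SELF", "IND", "GRP", "ENV"]
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
model = AutoModelForSequenceClassification.from_pretrained(
    "cl-tohoku/bert-base-japanese", num_labels=len(LABELS))

class TweetDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        # a maximum of 128 tokens per tweet, as in Section 4.1
        self.enc = tokenizer(texts, truncation=True, max_length=128,
                             padding="max_length")
        self.labels = [LABELS.index(l) for l in labels]
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# toy data; the paper uses the 7:1.5:1.5 train/validation/test split
train_ds = TweetDataset(["また残業だ", "雨で何もやる気が出ない"], ["IND", "ENV"])

args = TrainingArguments(output_dir="out", per_device_train_batch_size=32,
                         learning_rate=1e-5, num_train_epochs=10)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```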
4.3 Results

4.3.1 Binary task (2-way)

The binary task reaches an accuracy of 83.5, an F1 score of 83.7, and an AUC of 83.5 for the LSTM model, and an accuracy of 89.6, an F1 score of 90.4, and an AUC of 89.4 for the BERT model (Table 5). The confusion matrix of the BERT model has a true positive rate of 0.92, a false positive rate of 0.14, a false negative rate of 0.08, and a true negative rate of 0.86. The BERT model produced fewer false negatives than the LSTM model. Figure 2 (a) and (b) show the confusion matrices for the LSTM and BERT models, respectively.

Table 5: Results of the binary and multiclass tasks. The BERT model outperformed the majority-class baseline and the LSTM model on every metric (best score per row in the BERT column).

Task | Metric | Major Class | LSTM | BERT
Binary | Accuracy | 51.7 | 83.5 | 89.6
Binary | F1 score | 69.3 | 83.7 | 90.4
Binary | AUC | 50.0 | 83.5 | 89.4
Multiclass | micro-F1 score | 62.1 | 51.7 | 72.2
Multiclass | macro-F1 score | 19.2 | 30.1 | 54.5

Figure 2: Confusion matrices of the binary task (2-way) for (a) the LSTM model and (b) the BERT model.

We are interested in what types of tokens our complaint model captures. To interpret the behavior of the model, we used LIME [28], a method for explaining machine learning models, to visualize which tokens the BERT model relies on for the following example (translated from Japanese): "Recently, I had an encounter where all the free time I worked hard to make for a paid vacation was wasted because of the absence of a part-time worker who comes to work only once a week." We observed that the model paid attention to the expression "wasted because of the absence of a part-time worker who comes to work only once a week" for classification (Figure 3). Here, the highlighted reason is the cause of the complaint, suggesting that our model attends to the same part as human intuition.

Figure 3: Visualization of the cue words for the sample sentences in our (a) binary and (b) multiclass classification models; the highlights mark the cues for classification. For (a), the highlighted words are "wasted because of the absence of a part-time worker who comes to work only once a week." For (b), the highlighted words are "The husband who plays the role of ... too disgusting."

4.3.2 Multiclass task (4-way)

The multiclass task yields a micro-F1 score of 51.7 for the LSTM model and 72.2 for the BERT model. Figure 4 (a) and (b) show the confusion matrices for the LSTM and BERT models, respectively. The LSTM model classifies a relatively large number of tweets as either IND or ENV, reflecting the label bias in the dataset. Although the BERT model mitigates the effect of this label bias compared with the LSTM model, the per-label accuracy shows that SELF tends to be misclassified as ENV. This reflects the difficulty of separating SELF and ENV, which share the tendency to omit the target scope in statements about oneself. The accuracy for GRP is relatively low because, when a complainer refers to a group that does not include him/herself, the complainer does not always use words that explicitly indicate multiple targets. In short, the LSTM model greatly outperformed the majority-class baseline in macro-F1, and the BERT model further mitigated the label bias that affected the LSTM results, improving the macro-F1 again.

Figure 4: Confusion matrices of the multiclass task (4-way) for (a) the LSTM model, (b) the BERT model, and (c) the BERT model with downsampling. The LSTM model classified a relatively large number of tweets as IND or ENV, likely reflecting the label bias in the dataset; the BERT model mitigates this effect, and the BERT model with downsampling shows little bias among the labels.

As with the binary task, we visualize which tokens the model relies on for the following example (translated from Japanese): "The husband who plays the role of 'a man sneezing boldly' even though he knows his family doesn't like it is too disgusting. He does it occasionally, and it's so dull because it's so artificial and it shows on his face." This tweet was identified as IND by our model. The model paid the most attention to the words "The husband who plays the role of ... too disgusting" for classification (Figure 3). These words clearly indicate the target of the complaint, "husband," and the feeling of being "too disgusting" toward that person; thus, the cues on which the model based its label are clearly interpretable.
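The LIME inspections behind Figure 3 can be sketched as follows, assuming a wrapper around the fine-tuned classifier that returns class probabilities; the constant-output stub below merely stands in for that wrapper, and character-level perturbation is one simple choice for unsegmented Japanese text rather than the authors' documented setting.

```python
# A minimal sketch of inspecting classification cues with LIME.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    # stand-in for the fine-tuned BERT model's softmax output
    return np.tile([0.1, 0.6, 0.1, 0.2], (len(texts), 1))

explainer = LimeTextExplainer(class_names=["SELF", "IND", "GRP", "ENV"],
                              char_level=True)  # simple choice for Japanese
exp = explainer.explain_instance("ミルクしまってある場所すらわかんないのかよ",
                                 predict_proba, num_features=10)
print(exp.as_list())  # (token, weight) pairs cueing the prediction
```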
4.4 Downsampling

Because the errors in the multiclass task might be strongly influenced by the unbalanced labels of the dataset, we also experimented with a downsampled dataset. We downsampled the labels other than SELF to approximately the number of SELF tweets, SELF being the rarest label. For this experiment, we employed the BERT model with the same settings as in Section 4.1. The result is a micro-F1 score of 55.3 and a macro-F1 score of 55.5. As illustrated in Figure 4 (c), the results show little bias among the labels. There is still a relatively high level of confusion between IND and GRP, suggesting that these labels tend to use similar language. In addition, there were relatively many cases in which ENV tweets were classified as SELF, suggesting that this error may be due to omission of the target toward which the complaint is directed (see Section 4.5).
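The downsampling itself reduces to sampling every class down to the size of the rarest one; the following is a minimal sketch, assuming the annotated data in a pandas DataFrame with a "label" column (the random seed is illustrative).

```python
# A minimal sketch of the downsampling in Section 4.4.
import pandas as pd

def downsample(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    n_min = df["label"].value_counts().min()  # size of the rarest class (SELF)
    return (df.groupby("label", group_keys=False)
              .apply(lambda g: g.sample(n=n_min, random_state=seed)))

# after downsampling, every label has exactly n_min tweets
```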
4.5 Error analysis

4.5.1 Binary task (2-way)

Although the BERT model achieved a high F1 score of 90.4, it could not classify some tweets correctly. Examples of error cases are shown in Table 6.

Table 6: Examples of error cases in the binary task (true label / predicted label).

(1) non-complaint / complaint: お仕事終わり!定時で上がれたけど、フィットネスに行くかヤフオクの発送か...。明日は遅番だからジム行くのが得策。来週まで行けないし。 (I finished work! I was able to leave on time, but I don't know if I should go to the fitness center or ship the Yahoo Auction items... I have a late shift tomorrow, so going to the gym is in my best interest. I can't go there until next week.)
(2) non-complaint / complaint: 何か作りたいなーという気分が出て来ただけマシかなーと思う昨今。風邪の熱に浮かされてるだけかもしれないが。フォトショ起動するのもめんどくさいモードだけど。うん。 (I think it's better that I feel like making something these days. I may just be suffering from a fever from a cold. Although I'm too lazy to start up Photoshop right now.)
(3) non-complaint / complaint: 今日は寝坊して大変だったから早め(でももう0時;)に寝よう。お休みなさい! (I overslept and had a hard time today, so I'll go to bed early (but it's already midnight;). Good night!)
(4) complaint / non-complaint: 今、カラオケに行ってるらしい。職場にコロナ持ち込まないでねー!!感染者出たら、あなたの責任ですから! (Now they are going to karaoke, I heard. Don't bring coronavirus into the workplace! If anyone gets infected, it's your fault!)
(5) complaint / non-complaint: 感情豊かですねって、その状況、人に合わせて自分を作ってんだよ (People tell me I'm very emotional, but I make myself fit the situation and the people around me.)

Examples (1), (2), and (3) in Table 6 are false positives. In (1), although the tweeter expresses uncertainty about a choice, the tweet is labeled as a non-complaint in the true data because it does not contain any negative emotion related to a complaint. In (2), although the word "lazy," which is closely related to complaints, appears in the sentence, the expression "I think it's better" carries the intent of the whole sentence. In (3), the word "overslept" indicates an unfavorable situation, but the sentence as a whole is not a complaint; it simply states the intention to go to bed early. In all of these cases, negative elements appear in parts of the tweet, but the purpose of the tweet is something other than complaining; such tweets tend to become false positives.

Examples (4) and (5), on the other hand, are false negatives. Syntactically, (4) is a tweet expressing a kind of request to the target, but semantically it accuses the target of going out to have fun. In (5), the tweeter corrects an error in the target's perception and intends to express discomfort. As these examples show, there are often cases with no explicit complaint vocabulary or syntax in which words nevertheless semantically imply a complaint.

4.5.2 Multiclass task (4-way)

We analyzed error cases using the results of the more accurate BERT model. Examples of error cases are shown in Table 7.

Table 7: Examples of error cases in the multiclass task (true label / predicted label).

(6) SELF / IND: あー、でも休みの日とか、歩いてる時とか、ショッピングの時にアイディア浮かぶかも。もう、おっちゃんアイディア出ないから、もっと若い人に頑張って欲しいなぁ。 (Maybe ideas happen when I'm on vacation, or walking, or shopping. As an old man, I can't come up with any more ideas, so I wish more young people would try their best.)
(7) SELF / ENV: 頑張っても報われないし人間関係でいつもとん挫するしどうすりゃいいのかわかんないな、もう (I don't know what to do because my hard work is not rewarded and I always fail in personal relationships.)
(8) GRP / IND: とあるIT企業のデバッガーとして勤めてますが、今日だけは言わせてください。デバッガーを馬鹿にするな。 (I work as a debugger for an IT company, and let me say this today. Don't mock debuggers.)
(9) ENV / GRP: ニキビ死ねーーーーーーーっっっ!!!!!!!!お前のせいでブスさ倍増すんだよクソ野郎!!!!!!!! (Pimples go away!!!!!!!! You make me look twice as ugly, damn you!!!!!!!!)

In many cases, the model predicts IND or ENV for tweets whose true label is SELF. In (6), there are two possible error factors: first, if the model focused on the sentence "I wish more young people would try their best" and recognized "young people" as the target, it would be a misidentification, because the tweeter him/herself is the target scope of the tweet; second, the tweeter, the true target, is paraphrased as "old man," so this word may be perceived as referring to a third party. Example (7) targets the tweeter him/herself but is predicted as ENV, since the scope of the tweet is not explicitly stated. The model also predicts IND or ENV for tweets whose true label is GRP. In (8), although it can be inferred from context that the target scope comprises more than one person, it is difficult to determine from the text whether the target is singular or plural, because no noun specifies the target scope of the complaint. In (9), the expression "go away" (literally "die"), commonly used to address a human, is applied to a non-living target, causing the target to be incorrectly identified as human. Overall, the model tends to misclassify tweets whose target scope is only implied and can be inferred only from extra-textual knowledge or the tone of the comments.
5 Case studies

We apply the constructed target scope classification model to tweets related to COVID-19, office work, and the Tohoku earthquake to show that it is useful for sociological analysis.

5.1 Case 1: COVID-19

We obtained 698,950 Japanese tweets including "コロナ (/ko-ro-na/)," a Japanese word for COVID-19, posted from January 1, 2020 to December 31, 2021 using the Twitter API. The time series in Figure 5 show that ENV accounted for a large ratio of cases during the early stages of the pandemic and that this ratio decreased over time. Among the tweets classified as IND or GRP, many complaints were directed at others whose views on COVID-19 differed from the complainer's, whereas among the tweets classified as ENV, many complaints were directed at SARS-CoV-2 itself and at life during the pandemic. Examples of tweets with each label are shown in Table 8.

Figure 5: Time series of (a) tweet counts and (b) the ratio of tweets per label for complaints related to COVID-19. ENV accounted for a large proportion of cases during the early stages of the pandemic, and this proportion decreased over time.

In addition, to confirm our hypothesis that the content of complaints varies depending on the target scope, we analyzed the topics of the tweets using LDA [4]. The number of topics was set to 16, and LDA was applied only to nouns and adjectives. Table 9 shows five characteristic topics and five words extracted from the top 10 words per topic. The words in topics of tweets labeled SELF include many adjectives, such as "afraid," "happy," and "sad," expressing the tweeter's state of mind. IND is closely related to the tweeter's personal relations, such as "girlfriend," "family," and "parents' house." Complaints labeled GRP tend to target public entities, such as "government," "politics," "Olympics," and "celebrity." ENV frequently contains words related to services the tweeter uses as a customer, such as "lesson," "movie," "vaccine," and "news." The differences in topics per label show a certain interpretability, suggesting that automatic classification of the target scope of complaints at the granularity of our dataset also contributes to a categorization of the content of complaints.
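The paper does not name the LDA implementation; the following sketch uses gensim as one common choice, with toy tokenized documents and the 16-topic setting of Case 1 (in practice, the tokens would be the nouns and adjectives extracted from the classified tweets).

```python
# A minimal sketch of the topic analysis with gensim's LDA.
from gensim import corpora
from gensim.models import LdaModel

docs = [["コロナ", "ワクチン", "怖い"], ["上司", "残業", "嫌い"]]  # toy token lists
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=16,
               random_state=0)
for topic_id, words in lda.show_topics(num_topics=16, num_words=5,
                                       formatted=False):
    print(topic_id, [w for w, _ in words])  # top 5 words per topic
```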
Table 8: Examples of tweets related to COVID-19 with each label.

IND: 旦那ね、色んなところで営業回ってる人だからよく風邪ひいたり熱出たりすんの。手洗いうがいしてねって言ってもしねぇの。こいつのことこれからコロナさんって呼ぶことにした。 (My husband is a salesman who goes around to various places, so he often catches a cold or gets a fever. I tell him to wash his hands and gargle, but he doesn't. I've decided to call him Mr. COVID from now on.)
IND: 一生、平行線なんでもういいんじゃないですか。あなたは、コロナは大したことないと思ってる、私は違う。これでいいですよ。 (All along, we've failed to reach an agreement, so I think we're done. You think COVID-19 is no big deal; I don't. I'm fine with this.)
ENV: コロナが長引くと永遠に子供に会えなくなります 子供はその環境に馴染んでしまうから うちは何とかlineで繋げようとしてるけど、もう手遅れなんでそれは悲しいこと (If the situation with COVID-19 is prolonged, we won't be able to see our child forever... We are trying to connect with them via LINE so that they don't get used to that environment, but it's too late now, and that's sad...)
ENV: ホント疲れちゃったし、我慢してることも多いから辛いよ コロナ禍じゃなきゃとっくに東京とかも行ってるし、何よりライブ出来てただろうしね (It's hard because I'm really tired and I have to endure so much... If it weren't for the COVID-19 situation, I would have been to Tokyo by now, and more importantly, I would have been able to go to live shows.)

Table 9: Five characteristic topics and five words extracted from the top 10 words per topic (translated from Japanese). SELF contains many adjectives expressing the tweeter's state of mind; IND relates to the tweeter's personal relations; GRP tends to target public entities; ENV frequently includes words related to services for which the tweeter is a customer.

SELF: (1) afraid, happy, painful, timing, sane; (2) hobby, ruin, symptoms, vaccine, wedding; (3) natural, stress, dislike, tough, cheerful; (4) meal, really, word, patience, sad; (5) a lot, complex, surprised, result, life
IND: (1) part-time job, stress, travel, disturbed, promise; (2) concert, child, hospital, aftereffect, girlfriend; (3) mask, parents' house, test, fool, afraid; (4) stupid, money, family, mother, please; (5) afraid, really, you, friend, bad
GRP: (1) treatment, new type, government, success, demonstration; (2) covering up, doctor, politics, opinion, civil servants; (3) Olympics, report, afford, slander, media; (4) vaccine, young man, governor, criticism, celebrity; (5) player, prejudice, train, citizen, trash
ENV: (1) lesson, cancellation, postponement, hospitalization, return to country; (2) movie, ticket, gym, patience, really; (3) vaccine, afraid, time, positivity, insurance; (4) news, money, metropolis, summer vacation, dead; (5) infection, pain, universal, like, closing down

5.2 Case 2: Office work

We obtained 731,000 Japanese tweets including the word "仕事 (/shi-go-to/)," which relates to office work, posted from January 1, 2020 to December 31, 2021 using the Twitter API. Note that 12,626 of the tweets collected in Case 2 overlap with those collected in Case 1. The time series in Figure 6 show few changes in the ratio of complaints per target scope over time, suggesting that complaints about office work tend to be consistent regardless of the social situation. During the year-end and New Year's period, the overall number of complaints tended to decrease, while the tweets classified as ENV did not decrease.

Figure 6: Time series of (a) tweet counts and (b) the ratio of tweets per label for complaints related to office work. There were few changes in the ratio of complaints per target scope over time.

As in Case 1, we analyzed the topics of the classified tweets. Table 10 shows five characteristic topics and five words extracted from the top 10 words per topic. The same tendency as in Case 1 was observed for all labels except ENV, with higher weights given to adjectives such as "nervous," "anxious," and "sad" for SELF; words indicating personal relations such as "boss," "you," and "husband" for IND; and words indicating public targets such as "idol," "company," and "voice actor" for GRP. With regard to ENV, whereas in Case 1 words indicating services to which the tweeter is a customer appeared, in Case 2 words indicating workload or vacation were common, suggesting that the environments targeted by complaints vary greatly depending on the domain.

Table 10: Five characteristic topics and five words extracted from the top 10 words per topic (translated from Japanese). Higher weights were given to adjectives for SELF, words indicating personal relations for IND, words indicating public targets for GRP, and words indicating the day of the week, busy season, and vacation for ENV.

SELF: (1) nervous, overtime, hard, bothersome, painful; (2) human, really, painful, stress, get a job; (3) like, motivation, anxious, happiness, patience; (4) sad, depressed, dislike, despair, adult; (5) lonely, busy, difficult, weekend, beautiful
IND: (1) boss, every day, me, information, really; (2) you, son, plan, absolutely, fool; (3) vacation, computer, mistake, salary, husband; (4) bath, senior member, meal, tough, friend; (5) meal, husband, workplace, bath, time
GRP: (1) idol, type, occupation, stupid, left-wing; (2) recruitment, salary, serious, company, woman; (3) voice actor, politics, interesting, government official, knowledge; (4) everybody, tough, professional, on time, understanding; (5) doctor, crime, The Diet, last train, really
ENV: (1) tired, event, afraid, tough, reservation; (2) busy, tough, tired, study, nap; (3) vacation, a fun thing, refrain, end-of-year, dull; (4) good game, tomorrow, weekend, happy, sleep; (5) go to work, Monday, Friday, everybody
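The time series in Figures 5 through 7 reduce to monthly counts and ratios of the predicted labels; the following is a minimal sketch, assuming the classified tweets in a pandas DataFrame with "created_at" timestamps and a predicted "label" column.

```python
# A minimal sketch of the monthly aggregation behind Figures 5-7.
import pandas as pd

def monthly_label_ratio(df: pd.DataFrame) -> pd.DataFrame:
    df = df.assign(month=pd.to_datetime(df["created_at"]).dt.to_period("M"))
    counts = df.groupby(["month", "label"]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)  # per-month label ratios
```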
5.3 Case 3: Tohoku earthquake

In Case 1, the time series show that complaints labeled ENV accounted for a large proportion of cases during the early stages of the pandemic but decreased over time, while complaints labeled IND and GRP were flat over time. This tendency suggests that our target scope labels capture the phenomenon called "a paradise built in hell" [27], the observation that victims often exhibit altruistic behavior and engage in voluntary mutual aid after a disaster. For our classification model, we hypothesize that if the phenomenon of "a paradise built in hell" occurs, the ratio of complaints labeled ENV is high in the early period after a disaster, while the ratio of complaints labeled IND or GRP increases over time.

We obtained 106,732 Japanese tweets including "東日本大震災 (/hi-ga-shi-ni-ho-n-da-i-shi-n-sa-i/)," the Japanese name of the Tohoku earthquake, posted from March 11, 2011 to March 10, 2013 using the Twitter API. The time series in Figure 7 show that complaints labeled ENV accounted for a large ratio of cases during the early period after the disaster and that this ratio decreased over time. In contrast, the ratio of complaints labeled GRP increased from about one year after the disaster. These trends suggest that our classification model for the target scope of complaints can be used to detect the phenomenon of "a paradise built in hell" in the Tohoku earthquake. Examples of tweets with each label are shown in Table 11.

Figure 7: Time series of (a) tweet counts and (b) the ratio of tweets per label for complaints related to the Tohoku earthquake. Complaints labeled ENV accounted for a large proportion of cases during the early period after the disaster, and this proportion decreased over time; in contrast, the ratio of complaints labeled GRP increased from about one year after the disaster.

Table 11: Examples of tweets related to the Tohoku earthquake with each label.

GRP: 今、電車に乗っていますが、みんな暑い服着ていますね。だから、余計な電力が必要なのです。もうすぐ東日本大震災から2年。もう一度、見つめ直しましょう。あぁあ、電車の空調が入っちゃった。 (I'm taking the train now, and everyone is wearing hot clothes. So we need extra electric power. It will soon be two years since the Tohoku earthquake. Let's look back once again. Ahhh, the air conditioning is on in the train.)
GRP: 東日本大震災の被災に関して言えば、未だに復興どころか復旧すら出来ていない所もある。ましてや、福島県の一部県民は、ふるさとへ帰れないままです。選挙をしてる場合でしょうかねぇ。 (As for the damage caused by the Tohoku earthquake, there are still some areas that have not even been restored, let alone fully recovered. Moreover, some residents of Fukushima Prefecture are still unable to return to their hometowns. Is this really the time to be holding elections?)
ENV: 勉強横目に東日本大震災のドキュメンタリー見てるけど、恐すぎる。これ今日寝れないやつだ。やっぱ1人恐い。。 (I'm watching a documentary about the Tohoku earthquake while studying, and it's too scary. I'm sure I won't be able to sleep tonight. I'm afraid of being alone..)
ENV: いつ災害がくるかわかりません。東日本大震災のとき、カセットボンベの買い置きがなくて困ったよ。 (You never know when a disaster will happen. When the Tohoku earthquake happened, I was in trouble because I didn't have any spare gas canisters.)

6 Conclusion & future work

We examined the use of computational linguistics and machine learning methods to analyze the targets of complaints. We introduced the first complaint dataset that includes labels indicating the target scope of complaints. We then built BERT-based classification models that achieved an F1 score of 90.4 on the binary classification task and a micro-F1 score of 72.2 on the multiclass classification task, suggesting the validity of our dataset. Our dataset is available to the research community to foster further research on complaints. While we addressed the unbalanced labels of the dataset by downsampling, they could also be addressed by semi-supervised learning [19, 10] or data augmentation [8]; validating such methods for improving model performance is future work.
Furthermore, the results of the case studies show the possibility of applying the constructed models to sociological analysis. In the case studies, we applied our model to tweets extracted with queries related to COVID-19, office work, and the Tohoku earthquake. For COVID-19, we identified that the ratio of complaints targeting the surrounding environment decreased over time. Complaints targeting specific individuals were frequently about others whose views on COVID-19 differed from the tweeter's, whereas complaints targeting the surrounding environment were about the COVID-19 virus and the environment in which the infectious disease was spreading. These results suggest that most complaints can be divided into two categories: complaints that divide people and complaints that generate empathy and cooperation. In the case of the Tohoku earthquake, we showed the potential of our model to detect the phenomenon of "a paradise built in hell." These viewpoints show the potential of our dataset as a starting point for sociological analysis.

We also experimented with a topic model for each target scope label as a case study, using tweets about COVID-19 and office work, respectively. The distribution of words per topic confirms our hypothesis that the content of complaints varies greatly depending on the target scope. In addition, we observed that the complaints our model classified as targeting the environment varied greatly depending on the domain. In the future, as attempted in the case studies, we will not only identify the target scope of complaints in a text but also reveal potential social problems by investigating the temporal change of the target scope of complaints.
Furthermore, the analysis results can be applied beyond social media platforms. For example, we are interested in investigating the relationship between workplace well-being and complaints by measuring the number of complaints and their target scope in the daily reports of a particular company. Such applications will be useful for achieving a comfortable life within society.

Acknowledgement

This work was supported by JST-Mirai Program Grant Number JPMJMI21J2, Japan.

References

[1] Mark Alicke et al. "Complaining Behavior in Social Interaction". In: Personality and Social Psychology Bulletin 18 (1992), pp. 286–295. DOI: 10.1177/0146167292183004.
[2] Ron Artstein and Massimo Poesio. "Survey Article: Inter-Coder Agreement for Computational Linguistics". In: Computational Linguistics 34.4 (2008), pp. 555–596. DOI: 10.1162/coli.07-034-R2.
[3] Tilman Beck et al. "Investigating label suggestions for opinion mining in German Covid-19 social media". In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 1–13. DOI: 10.18653/v1/2021.acl-long.1.
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation". In: Journal of Machine Learning Research 3 (2003), pp. 993–1022. DOI: 10.5555/944919.944937.
[5] Yi-Ling Chung et al. "CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech". In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, July 2019, pp. 2819–2829. DOI: 10.18653/v1/P19-1271.
[6] Jacob Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding". In: arXiv preprint arXiv:1810.04805 (2018). DOI: 10.48550/arXiv.1810.04805.
[7] Ming Fang et al. "Analyzing the Intensity of Complaints on Social Media". In: Findings of the Association for Computational Linguistics: NAACL 2022. Seattle, United States: Association for Computational Linguistics, July 2022, pp. 1742–1754. DOI: 10.18653/v1/2022.findings-naacl.132.
[8] Steven Y. Feng et al. "A Survey of Data Augmentation Approaches for NLP". In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics, Aug. 2021, pp. 968–988. DOI: 10.18653/v1/2021.findings-acl.84.
[9] João Filgueiras et al. "Complaint Analysis and Classification for Economic and Food Safety". In: Proceedings of the Second Workshop on Economics and Natural Language Processing. Hong Kong: Association for Computational Linguistics, Nov. 2019, pp. 51–60. DOI: 10.18653/v1/D19-5107.
[10] Akash Gautam et al. "Semi-Supervised Iterative Approach for Domain-Specific Complaint Detection in Social Media". In: Proceedings of the 3rd Workshop on e-Commerce and NLP. Seattle, WA, USA: Association for Computational Linguistics, July 2020, pp. 46–53. DOI: 10.18653/v1/2020.ecnlp-1.7.
[11] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780. DOI: 10.1162/neco.1997.9.8.1735.
[12] Kazuhiro Ito et al. "Identifying A Target Scope of Complaints on Social Media". In: Proceedings of the 11th International Symposium on Information and Communication Technology. SoICT '22. Hanoi, Vietnam, 2022, pp. 111–118. DOI: 10.1145/3568562.3568659.
[13] William James. The Principles of Psychology. London, England: Dover Publications, 1890.
[14] Mali Jin and Nikolaos Aletras. "Complaint Identification in Social Media with Transformer Networks". In: Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics, Dec. 2020, pp. 1765–1771. DOI: 10.18653/v1/2020.coling-main.157.
[15] Mali Jin and Nikolaos Aletras. "Modeling the Severity of Complaints in Social Media". In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics, June 2021, pp. 2264–2274. DOI: 10.18653/v1/2021.naacl-main.180.
[16] Mali Jin et al. "Automatic Identification and Classification of Bragging in Social Media". In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Dublin, Ireland: Association for Computational Linguistics, May 2022, pp. 3945–3959. DOI: 10.18653/v1/2022.acl-long.273.
[17] Chul-min Kim et al. "The effect of attitude and perception on consumer complaint intentions". In: Journal of Consumer Marketing 20 (2003), pp. 352–371. DOI: 10.1108/07363760310483702.
[18] Robin M. Kowalski. "Complaints and complaining: functions, antecedents, and consequences." In: Psychological Bulletin 119.2 (1996), pp. 179–196. DOI: 10.1037/0033-2909.119.2.179.
[19] Dong-Hyun Lee. "Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks". In: ICML 2013 Workshop on Challenges in Representation Learning. 2013.
[20] Jordan J. Louviere, Terry N. Flynn, and Anthony Alfred John Marley. Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press, 2015. DOI: 10.1017/cbo9781107337855.
[21] Julia Mendelsohn, Ceren Budak, and David Jurgens. "Modeling Framing in Immigration Discourse on Social Media". In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Online: Association for Computational Linguistics, June 2021, pp. 2219–2263. DOI: 10.18653/v1/2021.naacl-main.179.
[22] Kensuke Mitsuzawa et al. "FKC Corpus: a Japanese Corpus from New Opinion Survey Service". In: Proceedings of the Workshop on Novel Incentives for Collecting Data and Annotation from People: types, implementation, tasking requirements, workflow and results. Portorož, Slovenia, 2016, pp. 11–18.
[23] Elite Olshtain and Liora Weinbach. "10. Complaints: A study of speech act behavior among native and non-native speakers of Hebrew". In: The Pragmatic Perspective. John Benjamins, 1987. DOI: 10.1075/pbcs.5.15ols.
[24] Silviu Oprea and Walid Magdy. "iSarcasm: A Dataset of Intended Sarcasm". In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, July 2020, pp. 1279–1289. DOI: 10.18653/v1/2020.acl-main.118.
[25] Robert Plutchik. "Chapter 1 - A General Psychoevolutionary Theory of Emotion". In: Theories of Emotion. Ed. by Robert Plutchik and Henry Kellerman. Academic Press, 1980, pp. 3–33. DOI: 10.1016/B978-0-12-558701-3.50007-7.
[26] Daniel Preoţiuc-Pietro, Mihaela Gaman, and Nikolaos Aletras. "Automatically Identifying Complaints in Social Media". In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, July 2019, pp. 5008–5019. DOI: 10.18653/v1/P19-1495.
[27] Rebecca Solnit. A Paradise Built in Hell: The Extraordinary Communities That Arise in Disaster. Penguin, 2010.
[28] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. San Francisco, California, USA: Association for Computing Machinery, 2016, pp. 1135–1144. DOI: 10.1145/2939672.2939778.
[29] Anna Trosborg. Interlanguage Pragmatics: Requests, Complaints, and Apologies. De Gruyter Mouton, 2011. DOI: 10.1515/9783110885286.
[30] Camilla Vásquez. "Complaints online: The case of TripAdvisor". In: Journal of Pragmatics 43.6 (2011), Postcolonial pragmatics, pp. 1707–1717. DOI: 10.1016/j.pragma.2010.11.007.
[31] Guangyu Zhou and Kavita Ganesan. "Linguistic Understanding of Complaints and Praises in User Reviews". In: Jan. 2016, pp. 109–114. DOI: 10.18653/v1/W16-0418.