https://doi.org/10.31449/inf.v47i3.4758 Informatica 47 (2023) 335–348

Complaints with Target Scope Identification on Social Media

Kazuhiro Ito 1, Taichi Murayama 2, Shuntaro Yada 1, Shoko Wakamiya 1 and Eiji Aramaki 1
1 Nara Institute of Science and Technology, Nara, Japan
2 SANKEN, Osaka University, Osaka, Japan
E-mail: ito.kazuhiro.ih4@is.naist.jp, s-yada@is.naist.jp, wakamiya@is.naist.jp, aramaki@is.naist.jp, taichi@sanken.osaka-u.ac.jp

Keywords: complaint, dataset, Twitter, social media, annotation

Received: March 22, 2023

A complaint is uttered when reality fails to meet one's expectations. Research on complaints, which contributes to our understanding of basic human behavior, has been conducted in the fields of psychology, linguistics, and marketing. Although several approaches have been applied to the study of complaints, no study has yet focused on the target scope of complaints. Examining the target scope of complaints is crucial because the functions of complaints, such as the evocation of emotion, the use of grammar, and the intention, differ depending on the target scope. We first construct and release a complaint dataset of 6,418 tweets by annotating Japanese texts collected from Twitter with labels of the target scope. Our dataset is available at https://github.com/sociocom/JaGUCHI. We then benchmark the annotated dataset with several machine learning baselines and obtain a best performance of a 90.4 F1 score in detecting whether a text is a complaint or not, and a micro-F1 score of 72.2 in identifying the target scope label. Finally, we conduct case studies using our model to demonstrate that identifying the target scope of complaints is useful for sociological analysis.

Povzetek (in English): The research focuses on the analysis of complaints in 6,418 tweets with several machine learning methods.

1 Introduction

A complaint is "a basic speech act used to express a negative disagreement between reality and expectations for a state, product, organization, or event" [23, pp. 195–208]. The analysis of complaints is not only linguistically [30] and psychologically [1, 18] interesting but also beneficial for marketing [17].

Understanding why people are dissatisfied can help improve their well-being through analysis of the situations they complain about. The methods required to deal with complaints vary greatly depending on whether the target scope of a complaint is the complainer him/herself, other people, or the environment; in the workplace, for example, the appropriate improvement differs depending on whether employees complain about their own skills or about their work environment. This categorization of the target scope aligns with James' three psychological categories for the Self as the object of reference [13]: the spiritual Self, the social Self, and the material Self, respectively.

In the field of natural language processing (NLP), there are studies on determining whether a text is a complaint or not [26, 9, 14] and on identifying its severity [15], but no study has yet identified the target scope of complaints, that is, the object toward which or whom a complaint is directed. Our study is an attempt to apply a computational approach focusing on the target scope of complaints on social media. (This paper is an extended version of our study [12] presented at the 11th International Symposium on Information and Communication Technology, SoICT 2022.) More specifically, we emphasize the importance of identifying whether a complaint is aimed at the complainer him/herself, at an individual, at a group, or at the surrounding environment.
This paper introduces a novel Japanese complaint dataset collected from Twitter that includes labels indicating the target scope of complaints (the dataset is available at https://github.com/sociocom/JaGUCHI). We then investigate the validity of our dataset using two classification tasks: a binary classification task (binary task) that identifies whether a text is a complaint or not, and a multiclass classification task (multiclass task) that identifies the target scope of complaints. Furthermore, we apply our target scope classification model to three case studies, COVID-19, office work, and the 2011 off the Pacific coast of Tohoku earthquake (hereafter, the Tohoku earthquake), aiming to analyze social phenomena. Our contributions are as follows:

– We constructed a dataset of complaints extracted from Twitter labeled with the target scope of complaints.
– We conducted experiments on identifying the target scope of complaints and achieved an F1 score of 90.4 in detecting whether a text is a complaint or not, and a micro-F1 score of 72.2 in identifying the target scope label.
– We conducted three case studies to demonstrate the usefulness of identifying the target scope of complaints for sociological analysis.

Table 1: Counts and examples of complaint tweets per target scope label in our dataset.

SELF (468 tweets): しかしたぶん全部顔とか行動に出ちゃってるから最低なのは自分なんだよね向こうには落ち度はないし勝手に苛ついてるだけだしね (Maybe I'm the one who's the worst because it's all showing on my face and in my actions. It's not the other person's fault, I'm just irritated by myself.)
IND (3,866 tweets): わたしが居ないとミルクしまってある場所すらわかんないのかよ (You do not even know where the milk is stored without me?)
GRP (648 tweets): 価値観の違いかもしれないけど物買うのは3千円でもしぶるのにギャンブルに平気で金突っ込むひとの気持ちがわからない (Maybe it's a difference in values, but I do not understand people who are reluctant to spend even 3,000 yen to buy something, but do not mind excessively spending money on gambling.)
ENV (1,436 tweets): 保育士の給料上がらないかな~手取り15~18じゃやってけないよな (...) 政治家の給料とかより保育士に回してほしいわ、切実に (I wonder if childcare workers' salaries will go up. I cannot make it on 15 to 18 take-home pay. (...) I'd really like to see more money spent on childcare workers than on politicians' salaries.)
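As a concrete starting point for readers who want to work with the released data, the following is a minimal sketch of loading it; the file name and the column names ("text", "label") are assumptions rather than a documented layout, so the repository's README should be checked first.

```python
# A minimal sketch of loading the released JaGUCHI data, assuming the
# repository provides a CSV file; path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("JaGUCHI/dataset.csv")  # hypothetical path
print(df["label"].value_counts())        # expected labels: SELF, IND, GRP, ENV
```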
2 Related work

In pragmatics, a complaint is defined as "a basic speech act used to express a negative disagreement between reality and expectations for a state, product, organization, or event" [23, pp. 195–208]. What distinguishes complaints from mere negative sentiment polarity is that complaints tend to include expressions of breaches of the speaker's expectations [26] and to include reasons or explanations [31].

Dataset construction has been actively pursued to analyze the substance of complaints. A previous study collected complaints about food products sent to governmental institutions and built an automatic classification model according to the nature of the complaint [9]. The classification classes were set up taking into account the use of customer support, the type of economic activity involved, the priority of treatment, and whether the complaint falls under the responsibility of the authority. Another study created a complaint dataset with labels for service categories (e.g., food, cars, electronics) collected from reply posts to company accounts on Twitter [26]. Yet another study constructed a complaint dataset with four labels [15]: (1) No explicit reproach: there is no explicit mention of the cause and the complaint is not offensive; (2) Disapproval: expresses explicit negative emotions such as dissatisfaction, annoyance, dislike, and disapproval; (3) Accusation: asserts that someone did something reprehensible; and (4) Blame: assumes the complainee is responsible for the undesirable result. These four categories follow standard definitions in pragmatics [29]. Fang et al. [7] assigned the intensity of complaints as a continuous value using the best-worst scaling method [20] via crowdsourcing. Another corpus, based on data accumulated by the Fuman Kaitori Center, collects Japanese complaints about products and services [22]; it includes labels for the target of a complaint, such as product or service names, which differs in granularity from our study.

As mentioned above, although several studies have constructed complaint datasets, none of them is labeled with the target scope toward which complaints are directed.

3 Dataset

3.1 Collection

We constructed a Japanese complaint dataset using Twitter. We collected 64,313 tweets including "#愚痴 (/gu-chi/)" (a hashtag of the Japanese term for complaints) posted from March 26, 2006 to September 30, 2021 using the Twitter API (https://developer.twitter.com/). We excluded URLs, duplicates, and retweets, and kept only tweets with a relatively low possibility of being posted by a bot. Specifically, we kept only tweets whose posting application was Twitter for iPad, Twitter for iPhone, Twitter Web App, Twitter Web Client, or Keitai Web. All hashtags were removed from the text, and tweets with fewer than 30 characters were excluded. We then extracted tweets for each month through stratified sampling and finally obtained 7,573 tweets, which is of similar size to datasets recently released for NLP on social media [16, 24, 5, 3, 21].
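The following is a minimal sketch of this filtering pipeline, assuming tweets have already been fetched via the Twitter API as dicts with "text", "source", "retweeted", and ISO-formatted "created_at" fields; the field names and the per-month sample size are illustrative, not the authors' actual code.

```python
# A minimal sketch of the filtering in Section 3.1; all names are assumptions.
import random
import re

ALLOWED_SOURCES = {  # posting applications kept to reduce likely bot tweets
    "Twitter for iPad", "Twitter for iPhone", "Twitter Web App",
    "Twitter Web Client", "Keitai Web",
}

def clean(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # remove URLs
    text = re.sub(r"#\S+", "", text)          # remove all hashtags
    return text.strip()

def filter_tweets(tweets):
    seen, kept = set(), []
    for tw in tweets:
        if tw.get("retweeted") or tw["source"] not in ALLOWED_SOURCES:
            continue
        text = clean(tw["text"])
        if len(text) < 30 or text in seen:    # length filter and deduplication
            continue
        seen.add(text)
        kept.append({"text": text, "month": tw["created_at"][:7]})  # "YYYY-MM"
    return kept

def stratified_monthly_sample(tweets, per_month=40, seed=0):
    # stratified sampling: draw up to `per_month` tweets from each month
    random.seed(seed)
    by_month = {}
    for tw in tweets:
        by_month.setdefault(tw["month"], []).append(tw)
    return [tw for _, tws in sorted(by_month.items())
            for tw in random.sample(tws, min(per_month, len(tws)))]
```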
3.2 Annotation

We annotated the 7,573 tweets with target scope labels. The tweets were divided into three sets (2,524, 2,524, and 2,525 tweets), and three trained external annotators each annotated one set.

First stage: whether the tweet is a complaint or not is identified. Because most of the tweets are complaints owing to the inclusion of "#愚痴", we remove tweets identified as non-complaints. Following Olshtain's definition [23, pp. 195–208], we identified tweets expressing a negative disagreement between the tweeter's expectations and reality as complaints. Examples of non-complaint tweets removed by this process are shown below.

"If a company is violating the Labor Standards Act, gathering evidence is critical to remedy the situation."

"It's easy to complain, so I'm going to shift my thinking to the positive and creative."

"I came home exhausted again today. But I saw Mt. Fuji for a bit on the train on the way home, and it kind of loosened me up. I thought I was going to cry."

Second stage: we identify the target scope of complaints. We assigned one of four labels: SELF, IND, GRP, and ENV. Although our labels broadly follow James' theory of the Self [13], we separate IND (individual) and GRP (group) because we believe that the nature of a complaint differs depending on whether the target is an individual or a group: complaints against individuals are associated with abuse, whereas complaints against groups are associated with hate speech. When the target scope could not be determined uniquely or was unclear, the tweet was removed from the dataset. The definitions and examples of the labels are given below.

SELF: the target scope includes the complainer. E.g., "I have said too much again."
IND: the target scope does not include the complainer and is one or several other persons. E.g., "I hate that my boss puts me in charge of his work!"
GRP: the target scope does not include the complainer and is a group. E.g., "I cannot be interested in people who only think about money."
ENV: the target scope is not human. E.g., "It's raining today, so I do not feel like doing anything."

As a result of the annotation, 6,418 of the 7,573 texts were considered complaints. Among the complaint tweets, the number of tweets per target scope label is 468 for SELF, 3,866 for IND, 648 for GRP, and 1,436 for ENV. The agreement ratio (kappa coefficient) between the annotators and an evaluator was 0.798 for the binary identification and 0.728 for the four-label classification; both values lie in the upper part of the substantial agreement range [2]. Figure 1 presents the confusion matrix of human agreement on the four classes, normalized over the actual values (rows). Examples of text and the number of tweets for each target scope label are shown in Table 1.

Figure 1: Confusion matrix of annotator agreement on the four target scopes of complaints.

3.3 Data analysis

We analyzed the contents of the dataset to gain linguistic insight into this task and the data, examining the number of characters, the emotions, and the topics. The results of each analysis are shown below.

3.3.1 Number of characters

The average number of characters in the entire dataset is 82.0, and the median is 81.0. The label with the most characters is GRP (mean 87.8, median 89.0), and the label with the fewest characters is SELF (mean 76.8, median 74.0). This suggests that while descriptions of other groups tend to be detailed, descriptions of the complainer him/herself tend to be less detailed. The statistics of the number of characters per label are shown in Table 2. Note that tweets with fewer than 30 characters were removed in Section 3.1.

Table 2: Statistics on the number of characters per label. GRP has the highest mean number of characters, and SELF the lowest.

Label | Mean | Median | Std
SELF | 76.8 | 74.0 | 32.2
IND  | 83.2 | 83.0 | 32.4
GRP  | 87.8 | 89.0 | 32.5
ENV  | 77.8 | 74.0 | 33.8
ALL  | 82.0 | 81.0 | 32.8
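The per-label statistics in Table 2 can be reproduced with a simple aggregation; the following is a minimal sketch, assuming the annotated data in a pandas DataFrame with "text" and "label" columns (the toy rows are for illustration only).

```python
# A minimal sketch of the character statistics per label (Table 2).
import pandas as pd

# toy rows; in practice one row per annotated tweet
df = pd.DataFrame({
    "label": ["SELF", "IND", "IND", "ENV"],
    "text":  ["自分が悪い", "ミルクの場所もわからないのか", "また残業", "雨で何もやる気が出ない"],
})
stats = (df.assign(n_chars=df["text"].str.len())  # character count per tweet
           .groupby("label")["n_chars"]
           .agg(["mean", "median", "std"]))
print(stats)
```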
3.3.2 Emotion

We examine the relationship between our dataset and emotions, and the differences in emotions between target scopes. To do so, we used the Japanese Linguistic Inquiry and Word Count (JIWC) emotion dictionary (https://github.com/sociocom/JIWC-Dictionary). This dictionary matches words to seven emotion categories (Joy, Sadness, Anger, Surprise, Trust, Anxiety, and Disgust) based on a translation of Plutchik's emotion wheel [25], obtained from a naturalistic dataset of emotional memories. The score $S_{ij}$ of tweet $i$ for emotion category $j$ is based on the ratio of the number of emotion terms of category $j$ in the tweet, $W_{ij}$, to the total number of terms (tokens) in the tweet, $W^{*}_{i}$:

$S_{ij} = \frac{W_{ij}}{W^{*}_{i}} \log_{2}(W_{ij} + 1)$   (1)

We used this emotion dictionary to calculate the emotion score for each tweet in our dataset and investigated the average score for each emotion per label. The results are shown in Table 3.

Table 3: Results of the emotion analysis using JIWC, showing the average score for each emotion per label. The highest value per emotion is marked with *.

Label | Sadness | Anxiety | Anger | Disgust | Trust | Surprise | Joy
SELF | 0.448* | 0.502* | 0.774 | 0.858 | 0.591* | 0.467 | 0.459
IND  | 0.424 | 0.425 | 0.846 | 0.904 | 0.568 | 0.457 | 0.451
GRP  | 0.407 | 0.431 | 0.861* | 0.954* | 0.564 | 0.477* | 0.444
ENV  | 0.434 | 0.490 | 0.773 | 0.824 | 0.545 | 0.464 | 0.482*
ALL  | 0.426 | 0.445 | 0.826 | 0.888 | 0.564 | 0.461 | 0.458

For SELF, the low value for Anger and the high value for Anxiety are consistent with our intuition: when the complaint targets the complainer him/herself, Anxiety appears to be stronger than Anger. Disgust is higher for GRP than for IND, indicating that feelings of Disgust are stronger toward groups than toward individuals. For Anger, both IND and GRP are high.
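The scoring in Eq. (1) is straightforward to implement; the following is a minimal sketch, assuming the dictionary is available as a mapping from emotion category to a set of words and that tweets are already tokenized (the tokenizer, the dictionary loading, and the toy two-category dictionary are assumptions).

```python
# A minimal sketch of the JIWC-based emotion scoring in Eq. (1).
import math

def emotion_scores(tokens, jiwc):
    """S_ij = (W_ij / W*_i) * log2(W_ij + 1) for each emotion category j."""
    total = len(tokens)  # W*_i: total number of tokens in the tweet
    scores = {}
    for emotion, words in jiwc.items():
        w_ij = sum(1 for t in tokens if t in words)  # emotion-term count W_ij
        scores[emotion] = (w_ij / total) * math.log2(w_ij + 1) if total else 0.0
    return scores

# toy usage with a hypothetical two-category dictionary
jiwc = {"Anger": {"嫌い", "苛つく"}, "Joy": {"嬉しい"}}
print(emotion_scores(["上司", "が", "嫌い", "で", "苛つく"], jiwc))
```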
3.3.3 Topic

To investigate whether the detailed contents of complaints can be extracted from our dataset, we analyzed the tweets' topics using Latent Dirichlet Allocation (LDA), a kind of topic model [4]. The number of topics was set to 8, and LDA was applied only to nouns of two or more Japanese characters. Table 4 shows the top words assigned to each topic. Some of the topics are work-related (Topics 1, 3, 4, and 5), suggesting that work accounts for the majority of complaints posted on Twitter. Among the work-related topics, there is a topic related to mental health (Topic 3), including "mood," "stress," and "hospital," and a topic related to family (Topic 1), including "husband" and "child," indicating several distinct tendencies. Another topic focuses on COVID-19 (Topic 8), which includes "COVID-19" and "mask." Although only recent tweets are relevant to this topic, this suggests that many such complaints were posted intensively.

Table 4: The top 5 words per topic (translated from Japanese). Topics 1, 3, 4, and 5 are work-related; Topic 8 focuses on COVID-19.

Topic 1: husband, child, boss, mood, senior member
Topic 2: child, movie, parents' house, block, article
Topic 3: human, workplace, mood, stress, hospital
Topic 4: company, husband, world, place, mother
Topic 5: really, stupid, vacation, word, company
Topic 6: why, friend, everyday, child, meal
Topic 7: without saying, adult, money, senior member, staff
Topic 8: angry, COVID-19, cry, forbidden word, mask

4 Experiment

4.1 Settings

In this section, we demonstrate the validity of the dataset using two classification tasks: a binary task (2-way) that identifies whether a text is a complaint and a multiclass task (4-way) that classifies the target scope of complaints. These tasks correspond to the first and second stages of annotation, respectively.

We employ two types of machine learning models: Long Short-Term Memory (LSTM) [11] and Bidirectional Encoder Representations from Transformers (BERT) [6]. The BERT model is a fine-tuned version of a model pretrained on the Japanese version of Wikipedia released by Tohoku University (https://github.com/cl-tohoku/bert-japanese).

Before training, the dataset was lowercased and all numbers were replaced with zeros. We split the dataset into training, validation, and test sets (7:1.5:1.5), maintaining the label distribution across the splits.

We set the parameters of the LSTM model as follows: word embedding dimension 10, hidden layer dimension 128, cross-entropy loss, Stochastic Gradient Descent (SGD) as the optimizer, learning rate 0.01, and 100 epochs. We set the parameters of the BERT model as follows: a maximum of 128 tokens per tweet, batch size 32, Adam as the optimizer, learning rate 1.0 × 10^-5, and 10 epochs. These parameters were chosen based on the validation data. For the binary task, we additionally added 6,000 randomly sampled tweets, identified as non-complaints according to our annotation method, to the dataset as negative examples.

4.2 Metrics

For the binary task, we report mean accuracy, macro-F1 score, and ROC AUC, following an existing complaint study [26]. For the multiclass task, we report the micro-F1 score and the macro-F1 score.
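Before turning to the results, the following is a minimal sketch of the BERT fine-tuning setup described in Section 4.1, using the Hugging Face transformers Trainer; the exact checkpoint variant, the toy data, and the simplified dataset class are assumptions rather than the authors' actual training code (Trainer uses AdamW by default, a close stand-in for the Adam setting above).

```python
# A minimal sketch of fine-tuning a Japanese BERT for 4-way classification.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["SELF", "IND", "GRP", "ENV"]
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
model = AutoModelForSequenceClassification.from_pretrained(
    "cl-tohoku/bert-base-japanese", num_labels=len(LABELS))

class TweetDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        # a maximum of 128 tokens per tweet, as in Section 4.1
        self.enc = tokenizer(texts, truncation=True, max_length=128,
                             padding="max_length")
        self.labels = [LABELS.index(l) for l in labels]
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# toy data; the paper uses the 7:1.5:1.5 train/validation/test split
train_ds = TweetDataset(["また残業だ", "雨で何もやる気が出ない"], ["IND", "ENV"])

args = TrainingArguments(output_dir="out", per_device_train_batch_size=32,
                         learning_rate=1e-5, num_train_epochs=10)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```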
4.3 Results

4.3.1 Binary task (2-way)

The binary task reaches an accuracy of 83.5, an F1 score of 83.7, and an AUC of 83.5 for the LSTM model, and an accuracy of 89.6, an F1 score of 90.4, and an AUC of 89.4 for the BERT model (Table 5). The confusion matrix of the BERT model has a true positive rate of 0.92, a false positive rate of 0.14, a false negative rate of 0.08, and a true negative rate of 0.86. The BERT model produced fewer false negatives than the LSTM model. Figure 2 (a) and (b) show the confusion matrices for the LSTM and BERT models, respectively.

Table 5: Results of the binary and multiclass tasks. The BERT model outperformed the majority-class baseline and the LSTM model on every metric (best score per row in the BERT column).

Task | Metric | Major Class | LSTM | BERT
Binary | Accuracy | 51.7 | 83.5 | 89.6
Binary | F1 score | 69.3 | 83.7 | 90.4
Binary | AUC | 50.0 | 83.5 | 89.4
Multiclass | micro-F1 score | 62.1 | 51.7 | 72.2
Multiclass | macro-F1 score | 19.2 | 30.1 | 54.5

Figure 2: Confusion matrices of the binary task (2-way) for (a) the LSTM model and (b) the BERT model.

We are interested in what types of tokens our complaint model captures. To interpret the behavior of the model, we used LIME [28], a method for explaining machine learning models, to visualize which tokens the BERT model relies on for the following example (translated from Japanese): "Recently, I had an encounter where all the free time I worked hard to make for a paid vacation was wasted because of the absence of a part-time worker who comes to work only once a week." We observed that the model paid attention to the expression "wasted because of the absence of a part-time worker who comes to work only once a week" for classification (Figure 3). Here, the highlighted reason is the cause of the complaint, suggesting that our model attends to the same part as human intuition.

Figure 3: Visualization of the cue words for the sample sentences in our (a) binary and (b) multiclass classification models; the highlights mark the cues for classification. For (a), the highlighted words are "wasted because of the absence of a part-time worker who comes to work only once a week." For (b), the highlighted words are "The husband who plays the role of ... too disgusting."

4.3.2 Multiclass task (4-way)

The multiclass task yields a micro-F1 score of 51.7 for the LSTM model and 72.2 for the BERT model. Figure 4 (a) and (b) show the confusion matrices for the LSTM and BERT models, respectively. The LSTM model classifies a relatively large number of tweets as either IND or ENV, reflecting the label bias in the dataset. Although the BERT model mitigates the effect of this label bias compared with the LSTM model, the per-label accuracy shows that SELF tends to be misclassified as ENV. This reflects the difficulty of separating SELF and ENV, which share the tendency to omit the target scope in statements about oneself. The accuracy for GRP is relatively low because, when a complainer refers to a group that does not include him/herself, the complainer does not always use words that explicitly indicate multiple targets. In short, the LSTM model greatly outperformed the majority-class baseline in macro-F1, and the BERT model further mitigated the label bias that affected the LSTM results, improving the macro-F1 again.

Figure 4: Confusion matrices of the multiclass task (4-way) for (a) the LSTM model, (b) the BERT model, and (c) the BERT model with downsampling. The LSTM model classified a relatively large number of tweets as IND or ENV, likely reflecting the label bias in the dataset; the BERT model mitigates this effect, and the BERT model with downsampling shows little bias among the labels.

As with the binary task, we visualize which tokens the model relies on for the following example (translated from Japanese): "The husband who plays the role of 'a man sneezing boldly' even though he knows his family doesn't like it is too disgusting. He does it occasionally, and it's so dull because it's so artificial and it shows on his face." This tweet was identified as IND by our model. The model paid the most attention to the words "The husband who plays the role of ... too disgusting" for classification (Figure 3). These words clearly indicate the target of the complaint, "husband," and the feeling of being "too disgusting" toward that person; thus, the cues on which the model based its label are clearly interpretable.
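The LIME inspections behind Figure 3 can be sketched as follows, assuming a wrapper around the fine-tuned classifier that returns class probabilities; the constant-output stub below merely stands in for that wrapper, and character-level perturbation is one simple choice for unsegmented Japanese text rather than the authors' documented setting.

```python
# A minimal sketch of inspecting classification cues with LIME.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    # stand-in for the fine-tuned BERT model's softmax output
    return np.tile([0.1, 0.6, 0.1, 0.2], (len(texts), 1))

explainer = LimeTextExplainer(class_names=["SELF", "IND", "GRP", "ENV"],
                              char_level=True)  # simple choice for Japanese
exp = explainer.explain_instance("ミルクしまってある場所すらわかんないのかよ",
                                 predict_proba, num_features=10)
print(exp.as_list())  # (token, weight) pairs cueing the prediction
```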
4.4 Downsampling

Because the errors in the multiclass task might be strongly influenced by the unbalanced labels of the dataset, we also experimented with a downsampled dataset. We downsampled the labels other than SELF to approximately the number of SELF tweets, SELF being the rarest label. For this experiment, we employed the BERT model with the same settings as in Section 4.1. The result is a micro-F1 score of 55.3 and a macro-F1 score of 55.5. As illustrated in Figure 4 (c), the results show little bias among the labels. There is still a relatively high level of confusion between IND and GRP, suggesting that these labels tend to use similar language. In addition, there were relatively many cases in which ENV tweets were classified as SELF, suggesting that this error may be due to omission of the target toward which the complaint is directed (see Section 4.5).
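The downsampling itself reduces to sampling every class down to the size of the rarest one; the following is a minimal sketch, assuming the annotated data in a pandas DataFrame with a "label" column (the random seed is illustrative).

```python
# A minimal sketch of the downsampling in Section 4.4.
import pandas as pd

def downsample(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    n_min = df["label"].value_counts().min()  # size of the rarest class (SELF)
    return (df.groupby("label", group_keys=False)
              .apply(lambda g: g.sample(n=n_min, random_state=seed)))

# after downsampling, every label has exactly n_min tweets
```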
4.5 Error analysis

4.5.1 Binary task (2-way)

Although the BERT model achieved a high F1 score of 90.4, it could not classify some tweets correctly. Examples of error cases are shown in Table 6.

Table 6: Examples of error cases in the binary task (true label / predicted label).

(1) non-complaint / complaint: お仕事終わり!定時で上がれたけど、フィットネスに行くかヤフオクの発送か...。明日は遅番だからジム行くのが得策。来週まで行けないし。 (I finished work! I was able to leave on time, but I don't know if I should go to the fitness center or ship the Yahoo Auction items... I have a late shift tomorrow, so going to the gym is in my best interest. I can't go there until next week.)
(2) non-complaint / complaint: 何か作りたいなーという気分が出て来ただけマシかなーと思う昨今。風邪の熱に浮かされてるだけかもしれないが。フォトショ起動するのもめんどくさいモードだけど。うん。 (I think it's better that I feel like making something these days. I may just be suffering from a fever from a cold. Although I'm too lazy to start up Photoshop right now.)
(3) non-complaint / complaint: 今日は寝坊して大変だったから早め(でももう0時;)に寝よう。お休みなさい! (I overslept and had a hard time today, so I'll go to bed early (but it's already midnight;). Good night!)
(4) complaint / non-complaint: 今、カラオケに行ってるらしい。職場にコロナ持ち込まないでねー!!感染者出たら、あなたの責任ですから! (Now they are going to karaoke, I heard. Don't bring coronavirus into the workplace! If anyone gets infected, it's your fault!)
(5) complaint / non-complaint: 感情豊かですねって、その状況、人に合わせて自分を作ってんだよ (People tell me I'm very emotional, but I make myself fit the situation and the people around me.)

Examples (1), (2), and (3) in Table 6 are false positives. In (1), although the tweeter expresses uncertainty about a choice, the tweet is labeled as a non-complaint in the true data because it does not contain any negative emotion related to a complaint. In (2), although the word "lazy," which is closely related to complaints, appears in the sentence, the expression "I think it's better" carries the intent of the whole sentence. In (3), the word "overslept" indicates an unfavorable situation, but the sentence as a whole is not a complaint; it simply states the intention to go to bed early. In all of these cases, negative elements appear in parts of the tweet, but the purpose of the tweet is something other than complaining; such tweets tend to become false positives.

Examples (4) and (5), on the other hand, are false negatives. Syntactically, (4) is a tweet expressing a kind of request to the target, but semantically it accuses the target of going out to have fun. In (5), the tweeter corrects an error in the target's perception and intends to express discomfort. As these examples show, there are often cases with no explicit complaint vocabulary or syntax in which words nevertheless semantically imply a complaint.

4.5.2 Multiclass task (4-way)

We analyzed error cases using the results of the more accurate BERT model. Examples of error cases are shown in Table 7.

Table 7: Examples of error cases in the multiclass task (true label / predicted label).

(6) SELF / IND: あー、でも休みの日とか、歩いてる時とか、ショッピングの時にアイディア浮かぶかも。もう、おっちゃんアイディア出ないから、もっと若い人に頑張って欲しいなぁ。 (Maybe ideas happen when I'm on vacation, or walking, or shopping. As an old man, I can't come up with any more ideas, so I wish more young people would try their best.)
(7) SELF / ENV: 頑張っても報われないし人間関係でいつもとん挫するしどうすりゃいいのかわかんないな、もう (I don't know what to do because my hard work is not rewarded and I always fail in personal relationships.)
(8) GRP / IND: とあるIT企業のデバッガーとして勤めてますが、今日だけは言わせてください。デバッガーを馬鹿にするな。 (I work as a debugger for an IT company, and let me say this today. Don't mock debuggers.)
(9) ENV / GRP: ニキビ死ねーーーーーーーっっっ!!!!!!!!お前のせいでブスさ倍増すんだよクソ野郎!!!!!!!! (Pimples go away!!!!!!!! You make me look twice as ugly, damn you!!!!!!!!)

In many cases, the model predicts IND or ENV for tweets whose true label is SELF. In (6), there are two possible error factors: first, if the model focused on the sentence "I wish more young people would try their best" and recognized "young people" as the target, it would be a misidentification, because the tweeter him/herself is the target scope of the tweet; second, the tweeter, the true target, is paraphrased as "old man," so this word may be perceived as referring to a third party. Example (7) targets the tweeter him/herself but is predicted as ENV, since the scope of the tweet is not explicitly stated. The model also predicts IND or ENV for tweets whose true label is GRP. In (8), although it can be inferred from context that the target scope comprises more than one person, it is difficult to determine from the text whether the target is singular or plural, because no noun specifies the target scope of the complaint. In (9), the expression "go away" (literally "die"), commonly used to address a human, is applied to a non-living target, causing the target to be incorrectly identified as human. Overall, the model tends to misclassify tweets whose target scope is only implied and can be inferred only from extra-textual knowledge or the tone of the comments.
5 Case studies

We apply the constructed target scope classification model to tweets related to COVID-19, office work, and the Tohoku earthquake to show that it is useful for sociological analysis.

5.1 Case 1: COVID-19

We obtained 698,950 Japanese tweets including "コロナ (/ko-ro-na/)," a Japanese word for COVID-19, posted from January 1, 2020 to December 31, 2021 using the Twitter API. The time series in Figure 5 show that ENV accounted for a large ratio of cases during the early stages of the pandemic and that this ratio decreased over time. Among the tweets classified as IND or GRP, many complaints were directed at others whose views on COVID-19 differed from the complainer's, whereas among the tweets classified as ENV, many complaints were directed at SARS-CoV-2 itself and at life during the pandemic. Examples of tweets with each label are shown in Table 8.

Figure 5: Time series of (a) tweet counts and (b) the ratio of tweets per label for complaints related to COVID-19. ENV accounted for a large proportion of cases during the early stages of the pandemic, and this proportion decreased over time.

In addition, to confirm our hypothesis that the content of complaints varies depending on the target scope, we analyzed the topics of the tweets using LDA [4]. The number of topics was set to 16, and LDA was applied only to nouns and adjectives. Table 9 shows five characteristic topics and five words extracted from the top 10 words per topic. The words in topics of tweets labeled SELF include many adjectives, such as "afraid," "happy," and "sad," expressing the tweeter's state of mind. IND is closely related to the tweeter's personal relations, such as "girlfriend," "family," and "parents' house." Complaints labeled GRP tend to target public entities, such as "government," "politics," "Olympics," and "celebrity." ENV frequently contains words related to services the tweeter uses as a customer, such as "lesson," "movie," "vaccine," and "news." The differences in topics per label show a certain interpretability, suggesting that automatic classification of the target scope of complaints at the granularity of our dataset also contributes to a categorization of the content of complaints.
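The paper does not name the LDA implementation; the following sketch uses gensim as one common choice, with toy tokenized documents and the 16-topic setting of Case 1 (in practice, the tokens would be the nouns and adjectives extracted from the classified tweets).

```python
# A minimal sketch of the topic analysis with gensim's LDA.
from gensim import corpora
from gensim.models import LdaModel

docs = [["コロナ", "ワクチン", "怖い"], ["上司", "残業", "嫌い"]]  # toy token lists
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=16,
               random_state=0)
for topic_id, words in lda.show_topics(num_topics=16, num_words=5,
                                       formatted=False):
    print(topic_id, [w for w, _ in words])  # top 5 words per topic
```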
Table 8: Examples of tweets related to COVID-19 with each label.

IND: 旦那ね、色んなところで営業回ってる人だからよく風邪ひいたり熱出たりすんの。手洗いうがいしてねって言ってもしねぇの。こいつのことこれからコロナさんって呼ぶことにした。 (My husband is a salesman who goes around to various places, so he often catches a cold or gets a fever. I tell him to wash his hands and gargle, but he doesn't. I've decided to call him Mr. COVID from now on.)
IND: 一生、平行線なんでもういいんじゃないですか。あなたは、コロナは大したことないと思ってる、私は違う。これでいいですよ。 (All along, we've failed to reach an agreement, so I think we're done. You think COVID-19 is no big deal; I don't. I'm fine with this.)
ENV: コロナが長引くと永遠に子供に会えなくなります 子供はその環境に馴染んでしまうから うちは何とかlineで繋げようとしてるけど、もう手遅れなんでそれは悲しいこと (If the situation with COVID-19 is prolonged, we won't be able to see our child forever... We are trying to connect with them via LINE so that they don't get used to that environment, but it's too late now, and that's sad...)
ENV: ホント疲れちゃったし、我慢してることも多いから辛いよ コロナ禍じゃなきゃとっくに東京とかも行ってるし、何よりライブ出来てただろうしね (It's hard because I'm really tired and I have to endure so much... If it weren't for the COVID-19 situation, I would have been to Tokyo by now, and more importantly, I would have been able to go to live shows.)

Table 9: Five characteristic topics and five words extracted from the top 10 words per topic (translated from Japanese). SELF contains many adjectives expressing the tweeter's state of mind; IND relates to the tweeter's personal relations; GRP tends to target public entities; ENV frequently includes words related to services for which the tweeter is a customer.

SELF: (1) afraid, happy, painful, timing, sane; (2) hobby, ruin, symptoms, vaccine, wedding; (3) natural, stress, dislike, tough, cheerful; (4) meal, really, word, patience, sad; (5) a lot, complex, surprised, result, life
IND: (1) part-time job, stress, travel, disturbed, promise; (2) concert, child, hospital, aftereffect, girlfriend; (3) mask, parents' house, test, fool, afraid; (4) stupid, money, family, mother, please; (5) afraid, really, you, friend, bad
GRP: (1) treatment, new type, government, success, demonstration; (2) covering up, doctor, politics, opinion, civil servants; (3) Olympics, report, afford, slander, media; (4) vaccine, young man, governor, criticism, celebrity; (5) player, prejudice, train, citizen, trash
ENV: (1) lesson, cancellation, postponement, hospitalization, return to country; (2) movie, ticket, gym, patience, really; (3) vaccine, afraid, time, positivity, insurance; (4) news, money, metropolis, summer vacation, dead; (5) infection, pain, universal, like, closing down

5.2 Case 2: Office work

We obtained 731,000 Japanese tweets including the word "仕事 (/shi-go-to/)," which relates to office work, posted from January 1, 2020 to December 31, 2021 using the Twitter API. Note that 12,626 of the tweets collected in Case 2 overlap with those collected in Case 1. The time series in Figure 6 show few changes in the ratio of complaints per target scope over time, suggesting that complaints about office work tend to be consistent regardless of the social situation. During the year-end and New Year's period, the overall number of complaints tended to decrease, while the tweets classified as ENV did not decrease.

Figure 6: Time series of (a) tweet counts and (b) the ratio of tweets per label for complaints related to office work. There were few changes in the ratio of complaints per target scope over time.

As in Case 1, we analyzed the topics of the classified tweets. Table 10 shows five characteristic topics and five words extracted from the top 10 words per topic. The same tendency as in Case 1 was observed for all labels except ENV, with higher weights given to adjectives such as "nervous," "anxious," and "sad" for SELF; words indicating personal relations such as "boss," "you," and "husband" for IND; and words indicating public targets such as "idol," "company," and "voice actor" for GRP. With regard to ENV, whereas in Case 1 words indicating services to which the tweeter is a customer appeared, in Case 2 words indicating workload or vacation were common, suggesting that the environments targeted by complaints vary greatly depending on the domain.

Table 10: Five characteristic topics and five words extracted from the top 10 words per topic (translated from Japanese). Higher weights were given to adjectives for SELF, words indicating personal relations for IND, words indicating public targets for GRP, and words indicating the day of the week, busy season, and vacation for ENV.

SELF: (1) nervous, overtime, hard, bothersome, painful; (2) human, really, painful, stress, get a job; (3) like, motivation, anxious, happiness, patience; (4) sad, depressed, dislike, despair, adult; (5) lonely, busy, difficult, weekend, beautiful
IND: (1) boss, every day, me, information, really; (2) you, son, plan, absolutely, fool; (3) vacation, computer, mistake, salary, husband; (4) bath, senior member, meal, tough, friend; (5) meal, husband, workplace, bath, time
GRP: (1) idol, type, occupation, stupid, left-wing; (2) recruitment, salary, serious, company, woman; (3) voice actor, politics, interesting, government official, knowledge; (4) everybody, tough, professional, on time, understanding; (5) doctor, crime, The Diet, last train, really
ENV: (1) tired, event, afraid, tough, reservation; (2) busy, tough, tired, study, nap; (3) vacation, a fun thing, refrain, end-of-year, dull; (4) good game, tomorrow, weekend, happy, sleep; (5) go to work, Monday, Friday, everybody
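The time series in Figures 5 through 7 reduce to monthly counts and ratios of the predicted labels; the following is a minimal sketch, assuming the classified tweets in a pandas DataFrame with "created_at" timestamps and a predicted "label" column.

```python
# A minimal sketch of the monthly aggregation behind Figures 5-7.
import pandas as pd

def monthly_label_ratio(df: pd.DataFrame) -> pd.DataFrame:
    df = df.assign(month=pd.to_datetime(df["created_at"]).dt.to_period("M"))
    counts = df.groupby(["month", "label"]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)  # per-month label ratios
```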
5.3 Case 3: Tohoku earthquake

In Case 1, the time series show that complaints labeled ENV accounted for a large proportion of cases during the early stages of the pandemic but decreased over time, while complaints labeled IND and GRP were flat over time. This tendency suggests that our target scope labels capture the phenomenon called "a paradise built in hell" [27], the observation that victims often exhibit altruistic behavior and engage in voluntary mutual aid after a disaster. For our classification model, we hypothesize that if the phenomenon of "a paradise built in hell" occurs, the ratio of complaints labeled ENV is high in the early period after a disaster, while the ratio of complaints labeled IND or GRP increases over time.

We obtained 106,732 Japanese tweets including "東日本大震災 (/hi-ga-shi-ni-ho-n-da-i-shi-n-sa-i/)," the Japanese name of the Tohoku earthquake, posted from March 11, 2011 to March 10, 2013 using the Twitter API. The time series in Figure 7 show that complaints labeled ENV accounted for a large ratio of cases during the early period after the disaster and that this ratio decreased over time. In contrast, the ratio of complaints labeled GRP increased from about one year after the disaster. These trends suggest that our classification model for the target scope of complaints can be used to detect the phenomenon of "a paradise built in hell" in the Tohoku earthquake. Examples of tweets with each label are shown in Table 11.

Figure 7: Time series of (a) tweet counts and (b) the ratio of tweets per label for complaints related to the Tohoku earthquake. Complaints labeled ENV accounted for a large proportion of cases during the early period after the disaster, and this proportion decreased over time; in contrast, the ratio of complaints labeled GRP increased from about one year after the disaster.

Table 11: Examples of tweets related to the Tohoku earthquake with each label.

GRP: 今、電車に乗っていますが、みんな暑い服着ていますね。だから、余計な電力が必要なのです。もうすぐ東日本大震災から2年。もう一度、見つめ直しましょう。あぁあ、電車の空調が入っちゃった。 (I'm taking the train now, and everyone is wearing hot clothes. So we need extra electric power. It will soon be two years since the Tohoku earthquake. Let's look back once again. Ahhh, the air conditioning is on in the train.)
GRP: 東日本大震災の被災に関して言えば、未だに復興どころか復旧すら出来ていない所もある。ましてや、福島県の一部県民は、ふるさとへ帰れないままです。選挙をしてる場合でしょうかねぇ。 (As for the damage caused by the Tohoku earthquake, there are still some areas that have not even been restored, let alone fully recovered. Moreover, some residents of Fukushima Prefecture are still unable to return to their hometowns. Is this really the time to be holding elections?)
ENV: 勉強横目に東日本大震災のドキュメンタリー見てるけど、恐すぎる。これ今日寝れないやつだ。やっぱ1人恐い。。 (I'm watching a documentary about the Tohoku earthquake while studying, and it's too scary. I'm sure I won't be able to sleep tonight. I'm afraid of being alone..)
ENV: いつ災害がくるかわかりません。東日本大震災のとき、カセットボンベの買い置きがなくて困ったよ。 (You never know when a disaster will happen. When the Tohoku earthquake happened, I was in trouble because I didn't have any spare gas canisters.)

6 Conclusion & future work

We examined the use of computational linguistics and machine learning methods to analyze the targets of complaints. We introduced the first complaint dataset that includes labels indicating the target scope of complaints. We then built BERT-based classification models that achieved an F1 score of 90.4 on the binary classification task and a micro-F1 score of 72.2 on the multiclass classification task, suggesting the validity of our dataset. Our dataset is available to the research community to foster further research on complaints. While we addressed the unbalanced labels of the dataset by downsampling, they could also be addressed by semi-supervised learning [19, 10] or data augmentation [8]; validating such methods for improving model performance is future work.
Furthermore, the results of the case studies show the possibility of applying the constructed models to sociological analysis. In the case studies, we applied our model to tweets extracted with queries related to COVID-19, office work, and the Tohoku earthquake. For COVID-19, we identified that the ratio of complaints targeting the surrounding environment decreased over time. Complaints targeting specific individuals were frequently about others whose views on COVID-19 differed from the tweeter's, whereas complaints targeting the surrounding environment were about the COVID-19 virus and the environment in which the infectious disease was spreading. These results suggest that most complaints can be divided into two categories: complaints that divide people and complaints that generate empathy and cooperation. In the case of the Tohoku earthquake, we showed the potential of our model to detect the phenomenon of "a paradise built in hell." These viewpoints show the potential of our dataset as a starting point for sociological analysis.

We also experimented with a topic model for each target scope label as a case study, using tweets about COVID-19 and office work, respectively. The distribution of words per topic confirms our hypothesis that the content of complaints varies greatly depending on the target scope. In addition, we observed that the complaints our model classified as targeting the environment varied greatly depending on the domain. In the future, as attempted in the case studies, we will not only identify the target scope of complaints in a text but also reveal potential social problems by investigating the temporal change of the target scope of complaints.
Furthermore, the analysis results can be applied beyond social media platforms. For example, we are interested in investigating the relationship between workplace well-being and complaints by measuring the number of complaints and their target scope in the daily reports of a particular company. Such applications will be useful for achieving a comfortable life within society.

Acknowledgement

This work was supported by JST-Mirai Program Grant Number JPMJMI21J2, Japan.

References

[1] Mark Alicke et al. "Complaining Behavior in Social Interaction". In: Personality and Social Psychology Bulletin 18 (1992), pp. 286–295. DOI: 10.1177/0146167292183004.
[2] Ron Artstein and Massimo Poesio. "Survey Article: Inter-Coder Agreement for Computational Linguistics". In: Computational Linguistics 34.4 (2008), pp. 555–596. DOI: 10.1162/coli.07-034-R2.
[3] Tilman Beck et al. "Investigating label suggestions for opinion mining in German Covid-19 social media". In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 1–13. DOI: 10.18653/v1/2021.acl-long.1.
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation". In: Journal of Machine Learning Research 3 (2003), pp. 993–1022. DOI: 10.5555/944919.944937.
[5] Yi-Ling Chung et al. "CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech". In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, July 2019, pp. 2819–2829. DOI: 10.18653/v1/P19-1271.
[6] Jacob Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding". In: arXiv preprint arXiv:1810.04805 (2018). DOI: 10.48550/arXiv.1810.04805.
[7] Ming Fang et al. "Analyzing the Intensity of Complaints on Social Media". In: Findings of the Association for Computational Linguistics: NAACL 2022. Seattle, United States: Association for Computational Linguistics, July 2022, pp. 1742–1754. DOI: 10.18653/v1/2022.findings-naacl.132.
[8] Steven Y. Feng et al. "A Survey of Data Augmentation Approaches for NLP". In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics, Aug. 2021, pp. 968–988. DOI: 10.18653/v1/2021.findings-acl.84.
[9] João Filgueiras et al. "Complaint Analysis and Classification for Economic and Food Safety". In: Proceedings of the Second Workshop on Economics and Natural Language Processing. Hong Kong: Association for Computational Linguistics, Nov. 2019, pp. 51–60. DOI: 10.18653/v1/D19-5107.
[10] Akash Gautam et al. "Semi-Supervised Iterative Approach for Domain-Specific Complaint Detection in Social Media". In: Proceedings of the 3rd Workshop on e-Commerce and NLP. Seattle, WA, USA: Association for Computational Linguistics, July 2020, pp. 46–53. DOI: 10.18653/v1/2020.ecnlp-1.7.
[11] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780. DOI: 10.1162/neco.1997.9.8.1735.
[12] Kazuhiro Ito et al. "Identifying A Target Scope of Complaints on Social Media". In: Proceedings of the 11th International Symposium on Information and Communication Technology. SoICT '22. Hanoi, Vietnam, 2022, pp. 111–118. DOI: 10.1145/3568562.3568659.
[13] William James. The Principles of Psychology. London, England: Dover Publications, 1890.
[14] Mali Jin and Nikolaos Aletras. "Complaint Identification in Social Media with Transformer Networks". In: Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics, Dec. 2020, pp. 1765–1771. DOI: 10.18653/v1/2020.coling-main.157.
[15] Mali Jin and Nikolaos Aletras. "Modeling the Severity of Complaints in Social Media". In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics, June 2021, pp. 2264–2274. DOI: 10.18653/v1/2021.naacl-main.180.
[16] Mali Jin et al. "Automatic Identification and Classification of Bragging in Social Media". In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Dublin, Ireland: Association for Computational Linguistics, May 2022, pp. 3945–3959. DOI: 10.18653/v1/2022.acl-long.273.
[17] Chul-min Kim et al. "The effect of attitude and perception on consumer complaint intentions". In: Journal of Consumer Marketing 20 (2003), pp. 352–371. DOI: 10.1108/07363760310483702.
[18] Robin M. Kowalski. "Complaints and complaining: functions, antecedents, and consequences." In: Psychological Bulletin 119.2 (1996), pp. 179–196. DOI: 10.1037/0033-2909.119.2.179.
[19] Dong-Hyun Lee. "Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks". In: ICML 2013 Workshop on Challenges in Representation Learning. 2013.
[20] Jordan J. Louviere, Terry N. Flynn, and Anthony Alfred John Marley. Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press, 2015. DOI: 10.1017/cbo9781107337855.
[21] Julia Mendelsohn, Ceren Budak, and David Jurgens. "Modeling Framing in Immigration Discourse on Social Media". In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Online: Association for Computational Linguistics, June 2021, pp. 2219–2263. DOI: 10.18653/v1/2021.naacl-main.179.
[22] Kensuke Mitsuzawa et al. "FKC Corpus: a Japanese Corpus from New Opinion Survey Service". In: Proceedings of the Workshop on Novel Incentives for Collecting Data and Annotation from People: types, implementation, tasking requirements, workflow and results. Portorož, Slovenia, 2016, pp. 11–18.
[23] Elite Olshtain and Liora Weinbach. "10. Complaints: A study of speech act behavior among native and non-native speakers of Hebrew". In: The Pragmatic Perspective. John Benjamins, 1987. DOI: 10.1075/pbcs.5.15ols.
[24] Silviu Oprea and Walid Magdy. "iSarcasm: A Dataset of Intended Sarcasm". In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, July 2020, pp. 1279–1289. DOI: 10.18653/v1/2020.acl-main.118.
[25] Robert Plutchik. "Chapter 1 - A General Psychoevolutionary Theory of Emotion". In: Theories of Emotion. Ed. by Robert Plutchik and Henry Kellerman. Academic Press, 1980, pp. 3–33. DOI: 10.1016/B978-0-12-558701-3.50007-7.
[26] Daniel Preoţiuc-Pietro, Mihaela Gaman, and Nikolaos Aletras. "Automatically Identifying Complaints in Social Media". In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, July 2019, pp. 5008–5019. DOI: 10.18653/v1/P19-1495.
[27] Rebecca Solnit. A Paradise Built in Hell: The Extraordinary Communities That Arise in Disaster. Penguin, 2010.
[28] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. San Francisco, California, USA: Association for Computing Machinery, 2016, pp. 1135–1144. DOI: 10.1145/2939672.2939778.
[29] Anna Trosborg. Interlanguage Pragmatics: Requests, Complaints, and Apologies. De Gruyter Mouton, 2011. DOI: 10.1515/9783110885286.
[30] Camilla Vásquez. "Complaints online: The case of TripAdvisor". In: Journal of Pragmatics 43.6 (2011), Postcolonial pragmatics, pp. 1707–1717. DOI: 10.1016/j.pragma.2010.11.007.
[31] Guangyu Zhou and Kavita Ganesan. "Linguistic Understanding of Complaints and Praises in User Reviews". In: Jan. 2016, pp. 109–114. DOI: 10.18653/v1/W16-0418.