DirKorp: A Croatian Corpus of Directive Speech Acts (v3.0)

Petra BAGO
Faculty of Humanities and Social Sciences, University of Zagreb

Virna KARLIĆ
Faculty of Humanities and Social Sciences, University of Zagreb

In this paper, we present recent developments in the new version (v3.0) of DirKorp (Korpus direktivnih govornih činova hrvatskoga jezika), the first Croatian corpus of directive speech acts developed for the purposes of pragmatic research. The corpus contains 800 elicited speech acts collected via an online questionnaire with role-playing tasks, a method of simulated communication that is implemented under pre-set conditions. This method is suitable for researching speech acts because it makes it possible to collect a large number of examples of such acts with the same propositional content and illocutionary purpose, used in the same controlled situations. The presented situations are classified into two categories with regard to the relationship between the participants of the communication act: (1) situations involving interlocutors who are not in a familiar relationship; (2) situations involving interlocutors in a familiar relationship. Assignments of the two categories are organized into four pairs, asking respondents to share a speech act of similar propositional content. The respondents were 100 Croatian speakers, all undergraduate (63%) or graduate students (37%) of the Faculty of Humanities and Social Sciences (University of Zagreb). The corpus has been manually annotated on the speech act level, each speech act containing up to 14 features: (1) respondent ID, (2) familiarity/unfamiliarity, (3) utterance type, (4) directive performative verb in 1st person, (5) illocutionary force, (6) propositional content, (7) T/V form, (8) exhortative, (9) lexical marker of request, (10) lexical marker of apology, (11) lexical marker of gratitude, (12) honorific title, (13) grammatical mood, and (14) modal verb in 2nd person. The corpus contains 12,676 tokens and 1,692 types. It is encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange, developed and maintained by the Text Encoding Initiative Consortium (TEI). DirKorp is available for download under the CC BY-SA 4.0 license from GitHub in TEI format. We describe the applied pragmatic annotation as well as the structure of the corpus.

Keywords: corpus pragmatics, directive speech acts, DirKorp, Croatian language

Bago, P., Karlić, V.: DirKorp: A Croatian Corpus of Directive Speech Acts (v3.0). Slovenščina 2.0, 11(1): 189–217.
1.01 Izvirni znanstveni članek / Original Scientific Article
DOI: https://doi.org/10.4312/slo2.0.2023.1.189-217
https://creativecommons.org/licenses/by-sa/4.0/
Slovenščina 2.0, 2023 (1) | Articles

1 Introduction

Corpus pragmatics is an interdisciplinary field of study that incorporates linguistic pragmatics and computer science, focusing on the development of natural language corpora in machine-readable form and their application for the purposes of studying pragmatic phenomena in written and spoken language. For a long time, linguists regarded a corpus approach to language as incompatible with pragmatics (Romero-Trillo, 2008, p. 2). While the corpus approach to studying language implies processing authentic language material by implementing quantitative research methods, pragmatic research is still predominantly of a qualitative nature – based on the researcher’s introspection, data obtained by elicitation methods, or an analysis of authentic linguistic material of small size. The application of corpus analysis in the research of pragmatic phenomena represents a major turnaround in the development of pragmatics, primarily because it allows a systematic analysis of language material of large size, and thus the detection of patterns of language use that “fly below the radar” in qualitative analyses (ibid.).
In addition, it should be pointed out that the application of new technologies in linguistics, including pragmatics, has not only enabled, facilitated, or accelerated numerous research processes but has also opened the door to a new, different way of thinking about language (Leech, 1992). The application of corpus methods to large pragmatic corpora allows one to systematically carry out empirically based pragmatic research (Bunt, 2017, p. 327). While the implementation of corpus research can result in minor adjustments to existing theories on the one hand, it can lead to a rethinking of pragmatic concepts and theoretical frameworks on the other, such as the development of the theory of dialogue acts (ibid.).

According to Rühlemann and Aijmer (2015), one of the major methodological problems that corpus pragmatic researchers encounter is the disproportionate relationship between pragmatic functions and the language forms by which these functions are expressed. One form can perform multiple pragmatic functions in discourse, while one function can be expressed by different forms, which makes the process of querying a corpus according to the pragmatic function criterion rather difficult. It is for this reason that corpus pragmatic researchers most often investigate conventional speech acts or functions performed by a limited number of language forms (Jucker et al., 2009, p. 4).

The aim of this paper is to present the first Croatian corpus of directive speech acts, DirKorp, manually annotated for corpus pragmatic research.

The paper is structured as follows: Section 2 describes selected work related to corpus pragmatic research, Section 3 explores the definition, classification, and research methods of directive speech acts, while the subsequent three sections present the DirKorp corpus. Section 4 gives a description of the developed corpus, Section 5 describes the 14 annotation features, and Section 6 presents the structure of the corpus encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange (TEI Consortium, 2021). Finally, Section 7 contains the conclusions and some directions for future work.

This is a follow-up paper from the conference “Language Technologies and Digital Humanities” held in Ljubljana, Slovenia on 15th–16th September 2022, where we presented DirKorp v2.0. Here we present a newly published version of DirKorp (v3.0) with two additional annotation layers, as well as a new section clarifying the definition, classification, and research methods of directive speech acts (Section 3).

2 Related work

The number of large corpora with systematically implemented pragmatic annotation remains relatively small. Due to the disproportionate relationship between pragmatic functions and the language forms by which these functions are expressed, automatic corpus annotation does not produce satisfactory results. For this reason, only a few researchers have engaged in creating larger corpora of this sort. Generally, for the purposes of corpus pragmatic research, specialized corpora of smaller size are produced for individual research purposes. In addition, pragmatic research is sometimes carried out on corpora without pragmatic annotation.

An example of a corpus that does not contain pragmatic annotation but was used for pragmatic research is the Birmingham Blog Corpus¹ (Kehoe and Gee, 2007; 2012). This is a subcorpus of a larger set of corpora being developed at the Research and Development Unit for English Studies at Birmingham City University. It consists of blog posts and reader comments, and includes some 500 million words in English that were collected between 2000 and 2010.
Automatic POS annotation was performed using the Stanford CoreNLP tools² and included lemma annotations and part-of-speech categories³ based on the Universal Dependencies framework,⁴ while the documents contain metadata on the publication date. Pragmatic research on speech acts has been conducted on this corpus. For example, Lutzky and Kehoe (2017a; 2017b) used it to analyse apologies as speech acts that contain formulaic expressions, which facilitate their querying in a corpus when using the available tools.

Similarly, we (Karlić and Bago, 2021) conducted research on the pragmatic functions and properties of imperatives using corpora without pragmatic annotation. We used hrWaC and srWaC (Ljubešić and Klubička, 2014), two large web corpora of the Croatian and Serbian languages with morphosyntactic annotation. For the purposes of the analysis, an additional pragmatic annotation of a representative sample of verbs in an imperative form was carried out manually. Other corpora of the Croatian spoken and written language with no pragmatic annotation have also been used as a resource for corpus pragmatic research. For example, Hržica, Košutar, and Posavec (2021) used the Croatian Corpus of the Spoken Language of Adults (HrAL) (Kuvač Kraljević and Hržica, 2016) and the Croatian National Corpus of the written language (HNK) (Tadić, 1996) for the search and analysis of connectors and discourse markers.

1 https://www.webcorp.org.uk/wcx/lse/corpora
2 https://stanfordnlp.github.io/CoreNLP/
3 See more about the POS tagset used for the Birmingham Blog Corpus: https://www.webcorp.org.uk/wcx/lse/guide
4 https://universaldependencies.org/u/pos/index.html

According to Bunt (2017), the majority of corpora with pragmatic annotation contain labels on discourse relationships in written texts and on spoken dialogue acts.
An example of such a larger corpus is the Penn Discourse Treebank or PDTB⁵ (Prasad et al., 2018), which contains labels on discourse relations, i.e., discourse structure and its semantics. Discourse annotations were added to a subcorpus consisting of texts published in the Wall Street Journal with a total of around 1 million tokens, which is part of the larger Penn Treebank (PTB) corpus. Bunt (2017) states that corpora of other languages, such as Chinese, Czech, Dutch, German, Hindi, and Turkish, have been developed for the purposes of studying the co-occurrence of discourse labels, emphasizing that these corpora are manually annotated and of modest size. Additionally, for each corpus a new schema was developed based on various theoretical starting points.

DialogBank⁶ (Bunt et al., 2019) is one of the rare dialogue corpora annotated according to the ISO 24617-2 standard. It contains already existing dialogue corpora annotated with various schemas. Four corpora are of English, namely HCRC Map Task (Anderson et al., 1991), Switchboard (Godfrey et al., 1992), TRAINS (Allen et al., 1995) and DBOX (Petukhova et al., 2014), and four are of Dutch – DIAMOND (Geertzen et al., 2004), OVIS⁷, Dutch Map Task (Caspers, 2000) and Schiphol (Prüst et al., 1984). Dialogue act annotation involves segmenting a dialogue into defined grammatical units and augmenting each unit with one or more communicative function labels.

Another example of a corpus with pragmatic annotation is the Engineering Lecture Corpus⁸ (Alsop and Nesi, 2013; 2014), which contains 76 transcripts based on hour-long video recordings of engineering lectures held in English at three universities. It is manually annotated for three pragmatic features: humour, storytelling, and summary.⁹ Each feature can be augmented with one of the attributes containing additional information that describes the feature in more detail.
Further, the corpus contains labels regarding significant breaks, laughter, writing or drawing on the board, etc.

5 https://doi.org/10.35111/qebf-gk47
6 https://dialogbank.lsv.uni-saarland.de/
7 http://www.let.rug.nl/vannoord/Ovis/
8 www.coventry.ac.uk/elc
9 https://www.coventry.ac.uk/research/research-directories/current-projects/2015/engineering-lecture-corpus-elc/annotations-and-mark-ups/

Finally, we present the SPICE-Ireland corpus (Systems of Pragmatic Annotation in the Spoken Component of ICE-Ireland) (Kallen and Kirk, 2012), part of the larger ICE-Ireland corpus (International Corpus of English: Ireland Component), which is annotated for pragmatic, discourse, and prosodic features. The corpus consists of spoken English and contains various types of private and public, formal and informal dialogues and monologues, each about 2,000 words long, with a total size of some 625,000 words. The pragmatic annotation of speech acts is based on Searle’s classification (Searle, 1969; 1976): representatives, directives, commissives, expressives, and declaratives.

When it comes to corpus research of speech acts, researchers have two options: (1) to analyse examples from existing corpora of authentic linguistic material, or (2) to analyse examples of elicited linguistic material. In the second case, different types of data completion tests are usually applied, and based on the obtained results smaller custom-made corpora are created for the needs of individual research (and are therefore not publicly available). This method is most often used in cross-linguistic, contrastive research, but it is also used in the study of individual languages (e.g., Barron, 2008; Trosborg, 1995). For an overview of pragmatic research of speech acts (including directives) on elicited linguistic material, see, for example, Wojtaszek (2008).
To the best of our knowledge, there exist no publicly available corpora of spoken or written Croatian with pragmatic annotation. So far, Croatian linguists have mostly dealt with speech acts from a theoretical perspective, referring primarily to Austin’s and Searle’s theory (cf. Pupovac, 1990; Ivanetić, 1995; Miščević, 2018; Palašić, 2020). However, in recent years the number of research projects based on the qualitative and quantitative analysis of small-sized authentic linguistic materials (from literary texts and advertisements to email messages and political discourse in Croatian and other languages) has been increasing (cf. e.g., Pišković, 2007; Matić, 2011; Franović and Šnajder, 2012; Šegić, 2019).

3 Directive speech acts: definition, classification, and research methods

During verbal communication, speakers express their thoughts in the form of utterances, through which they convey information, express their emotions and attitudes, or try to modify the addressee’s behaviour (Capone, 2009, p. 1015).

Speech acts are utterances with specific properties and communicative functions (ibid.):

“A speech act (…) is not merely the expression of a thought. It is the vocalization of a certain representation of the world (external or internal) aimed at making official the display of an intention to change a state of things and at changing things by the public display of that intention.”

Therefore, speech acts can be briefly defined as “actions performed via utterances” (Yule, 2002, p. 47).
According to Searle (1975), there are five types of speech acts: (1) representatives – statements that can be evaluated as true or false; (2) directives, which speakers use to influence the addressee’s wishes and actions; (3) commissives, through which speakers commit to perform some action in the future; (4) expressives, which speakers use to express their feelings or attitudes; and (5) declaratives or institutionalized declarations that formally change the state of affairs in extralinguistic reality (cf. Karlić and Bago, 2021).

Directive speech acts, or directives, are a type of speech act by which speakers express their “(...) desire/wish for the addressee to do something. (...) In using a directive, the speaker intends to elicit some future course of action on the part of the addressee, thus making the world match the words via the addressee” (Huang, 2009, p. 1004).

Directives differ with respect to their illocutionary force. The illocutionary force of a directive depends on how binding it is for the addressee. If the speaker insists on its realization, the illocutionary force of the directive is strong – and vice versa. According to this criterion, directive speech acts are classified into orders, commands, requests, pleas, incentives, advice, etc. (cf. Piper et al., 2005, p. 1021; Karlić and Bago, 2021, p. 37).

Directive speech acts can be direct or indirect. Direct directives contain an explicit directiveness marker – an imperative (Close the window) or a performative verb in the first person of the present tense (I ask you to close the window). In indirect directives, directiveness is expressed implicitly, through assertions without a performative verb (You should close the window), interrogative utterances (Can you close the window?), or elliptical utterances (Um… the window…).
Just like illocutionary force, the propositional content of directive speech acts can also be expressed explicitly or implicitly (It is cold here [implicature: Close the window]) (cf. Huang, 2009, p. 1005; Karlić and Bago, 2021, p. 39).

According to Brown and Levinson (1987, pp. 65–66), directives represent a typical example of face-threatening acts. For this reason, when using them, speakers often apply various politeness strategies that mitigate their illocutionary force (e.g., implicatures and lexical or grammatical modifiers of illocutionary force).

The foundations of speech act theory were laid by the philosophers John Austin and John Searle in works published in the 1960s and 1970s. Since then, numerous studies of speech acts have been conducted. In the beginning, they were non-empirical, based on the researcher’s intuition. In recent years, however, the number of empirical studies of speech acts has grown significantly. Jucker (2009) distinguishes three types of data collection methods for the needs of empirical research of speech acts – field, laboratory, and armchair (Flöck and Geluykens, 2015, p. 10):

“While armchair approaches investigate participants’ intuitions and attitudes about language use, field and laboratory approaches aim at studying actual language use. They differ, however, in the way language data are produced. While in laboratory approaches, language use is elicited by researchers (by employing role-plays or administering discourse completion tasks), field data are defined by the absence of such elicitation techniques. Field methods are therefore observational in nature, i.e., they require an authentic communicative intent by participants to produce language.”

Each of the mentioned methods has its advantages and disadvantages. For the purposes of creating the DirKorp corpus, we applied the laboratory method of eliciting language production by role-playing.
The main advantage of this method is that it gives “full variable control to the researcher (...), and can generate large amounts of data; however, participants use language without their own intrinsic communicative intent in fictional scenarios” (Flöck and Geluykens, 2015, p. 11). This method allowed us to collect a large number of mutually comparable directives with the same propositional content, produced in the same controlled circumstances.

In the following sections, we present a new version (v3.0) of DirKorp, the first Croatian corpus of directive speech acts.

4 Corpus description

DirKorp (Korpus direktivnih govornih činova hrvatskoga jezika) (Karlić and Bago, 2021) is a Croatian corpus of directive speech acts developed for the purposes of pragmatic research. The corpus contains 800 elicited speech acts collected via an online questionnaire with role-playing tasks, applying the method of simulated communication that is implemented under pre-set conditions. This method is suitable for researching speech acts because it makes it possible to collect a large number of examples of speech acts with the same propositional content and illocutionary purpose, used in the same controlled situations.

The questionnaire included eight closed-type role-playing tasks. These types of tasks imply recording the speaker’s reactions (in this case in writing) to the stimulus without feedback. In each task, the participants are presented with one textually described hypothetical situation asking them to address a directive speech act to their interlocutor. Their assignment was to imagine they were in the presented situation and to give a written statement they would use in the described situation.
The presented situations are classified into two categories with regard to the relationship between the participants of the communication act: (1) situations involving interlocutors who are not in a familiar relationship (i.e., interlocutors who are not close and are not equal in terms of power relations, and communicate in more or less in/formal situations); (2) situations involving interlocutors in a familiar relationship (i.e., interlocutors in a close and equal relationship who communicate in more or less in/formal situations). Assignments of the two categories are organized into four pairs, asking respondents to share a speech act of similar propositional content. For the text of each role-playing task pair, see the corresponding example, where (a) gives the task for interlocutors in an unfamiliar relationship (label “NEFAM” followed by the pair number) and (b) the task for interlocutors in a familiar relationship (label “FAM” followed by the pair number):

“I want you to return something that belongs to me” (Example 1; labels “NEFAM1” and “FAM1”);
“I want you to answer my inquiry” (Example 2; labels “NEFAM2” and “FAM2”);
“I want you to change something that bothers me” (Example 3; labels “NEFAM3” and “FAM3”);
“I want you to stop behaving inappropriately” (Example 4; labels “NEFAM4” and “FAM4”).¹⁰

Example 1

(a) Upravo si pojeo/la ručak u restoranu. Posluživao te stariji konobar koji se odnosio prema tebi ljubazno i profesionalno. Prilikom plaćanja računa konobar ti vraća 100 kuna manje nego što je trebao. Želiš da ti konobar vrati novac.
Zamisli da se konobar nalazi pred tobom i napiši što bi mu točno rekao/la u danoj situaciji (nemoj prepričavati, već iskaz formuliraj kao da se izravno obraćaš sugovorniku).

(Eng. You just ate lunch at a restaurant. You were served by an elderly waiter who treated you kindly and professionally. When you pay the bill, the waiter gives you back 100 kuna less than he should have. You want the waiter to give you your money back. Imagine the waiter is in front of you and write what exactly you would say to him in the given situation (do not recount but formulate the statement as if you were addressing the interlocutor directly).)

(b) Posudio/la si knjigu najboljem prijatelju (ili prijateljici). Rekao ti je da će ti je uskoro vratiti, no nije održao riječ. Sjedite zajedno u kafiću, situacija je opuštena, razgovarate o svakodnevnim stvarima. Želiš mu dati do znanja da ti treba čim prije vratiti knjigu. Zamisli da se tvoj prijatelj nalazi pred tobom i napiši što bi mu točno rekao/la u danoj situaciji (nemoj prepričavati, već iskaz formuliraj kao da se izravno obraćaš sugovorniku).

(Eng. You lent a book to your best friend. (S)he told you (s)he’d give it back to you soon, but (s)he didn’t keep her/his word. You are sitting together in a café, the situation is relaxed, you talk about everyday things. You want to let her/him know you need to get your book back as soon as possible. Imagine your friend is in front of you and write what exactly you would say to her/him in the given situation (do not recount but formulate the statement as if you were addressing the interlocutor directly).)

10 Full texts of role-playing tasks are available in the corpus header as well.

Example 2

(a) Poslao/la si e-mail profesoru s upitom možeš li pohađati njegov izborni kolegij i hitno trebaš njegov odgovor i potvrdu u mailu. Međutim, profesor ne odgovara već tjedan dana, a rok za upis završava sutradan.
Želiš ponovno zatražiti njegovu povratnu informaciju. Napiši kratak e-mail profesoru kakav bi mu uputio/la u navedenoj situaciji.

(Eng. You sent an email to the professor asking if you can attend his elective course and urgently need his response and confirmation in the email. However, the professor has not responded for a week, and the enrolment deadline is the next day. You want to ask for his feedback again. Write a short email to the professor such as you would send in this situation.)

(b) Poslao/la si poruku (WhatsApp, Viber, Messenger) najboljem prijatelju (ili prijateljici) s pozivom na druženje sljedeće večeri. On je vidio poruku, ali nije odgovorio do sutradan. Želiš da ti odgovori čim prije kako bi mogao/la isplanirati ostatak dana. Napiši kratku poruku prijatelju kakvu bi mu uputio/la u navedenoj situaciji.

(Eng. You sent a message (WhatsApp, Viber, Messenger) to your best friend with an invitation to hang out the next night. (S)he saw the message but had not responded by the next day. You want her/him to reply as soon as possible so you can plan the rest of the day. Write a short message to your friend such as you would send in this situation.)

Example 3

(a) Voziš se u taksiju. Prozori su otvoreni i želiš da ih taksist zatvori jer ti je hladno. Zamisli da se nalaziš u navedenoj situaciji i napiši što bi točno rekao/la taksistu (nemoj prepričavati, već iskaz formuliraj kao da se izravno obraćaš sugovorniku).

(Eng. You’re riding in a cab. The windows are open, and you want the taxi driver to close them because you’re cold. Imagine that you are in this situation and write down what exactly you would say to the taxi driver (do not recount but formulate the statement as if you were addressing the interlocutor directly).)

(b) Voziš se u autu na suvozačkom mjestu. Vozač je tvoj najbolji prijatelj (ili prijateljica). Budući da vozi prebrzo i gleda u mobitel, ne osjećaš se ugodno i želiš da uspori.
Zamisli da se nalaziš u danoj situaciji i napiši što bi mu točno rekao/la (nemoj prepričavati, već iskaz formuliraj kao da se izravno obraćaš sugovorniku).

(Eng. You are riding in a car in the passenger seat. The driver is your best friend. Because (s)he’s driving too fast and looking at her/his cell phone, you don’t feel comfortable and want her/him to slow down. Imagine that you are in the given situation and write what exactly you would say to her/him (do not recount but formulate the statement as if you were addressing the interlocutor directly).)

Example 4

(a) Nalaziš se u dućanu i čekaš u redu pred blagajnom. Velika je gužva. Ispred tebe se u red ugura gospođa srednje dobi. Ljudi u redu iza tebe negoduju jednako kao i ti. Želiš da gospođa stane na kraj reda. Zamisli da se nalaziš u danoj situaciji i napiši što bi točno rekao/la gospođi (nemoj prepričavati, već iskaz formuliraj kao da se izravno obraćaš sugovorniku).

(Eng. You’re in a store waiting in line at the cash register. It’s crowded. A middle-aged lady squeezes in front of you. The people in line behind you are just as resentful as you are. You want the lady to stand at the end of the line. Imagine that you are in the given situation and write down what exactly you would say to the lady (do not recount but formulate the statement as if you were addressing the interlocutor directly).)

(b) Slušaš predavanje na fakultetu. Sjediš pored dvoje kolega s kojima si inače vrlo blizak/bliska. U jednom trenutku oni počinju glasno razgovarati i smijati se. Njihov razgovor ti smeta jer ne možeš pratiti predavanje, a i nastavnik pogledava u vašem smjeru. Želiš da prestanu. Zamisli da se nalaziš u danoj situaciji i napiši što bi im točno rekao/la (nemoj prepričavati, već iskaz formuliraj kao da se izravno obraćaš sugovorniku).

(Eng. You’re listening to a lecture in college. You’re sitting next to two colleagues with whom you are otherwise very close.
At some point, they start talking loudly and laughing. Their conversation bothers you because you can’t follow the lecture, and the lecturer keeps looking in your direction. You want them to stop. Imagine that you are in the given situation and write down exactly what you would say to them (do not recount but formulate the statement as if you were addressing the interlocutor directly).)

The respondents were 100 Croatian speakers, all undergraduate (63%) or graduate students (37%) of the Faculty of Humanities and Social Sciences (University of Zagreb), aged 18 to 33, with Croatian being the native language for the majority (96%). The questionnaire was administered in December 2020 and January 2021. Before completing the questionnaire, all the respondents were informed of the purposes of the study as well as what data would be collected. All the respondents voluntarily participated in the study and were made aware that they could withdraw from it at any time. By choosing to participate in the study, respondents gave informed consent for their data to be processed for the stated research purposes. The questionnaire was administered anonymously via an online survey, and the language material collected was used exclusively for research purposes.

The elicitation of language production by the role-playing method has its advantages and disadvantages. On the one hand, it enables the collection of a large number of speech acts with the same propositional content and illocutionary purpose. On the other hand, users of the corpus should keep in mind that the language material collected by this method does not reflect the features of actual language use, but instead shows what speakers think they would say and/or do in hypothetical situations.
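As a rough illustration of the questionnaire design described in this section, the following sketch enumerates the eight task labels (four pairs, each with an unfamiliar and a familiar variant) and checks that 100 respondents answering all eight tasks yield the 800 speech acts reported for the corpus. The respondent identifiers and the function names are hypothetical and are not taken from the corpus files; only the labels and the counts come from the description above.

```python
from itertools import product

# Figures taken from the corpus description: 100 respondents, four task pairs,
# two relationship categories (NEFAM = unfamiliar, FAM = familiar).
N_RESPONDENTS = 100
PAIRS = (1, 2, 3, 4)
CATEGORIES = ("NEFAM", "FAM")


def task_labels():
    """Return the eight task labels, pair by pair: NEFAM1, FAM1, NEFAM2, ..."""
    return [f"{category}{pair}" for pair in PAIRS for category in CATEGORIES]


def expected_responses():
    """Pair every respondent with every task.

    Assumption: the IDs R001..R100 are illustrative placeholders; the actual
    respondent ID scheme used in the corpus is not shown in this section.
    """
    respondents = [f"R{i:03d}" for i in range(1, N_RESPONDENTS + 1)]
    return list(product(respondents, task_labels()))


if __name__ == "__main__":
    print(task_labels())
    print(len(expected_responses()))
```

Under these assumptions, each (respondent, task label) pair corresponds to one elicited speech act, which is why the corpus size follows directly from the number of respondents and tasks.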
DirKorp contains 12,676 tokens and 1,692 types.¹¹ Since it consists of 800 speech acts, it is a relatively small corpus compared to some of the corpora with pragmatic annotation presented in Section 2. However, as the first Croatian corpus with detailed pragmatic annotation, DirKorp can serve as a useful resource for researching the characteristics of speech acts on a formal and content level, the application of politeness strategies in communication in different situations, and the properties of other grammatical-pragmatic and lexical-pragmatic phenomena in the Croatian language that are annotated in the corpus. In addition, we believe that DirKorp can serve as a complement to research on speech acts conducted on authentic language materials and as a starting point for conducting contrastive research on the characteristics and use of speech acts in other languages. We also hope that it will contribute to the development of larger corpora of the Croatian language with pragmatic annotation, and that such work will encourage a wider application of the corpus-pragmatic research method.

11 Respondents’ answers contain utterances but also text about what they would do in the given situation. At this moment, the corpus contains no annotation of utterances of speech acts, and therefore we cannot analyse the average length of a response. Generally, we can only state that some speech acts contain only one utterance, while some contain more than one.

In Karlić and Bago (2021), we conducted corpus pragmatic analyses of the collected speech acts to investigate ways and means of expressing directives, and their pragmatic characteristics and functions. For example, we confirmed that indirect directives are more frequent than direct ones, especially among interlocutors who are not in a familiar relationship.
Regarding the (un)familiar relationship between interlocutors, we found that explicit illocutionary force is more frequent in communication between interlocutors in a familiar relationship, while implicit illocutionary force is more frequent in communication between interlocutors in an unfamiliar relationship. Additionally, we found that imperative utterances are a more frequent type of direct directive than utterances with a directive performative verb in 1st person. For more such corpus-pragmatic analyses, see Karlić and Bago (2021).

5 Corpus annotation

The collected language material has been manually annotated at the speech act level by two independent annotators12 with university graduate degrees in the field of philology. Annotators received oral and written instructions, including illustrative examples for all the features they had to annotate. Basic categorization of speech acts (directive; direct and indirect; explicit and implicit) and their formal and pragmatic properties (i.e., performative verbs) was carried out according to the theory of speech acts by Austin (1962), Searle (1969; 1976), and their successors. The features and components of speech acts related to the phenomenon of politeness (familiarity, use of T/V forms, and certain lexical modifiers of the illocutionary force of speech acts) are taken from the politeness theory of Brown and Levinson (1987), while the grammatical characteristics of speech acts (utterance type, grammatical mood, modal verbs) are categorized according to the grammatical descriptions of contemporary Croatian and Serbian (Silić and Pranjković, 2007; Piper et al., 2005). For more on individual categories, see Karlić and Bago (2021). In the new version of DirKorp (v3.0), each speech act can contain up to 14 features. The first eight features were part of corpus version v1.0, features nine to 12 were part of v2.0, while features 13 and 14 are newly added. Appendix A contains the frequency distribution of features two to 14. For a more detailed frequency distribution of all features, see Karlić and Bago (2021).

12 When comparing the annotations of the two annotators, in all categories, disagreements were found in at most 1.2% of examples and were mostly the result of accidental mistakes by one of the annotators. Once such mistakes were corrected, a consensus was reached among the annotators. The only category in which the disagreement was higher (2.5%) was the category “Illocutionary force”. In most cases, these were examples with generalized conversational implicature (one annotator marked speech acts with this type of implicature as explicit, and the other as implicit). Based on the instruction to label speech acts with all types of implicature as implicit, a consensus was reached among the annotators for this category as well.

(1) Respondent ID – This mandatory feature identifies the respondent uttering the speech act.

(2) Familiarity/unfamiliarity – This mandatory feature contains information on the category of the proposed situation in which the speech act was uttered. Four situations are labelled ‘unfamiliar’ (involving interlocutors who are not in a familiar relationship), while the other four situations are labelled ‘familiar’ (involving interlocutors who are in a familiar relationship).

(3) Utterance type – This mandatory feature contains information on the utterance type with regard to its structural organization. It contains six labels: (a) an imperative utterance, (b) an assertive utterance (a statement), (c) an utterance in the form of a question, (d) an utterance in the form of a predicate ellipsis13, (e) a nonverbal signal, (f) a case of avoidance of executing a speech act (see Example 5).
13 Utterances in the form of a predicate ellipsis were singled out as a separate category due to: (1) the absence of a verb (and potentially other components of the sentence structure) and therefore the default indirectness and implicitness of the speech act, which makes them incomparable to other utterances in the corpus; (2) the impossibility of determining the type of utterance for all examples due to their elliptical structure.

Example 5
(a) E vrati mi onu knjigu koju sam ti posudio. (Eng. Hey, give me back that book I lent you.)
(b) Oprostite, ali mislim da ste mi krivo vratili novce. (Eng. Excuse me, but I think you gave me my money back wrong.)
(c) Možete li molim vas zatvoriti prozore? (Eng. Could you please close the windows?)
(d) E, moja knjiga?? (Eng. Hey, my book??)
(e) [Samo bih zavrtjela očima da vide moje neodobravanje, ali ne bih ništa rekla.]14 (Eng. [I’d just roll my eyes so that they see my disapproval, but I wouldn’t say anything.])
(f) [Ne bih ništa rekao.] (Eng. [I wouldn’t say anything.])

14 Descriptions of non-verbal situations can be found in Example 5 (e) and (f). All other examples contain actual utterances. DirKorp v3.0 does not contain annotations of utterances. Therefore, it is currently not possible to filter the speech acts with regard to actual utterances or descriptions of non-verbal situations.

(4) Directive performative verb in 1st person – This optional feature indicates whether the speech act contains a directive performative verb in 1st person. It is only applied to assertive utterances and utterances in the form of a question. It contains two labels: (a) yes and (b) no (see Example 6).

Example 6
(a) Oprostite, molim da odete na kraj reda. (Eng. Excuse me, I am imploring you to go to the end of the line.)
(b) Gospođo, morate na kraj reda stati. (Eng. Madam, you must move to the end of the line.)

(5) Illocutionary force – This optional feature contains information on the explicitness or implicitness of the illocutionary force of a speech act. It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) explicit and (b) implicit (see Example 7).

Example 7
(a) Daj mi donesi više onu knjigu, treba mi! (Eng. Bring me that book already, I need it!)
(b) Kaj je s onom knjigom koju sam ti posudio? (Eng. What happened to that book I lent you?)

(6) Propositional content – This optional feature contains information on the explicitness or implicitness of the propositional content of a speech act. It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) explicit and (b) implicit (see Example 8).

Example 8
(a) Gledaj na cestu, pusti mobitel. (Eng. Look at the road, leave the cell phone.)
(b) Ti hoćeš da poginemo? (Eng. You want us to die?)

(7) T/V form – This optional feature contains information on how the respondent addressed the interlocutor, using an informal (T-form) or a formal you (V-form). It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains three labels: (a) T-form, (b) V-form, and (c) impossible to determine (see Example 9).

Example 9
(a) Oprosti, dao si mi manje novca (Eng. Sorry[T-form], you[T-form] gave me less change.)
(b) Oprostite, mislim da ste mi ipak još dužni 100 kuna. (Eng. Excuse[V-form] me, I think you[V-form] still owe me 100 kunas.)
(c) Hmm... još 100 kuna, zar ne? (Eng. Hmm… another 100 kunas, right?)
(8) Exhortative – This optional feature indicates whether the speech act contains an exhortative (a lexical means used to express encouragement, i.e., an incentive particle). It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) yes and (b) no (see Example 10).

Example 10
(a) Daj mi više vrati knjigu, treba mi za knjižnicu. (Eng. Bring me back my book already, I need it for the library.)
(b) Jel se sjećaš one knjige koju sam ti posudila? Potrebna mi je. Možeš li mi ju donijeti sutra na faks? (Eng. Do you remember that book I lent you? I need it. Could you bring it tomorrow to uni?)

(9) Request – This optional feature contains information on whether the speech act includes a lexical marker of request (e.g., “please”). It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) yes and (b) no (see Example 11).

Example 11
(a) E da, jel bi mi mogao/la vratiti knjigu, molim te? (Eng. Oh yeah, could you bring the book back, please?)
(b) Zaboravio si mi vratiti knjigu, jel se možeš idući put sjetiti? (Eng. You forgot to bring me back the book, can you remember next time?)

(10) Apology – This optional feature contains information on whether the speech act includes a lexical marker of apology. It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) yes and (b) no (see Example 12).

Example 12
(a) Oprostite, ovdje fali još 100 kuna (Eng. Excuse me, 100 kunas are missing here.)
(b) Možete li molim vas pritvoriti prozore, hladno mi je? (Eng. Could you please close the windows, I’m cold?)

(11) Gratitude – This optional feature contains information on whether the speech act includes a lexical marker of gratitude. It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) yes and (b) no (see Example 13).

Example 13
(a) Molim te mi samo javi da znam zbog organizacije hoćeš li doći. Hvala ti! (Eng. Please just let me know whether you’re coming so that I know because of the organization. Thank you!)
(b) Heej, jel dolaziš večeras na druženje? Moram znati zbog organizacije. xoxo (Eng. Heeey, are you coming tonight to hang out? I need to know because of the organization. xoxo)

(12) Honorific title – This optional feature contains information on whether the speech act includes an honorific title. It is only applied to utterances that contain verbal means (an imperative utterance, an assertive utterance, an utterance in the form of a question, and in the form of an ellipsis). It contains two labels: (a) yes and (b) no (see Example 14).

Example 14
(a) Gospođo, kraj reda je dolje. (Eng. Madam, the end of the line is back there.)
(b) Oprostite, tamo je kraj reda! (Eng. Excuse me, the end of the line is there!)

(13) Grammatical mood – This optional feature contains information on the grammatical mood used in a speech act. It is only applied to indirect speech acts (assertive utterances and utterances in the form of a question), since it is understood that direct imperative speech acts contain verbs in the imperative mood. Accordingly, this feature contains two labels: (a) indicative mood and (b) conditional mood (see Example 15).

Example 15
(a) Oprostite, ali ovo nije kraj reda. (Eng.
Excuse me, but this is not the end of the line.)
(b) Oprostite jel bi mogli zatvorit prozore? Malo mi je hladno. (Eng. Excuse me, could you close the windows? I’m a little cold.)

(14) Modal verb in 2nd person – This optional feature indicates whether the speech act contains a modal verb in 2nd person. It is only applied to indirect speech acts (an assertive utterance and an utterance in the form of a question). It contains two labels: (a) yes and (b) no (see Example 16).

Example 16
(a) Oprostite, mislim da je došlo do pogreške, trebate mi vratiti još 100 kuna. (Eng. Sorry, I think there was a mistake, you have to return another 100 kunas.)
(b) Malo je hladno ovdje, možemo možda zatvoriti prozor? (Eng. It’s a little cold in here, can we possibly close the window?)

6 Corpus format

DirKorp is encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange, developed and maintained by the Text Encoding Initiative Consortium (TEI) (TEI Consortium, 2021). The TEI document comprises a header and the body of the corpus. The content of the elements and attributes is in Croatian. The metadata of the corpus is given in the header, including bibliographic information; the editorial practice; a structured taxonomy describing the categories used for each of the 14 pragmatic features in the annotation process (see Figure 1 for an example), including the full text of the eight situations in the questionnaire; a list of questionnaire participants with information on their age, gender, undergraduate or graduate level of study, enrolment in a philological/non-philological/combined study program, and native language (see Figure 2 for an example); and a list of revisions of the DirKorp versions. The body of the corpus is composed of one division containing utterances with pragmatic features (see Figure 3 for an example).
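The header-plus-body layout described above can be explored with a few lines of Python. The following is a minimal sketch rather than the authors' tooling: the inline sample document, the use of `u` elements with `who` and `ana` attributes, and the identifiers `tv.t`, `tv.v`, `r001`, and `r002` are assumptions made purely for illustration, and the actual DirKorp file uses Croatian names and may be organized differently. Only the TEI namespace and the standard `taxonomy`, `category`, and `catDesc` elements are taken from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature document (not valid against the full TEI schema):
# a taxonomy in the header and one division of annotated utterances in the body.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <taxonomy xml:id="tv">
          <category xml:id="tv.t"><catDesc>T-form</catDesc></category>
          <category xml:id="tv.v"><catDesc>V-form</catDesc></category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <u who="#r001" ana="#tv.v">Oprostite, tamo je kraj reda!</u>
        <u who="#r002" ana="#tv.t">Oprosti, dao si mi manje novca</u>
      </div>
    </body>
  </text>
</TEI>"""


def load_categories(root):
    """Map each category's xml:id to its description from the header taxonomy."""
    return {
        cat.get(XML_ID): cat.findtext("tei:catDesc", namespaces=TEI_NS)
        for cat in root.iterfind(".//tei:taxonomy/tei:category", TEI_NS)
    }


def speech_acts(root):
    """Yield (respondent reference, resolved labels, text) for every utterance."""
    categories = load_categories(root)
    for u in root.iterfind(".//tei:u", TEI_NS):
        refs = u.get("ana", "").split()
        labels = [categories.get(ref.lstrip("#"), ref) for ref in refs]
        yield u.get("who"), labels, "".join(u.itertext()).strip()


root = ET.fromstring(SAMPLE)
for who, labels, text in speech_acts(root):
    print(who, labels, text)
```

Resolving the annotation pointers against the header taxonomy, instead of hard-coding label names, keeps such a script usable if category descriptions change between corpus versions.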
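DirKorp is available for download under the CC BY-SA 4.0 license from GitHub in TEI format (https://github.com/pbago/DirKorp).

Govorni čin sadržava obraćanje na ti (atribut se odnosi na tipove iskaza koji uključuju verbalna sredstva [imperativni, tvrdnja, upitni, eliptični]). (Eng. The speech act contains T-form address (the attribute applies to utterance types that include verbal means [imperative, statement, question, elliptical]).)
Govorni čin sadržava obraćanje na Vi (atribut se odnosi na tipove iskaza koji uključuju verbalna sredstva [imperativni, tvrdnja, upitni, eliptični]). (Eng. The speech act contains V-form address (the attribute applies to utterance types that include verbal means [imperative, statement, question, elliptical]).)
Nije moguće odrediti sadržava li govorni čin obraćanje na ti ili Vi (atribut se odnosi na tipove iskaza koji uključuju verbalna sredstva [imperativni, tvrdnja, upitni, eliptični]). (Eng. It is not possible to determine whether the speech act contains T-form or V-form address (the attribute applies to utterance types that include verbal means [imperative, statement, question, elliptical]).)
Figure 1: An example of a pragmatic feature description – how the respondent addressed the interlocutor (V-form, T-form, or impossible to determine, annotation feature 7 from Section 5).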
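The description in Figure 1 restricts the T/V attribute to utterance types that include verbal means, and Section 5 states comparable restrictions for the other optional features. These applicability rules can be written down as a small lookup table and used to check an annotation record. The sketch below is only an illustration of that idea: the English feature keys and utterance-type labels are hypothetical names chosen here, not the attribute names used in the corpus.

```python
# Utterance-type groups from Section 5; the English labels are illustrative.
VERBAL = frozenset({"imperative", "assertive", "question", "ellipsis"})
ASSERTIVE_OR_QUESTION = frozenset({"assertive", "question"})
ALL_TYPES = VERBAL | {"nonverbal", "avoidance"}

# Which utterance types each of the 14 features is defined for.
APPLIES_TO = {
    "respondent_id": ALL_TYPES,                      # (1) mandatory
    "familiarity": ALL_TYPES,                        # (2) mandatory
    "utterance_type": ALL_TYPES,                     # (3) mandatory
    "performative_verb_1st": ASSERTIVE_OR_QUESTION,  # (4)
    "illocutionary_force": VERBAL,                   # (5)
    "propositional_content": VERBAL,                 # (6)
    "tv_form": VERBAL,                               # (7)
    "exhortative": VERBAL,                           # (8)
    "request": VERBAL,                               # (9)
    "apology": VERBAL,                               # (10)
    "gratitude": VERBAL,                             # (11)
    "honorific_title": VERBAL,                       # (12)
    "grammatical_mood": ASSERTIVE_OR_QUESTION,       # (13)
    "modal_verb_2nd": ASSERTIVE_OR_QUESTION,         # (14)
}


def misapplied(annotation):
    """Return the features present in `annotation` that are not defined
    for its utterance type, sorted alphabetically."""
    utterance_type = annotation["utterance_type"]
    return sorted(
        name for name in annotation if utterance_type not in APPLIES_TO[name]
    )


record = {
    "respondent_id": "r001",
    "familiarity": "familiar",
    "utterance_type": "imperative",
    "tv_form": "T-form",
    "grammatical_mood": "indicative",  # not defined for imperative utterances
}
print(misapplied(record))
```

A check of this kind mirrors the N/A cells in Appendix A, where features that are not defined for an utterance type carry no counts.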

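ispitanik/ispitanica, 20 godina, spol Ž, preddiplomski studij Filozofskog fakulteta, nefilološko usmjerenje, materinji jezik hrvatski (Eng. respondent, 20 years old, gender F, undergraduate study at the Faculty of Humanities and Social Sciences, non-philological program, native language Croatian)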

Figure 2: An example of participant information.

Ispričavam se, pardon, fali još sto kuna. Oprostite. (Eng. I apologize, pardon, another hundred kunas are missing. Excuse me.)
Figure 3: An example of a speech act containing all 14 pragmatic features.

7 Conclusion and future work

In this article we have presented DirKorp v3.0, the first Croatian corpus of directive speech acts, containing 800 elicited speech acts collected via an online questionnaire with role-playing tasks, specifically developed for pragmatic research studies. The respondents were 100 Croatian speakers, all students of the Faculty of Humanities and Social Sciences (University of Zagreb). The corpus has been manually annotated at the speech act level, with each speech act containing up to 14 features. It contains 12,676 tokens and 1,692 types. The corpus is available for download under the CC BY-SA 4.0 license from GitHub in TEI format.

Further work is planned on the corpus. It includes an evaluation of the developed schema for annotating directive speech acts (e.g., test-retest reliability on a sample of data to evaluate the stability and consistency of the schema, a review of the schema by domain experts to determine whether it adequately captures the relevant aspects of the data, and a review of the adequacy of encoding choices regarding attributes and their values), annotation at levels smaller than a speech act, as well as augmentation with additional features such as information on the various politeness strategies applied in a speech act.

Acknowledgments

This paper is generously co-financed by the institutional project of the Faculty of Humanities and Social Sciences (University of Zagreb) “South Slavic languages in use: pragmatic analyses” (principal researcher Virna Karlić). We also wish to thank our annotators for their time and effort.

References

Allen, J. F., Schubert, L. K., Ferguson, G., Heeman, P., Hwang, C. H., Kato, T., Light, M., …, & Traum, D. R. (1995).
The TRAINS Project: A Case Study in Building a Conversational Planning Agent. Journal of Experimental & Theoretical Artificial Intelligence, 7(1), 7–48.
Alsop, S., & Nesi, H. (2013). Annotating a Corpus of Spoken English: The Engineering Lecture Corpus (ELC). In Proceedings of GSCP 2012: Speech and Corpora (pp. 58–62). Firenze University Press, Florence.
Alsop, S., & Nesi, H. (2014). The Pragmatic Annotation of a Corpus of Academic Lectures. In The International Conference on Language Resources and Evaluation 2014 Proceedings (pp. 1560–1563). Reykjavik: European Language Resources Association.
Anderson, A. H., Bader, M., Gurman Bard, E., Boyle, E., Doherty, G., Garrod, S., Isard, S., …, & Weinert, R. (1991). The HCRC Map Task Corpus. Language and Speech, 34(4), 351–366.
Austin, J. L. (1962). How to Do Things with Words. Oxford: Clarendon Press.
Barron, A. (2008). The structure of requests in Irish English and English English. In K. P. Schneider & A. Barron (Eds.), Variational Pragmatics: A Focus on Regional Varieties in Pluricentric Languages (pp. 35–68). John Benjamins Publishing Company.
Brown, P., & Levinson, S. C. (1987). Politeness: Some Universals in Language Usage. Cambridge University Press.
Bunt, H. (2017). Computational Pragmatics. In Oxford Handbook of Pragmatics (pp. 326–345). Oxford University Press, New York.
Bunt, H., Petukhova, V., Malchanau, A., Fang, A., & Wijnhoven, K. (2019). The DialogBank: Dialogues with Interoperable Annotations. Language Resources and Evaluation, 53(2), 213–249.
Capone, A. (2009). Speech Acts, Classification and Definition. In Concise Encyclopedia of Pragmatics (pp. 1015–1017). Oxford: Elsevier.
Caspers, J. (2000). Melodic Characteristics of Backchannels in Dutch Map Task Dialogues. In Proceedings, 6th International Conference on Spoken Language Processing (pp. 611–614). Beijing: China Military Friendship Publish.
Retrieved from https://www.isca-speech.org/archive/icslp_2000/
Flöck, I., & Geluykens, R. (2015). Speech Acts in Corpus Pragmatics: A Quantitative Contrastive Study of Directives in Spontaneous and Elicited Discourse. In Yearbook of Corpus Linguistics and Pragmatics (pp. 7–37). Springer International Publishing.
Franović, T., & Šnajder, J. (2012). Speech Act Based Classification of Email Messages in Croatian Language. In Proceedings of the Eighth Language Technologies Conference (pp. 69–72). Ljubljana: Information Society.
Geertzen, J., Girard, Y., Morante, R., Van der Sluis, J., Van Dam, H., Suijkerbuijk, B., Van der Werf, R., & Bunt, H. (2004). The DIAMOND Project. In Proceedings of the 8th Workshop on the Semantics and Pragmatics of Dialogue (CATALOG 2004), Barcelona.
Godfrey, J., Holliman, E., & McDaniel, J. (1992). SWITCHBOARD: Telephone Speech Corpus for Research and Development. In IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 517–520). San Francisco: IEEE Computer Society.
Hržica, G., Košutar, S., & Posavec, K. (2021). Konektori i druge diskursne oznake u pisanome i spontanome govorenom jeziku. Fluminensia: časopis za filološka istraživanja, 33(1), 25–52.
Huang, Y. (2009). Speech Acts. In Concise Encyclopedia of Pragmatics (pp. 1000–1009). Oxford: Elsevier.
Ivanetić, N. (1995). Govorni činovi. Zagreb: FF-press, Zavod za lingvistiku Filozofskoga fakulteta Sveučilišta u Zagrebu.
Jucker, A. H. (2009). Speech Act Research between Armchair, Field and Laboratory: The Case of Compliments. Journal of Pragmatics, 41, 1611–1635.
Jucker, A. H., Schreier, D., & Hundt, M. (Eds.). (2009). Corpora: Pragmatics and Discourse. Rodopi, Amsterdam.
Kallen, J. L., & Kirk, J. M. (2012). SPICE-Ireland: A User’s Guide. Retrieved from https://pure.qub.ac.uk/en/publications/spice-ireland-a-users-guide
Karlić, V., & Bago, P. (2021).
(Računalna) pragmatika: temeljni pojmovi i korpusnopragmatičke analize. Zagreb: FF Press. Retrieved from https://openbooks.ffzg.unizg.hr/index.php/Ffpress/catalog/book/125
Kehoe, A., & Gee, M. (2007). New Corpora from the Web: Making Web Text More ‘Text-Like’. In Studies in Variation, Contacts and Change in English 2. Retrieved from https://varieng.helsinki.fi/series/volumes/02/kehoe_gee/
Kehoe, A., & Gee, M. (2012). Reader Comments as an Aboutness Indicator in Online Texts: Introducing the Birmingham Blog Corpus. In Studies in Variation, Contacts and Change in English 12. Retrieved from https://varieng.helsinki.fi/series/volumes/12/kehoe_gee/
Kuvač Kraljević, J., & Hržica, G. (2016). Croatian Adult Spoken Language Corpus (HrAL). Fluminensia: časopis za filološka istraživanja, 28(2), 87–102.
Leech, G. N. (1992). Corpora and Theories of Linguistic Performance. In Directions in Corpus Linguistics (pp. 105–122). De Gruyter, Berlin.
Ljubešić, N., & Klubička, F. (2014). {bs, hr, sr}WaC-Web Corpora of Bosnian, Croatian and Serbian. In Proceedings of the 9th Web as Corpus Workshop (WaC-9) (pp. 29–35). Association for Computational Linguistics, Gothenburg. Retrieved from https://aclanthology.org/W14-0405.pdf
Lutzky, U., & Kehoe, A. (2017a). I Apologize for My Poor Blogging: Searching for Apologies in the Birmingham Blog Corpus. Corpus Pragmatics, 1(1), 37–56.
Lutzky, U., & Kehoe, A. (2017b). Oops, I Didn’t Mean to Be so Flippant. A Corpus Pragmatic Analysis of Apologies in Blog Data. Journal of Pragmatics, 116, 27–36.
Matić, D. (2011). Govorni činovi u političkome diskursu. PhD thesis. Zagreb: Faculty of Humanities and Social Sciences.
Miščević, N. (2018). Rođenje pragmatike. Orion Art, Beograd.
Palašić, N. (2020). Pragmalingvistika – lingvistički pravac ili petlja? Zagreb: Hrvatska sveučilišna naklada.
Petukhova, V., Gropp, M., Klakow, D., Eigner, G., Topf, M., Srb, S., Motlicek, P., … Potard, …, & Schmidt, A. (2014). The DBOX Corpus Collection of Spoken Human-Human and Human-Machine Dialogues. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 252–258). European Language Resources Association, Reykjavik.
Piper, P. et al. (2005) = Предраг Пипер, Ивана Антонић, Бранислава Ружић, Срето Танасић, Људмила Поповић, Бранко Тошовић. 2005. Синтакса савременог српског језика. Проста реченица, Београд: Институт за српски језик САНУ, Београдска књига, Матица српска.
Pišković, T. (2007). Dramski diskurs između pragmalingvistike i feminističke lingvistike. Rasprave: Časopis Instituta za hrvatski jezik i jezikoslovlje, 33(1), 325–341.
Prasad, R., Webber, B., & Lee, A. (2018). Discourse Annotation in the PDTB: The Next Generation. In Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (pp. 87–97). Santa Fe: Association for Computational Linguistics. Retrieved from https://aclanthology.org/W18-4710.pdf
Prüst, H., Minnen, G., & Beun, R. (1984). Transcriptie dialooogesperiment juni/juli 1984, IPORapport 481. Eindhoven: Institute for Perception Research, Eindhoven University of Technology.
Pupovac, M. (1990). Jezik i djelovanje. Zagreb: Biblioteka časopisa Pitanja.
Romero-Trillo, J. (Ed.). (2008). Pragmatics and Corpus Linguistics: A Mutualistic Entente. De Gruyter, Berlin.
Rühlemann, C., & Aijmer, K. (2015). Introduction. Corpus pragmatics: laying the foundations. In Corpus pragmatics (pp. 1–28).
Searle, J. R. (1969). Speech Acts. Cambridge University Press, Cambridge.
Searle, J. R. (1975). A Taxonomy of Speech Acts. In Minnesota Studies in the Philosophy of Science (Vol. 9, pp. 344–369). University of Minnesota Press.
Searle, J. R. (1976). A classification of illocutionary acts. Language in Society, 5, 1–23.
Silić, S. & Pranjković, I. (2007).
Gramatika hrvatskoga jezika za gimnazije i visoka učilišta. Zagreb: Školska knjiga.
Šegić, T. (2019). Tata kupi mi auto und Nivea Milk weil es nichts Besseres für die Hautpflege gibt. Filologija, 73, 103–116.
Tadić, M. (1996). Računalna obradba hrvatskoga i nacionalni korpus. Suvremena lingvistika, 41–42, 603–611.
TEI Consortium (Ed.). (2021). TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium.
Trosborg, A. (1995). Interlanguage Pragmatics: Requests, Complaints, and Apologies. Berlin; New York: Mouton de Gruyter.
Wojtaszek, A. (2016). Thirty years of Discourse Completion Test in Contrastive Pragmatics research. Linguistica Silesiana, 37, 161–173.
Yule, G. (2002). Pragmatics. Oxford, New York: Oxford University Press.

DirKorp: hrvaški korpus direktivnih govornih dejanj (v3.0) (Eng. DirKorp: A Croatian Corpus of Directive Speech Acts (v3.0))

In this paper we present the development of a new version (v3.0) of the DirKorp corpus (Korpus direktivnih govornih činova hrvatskoga jezika), the first Croatian corpus of directive speech acts, which was built for the purposes of pragmatic research. The corpus contains 800 speech acts collected with an online questionnaire with role-playing tasks, a method of simulated communication that takes place under pre-set conditions. The method is suitable for researching speech acts, since it makes it possible to collect a large number of examples with the same propositional content and illocutionary purpose that are used in the same controlled situation. The presented situations are divided into two categories with regard to the relationship between the participants of the communication act: (1) situations involving interlocutors who are not in a familiar relationship; (2) situations involving interlocutors in a familiar relationship. The tasks in the two categories are organized into four pairs, and respondents are asked to provide a speech act of similar propositional content.
The questionnaire was completed by 100 speakers of Croatian, all undergraduate (63%) or graduate students (37%) of the Faculty of Humanities and Social Sciences (University of Zagreb). The corpus was manually annotated at the speech act level, with each speech act containing up to 14 features: (1) respondent ID, (2) familiarity/unfamiliarity, (3) utterance type, (4) directive performative verb in 1st person, (5) illocutionary force, (6) propositional content, (7) T/V form, (8) exhortative, (9) lexical marker of request, (10) lexical marker of apology, (11) lexical marker of gratitude, (12) honorific title, (13) grammatical mood, (14) modal verb in 2nd person. The corpus contains 12,676 tokens and 1,692 types and is encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange, developed and maintained by the Text Encoding Initiative Consortium (TEI). DirKorp is available for download in TEI format under the CC BY-SA 4.0 license on GitHub. In the paper we describe the annotation and the structure of the corpus.
Keywords: corpus pragmatics, directive speech acts, DirKorp, Croatian language

Appendix A: Frequency distribution of annotated features 2-14

Column groups: DPV = directive performative verb in 1st person (yes/no); IF = illocutionary force (explicit/implicit); PC = propositional content (explicit/implicit); T/V undet. = T/V form impossible to determine; Exh. = exhortative; Req. = request; Apol. = apology; Grat. = gratitude; Hon. = honorific title; Mood = grammatical mood (indicative/conditional); Modal = modal verb in 2nd person. NEFAM = unfamiliar situation; FAM = familiar situation; ∑ = number of speech acts; N/A = feature not applicable.

| Situation | Utterance type | ∑ | DPV Yes | DPV No | IF Expl. | IF Impl. | PC Expl. | PC Impl. | T-form | V-form | T/V undet. | Exh. Yes | Exh. No | Req. Yes | Req. No | Apol. Yes | Apol. No | Grat. Yes | Grat. No | Hon. Yes | Hon. No | Mood Ind. | Mood Cond. | Modal Yes | Modal No |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A NEFAM1 | Imperative | 2 | N/A | N/A | 2 | 0 | 0 | 2 | 0 | 2 | 0 | 1 | 1 | 2 | 0 | 1 | 1 | 0 | 2 | 0 | 2 | N/A | N/A | N/A | N/A |
| A NEFAM1 | Assertive | 88 | 3 | 85 | 3 | 85 | 11 | 77 | 2 | 86 | 0 | 0 | 88 | 3 | 85 | 88 | 0 | 0 | 88 | 4 | 84 | 83 | 5 | 11 | 77 |
| A NEFAM1 | Question | 10 | 0 | 10 | 0 | 10 | 0 | 10 | 0 | 9 | 1 | 0 | 10 | 2 | 8 | 8 | 2 | 0 | 10 | 1 | 9 | 8 | 2 | 9 | 1 |
| A NEFAM1 | Ellipsis | 0 | N/A | N/A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | N/A | N/A | N/A | N/A |
| A NEFAM1 | Nonverbal signal | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| A NEFAM1 | Avoidance | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| B FAM1 | Imperative | 22 | N/A | N/A | 22 | 0 | 21 | 1 | 22 | 0 | 0 | 17 | 5 | 7 | 15 | 0 | 22 | 0 | 22 | 0 | 22 | N/A | N/A | N/A | N/A |
| B FAM1 | Assertive | 15 | 1 | 14 | 1 | 14 | 7 | 8 | 15 | 0 | 0 | 1 | 14 | 1 | 14 | 0 | 15 | 0 | 15 | 0 | 15 | 8 | 7 | 2 | 13 |
| B FAM1 | Question | 60 | 0 | 60 | 0 | 60 | 33 | 27 | 60 | 0 | 0 | 2 | 58 | 7 | 53 | 3 | 57 | 0 | 60 | 0 | 60 | 55 | 5 | 28 | 32 |
| B FAM1 | Ellipsis | 3 | N/A | N/A | 0 | 3 | 2 | 1 | 2 | 0 | 1 | 0 | 3 | 0 | 3 | 0 | 3 | 0 | 3 | 0 | 3 | N/A | N/A | N/A | N/A |
| B FAM1 | Nonverbal signal | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| B FAM1 | Avoidance | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| C NEFAM2 | Imperative | 0 | N/A | N/A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | N/A | N/A | N/A | N/A |
| C NEFAM2 | Assertive | 87 | 39 | 48 | 39 | 48 | 56 | 31 | 0 | 87 | 0 | 0 | 87 | 44 | 43 | 10 | 77 | 40 | 47 | 66 | 21 | 54 | 33 | 1 | 86 |
| C NEFAM2 | Question | 13 | 0 | 13 | 0 | 13 | 12 | 1 | 0 | 87 | 0 | 0 | 13 | 6 | 7 | 2 | 11 | 6 | 7 | 10 | 3 | 12 | 1 | 11 | 2 |
| C NEFAM2 | Ellipsis | 0 | N/A | N/A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | N/A | N/A | N/A | N/A |
| C NEFAM2 | Nonverbal signal | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| C NEFAM2 | Avoidance | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| D FAM2 | Imperative | 40 | N/A | N/A | 40 | 0 | 38 | 2 | 40 | 0 | 0 | 14 | 26 | 13 | 27 | 0 | 40 | 2 | 38 | 0 | 40 | N/A | N/A | N/A | N/A |
| D FAM2 | Assertive | 8 | 2 | 6 | 2 | 6 | 4 | 4 | 8 | 0 | 0 | 1 | 7 | 2 | 6 | 0 | 8 | 1 | 8 | 0 | 8 | 6 | 2 | 1 | 7 |
| D FAM2 | Question | 46 | 0 | 45 | 0 | 46 | 5 | 41 | 45 | 0 | 1 | 0 | 46 | 2 | 44 | 0 | 46 | 0 | 46 | 0 | 46 | 45 | 1 | 13 | 33 |
| D FAM2 | Ellipsis | 3 | N/A | N/A | 0 | 3 | 1 | 2 | 1 | 0 | 2 | 0 | 3 | 0 | 3 | 1 | 2 | 1 | 2 | 0 | 3 | N/A | N/A | N/A | N/A |
| D FAM2 | Nonverbal signal | 2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| D FAM2 | Avoidance | 1 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| E NEFAM3 | Imperative | 2 | N/A | N/A | 2 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 0 | 1 | N/A | N/A | N/A | N/A |
| E NEFAM3 | Assertive | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| E NEFAM3 | Question | 96 | 1 | 95 | 1 | 95 | 96 | 0 | 0 | 95 | 1 | 0 | 96 | 42 | 54 | 63 | 33 | 4 | 92 | 3 | 93 | 71 | 25 | 84 | 12 |
| E NEFAM3 | Ellipsis | 0 | N/A | N/A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | N/A | N/A | N/A | N/A |
| E NEFAM3 | Nonverbal signal | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| E NEFAM3 | Avoidance | 1 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| F FAM3 | Imperative | 86 | N/A | N/A | 86 | 0 | 80 | 6 | 86 | 0 | 0 | 59 | 27 | 16 | 70 | 0 | 86 | 0 | 86 | 0 | 86 | N/A | N/A | N/A | N/A |
| F FAM3 | Assertive | 4 | 0 | 4 | 0 | 4 | 1 | 3 | 4 | 0 | 0 | 1 | 3 | 0 | 4 | 0 | 4 | 0 | 4 | 0 | 4 | 2 | 2 | 0 | 4 |
| F FAM3 | Question | 9 | 0 | 9 | 0 | 9 | 5 | 4 | 9 | 0 | 0 | 0 | 9 | 2 | 7 | 0 | 9 | 0 | 9 | 0 | 9 | 8 | 1 | 7 | 2 |
| F FAM3 | Ellipsis | 1 | N/A | N/A | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | N/A | N/A | N/A | N/A |
| F FAM3 | Nonverbal signal | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| F FAM3 | Avoidance | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| H NEFAM4 | Imperative | 19 | N/A | N/A | 19 | 0 | 19 | 0 | 0 | 19 | 0 | 3 | 16 | 11 | 8 | 4 | 15 | 0 | 19 | 14 | 5 | N/A | N/A | N/A | N/A |
| H NEFAM4 | Assertive | 55 | 9 | 46 | 9 | 46 | 12 | 43 | 0 | 55 | 0 | 0 | 55 | 10 | 45 | 27 | 28 | 0 | 55 | 35 | 20 | 52 | 3 | 4 | 51 |
| H NEFAM4 | Question | 12 | 0 | 12 | 0 | 12 | 10 | 2 | 0 | 11 | 1 | 0 | 12 | 3 | 9 | 7 | 5 | 0 | 12 | 5 | 7 | 11 | 1 | 9 | 3 |
| H NEFAM4 | Ellipsis | 1 | N/A | N/A | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | N/A | N/A | N/A | N/A |
| H NEFAM4 | Nonverbal signal | 0 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| H NEFAM4 | Avoidance | 13 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| G FAM4 | Imperative | 43 | N/A | N/A | 43 | 0 | 40 | 3 | 1 | 0 | 42 | 27 | 16 | 11 | 32 | 1 | 42 | 0 | 43 | 0 | 43 | N/A | N/A | N/A | N/A |
| G FAM4 | Assertive | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| G FAM4 | Question | 12 | 0 | 12 | 0 | 12 | 12 | 0 | 0 | 0 | 12 | 1 | 11 | 6 | 6 | 1 | 11 | 1 | 11 | 0 | 12 | 12 | 0 | 12 | 0 |
| G FAM4 | Ellipsis | 37 | N/A | N/A | 0 | 37 | 33 | 4 | 1 | 1 | 35 | 22 | 15 | 2 | 35 | 0 | 37 | 0 | 37 | 0 | 37 | N/A | N/A | N/A | N/A |
| G FAM4 | Nonverbal signal | 2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| G FAM4 | Avoidance | 5 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |