Slovenščina 2.0, 2020 (2)

THE ATTITUDE OF DICTIONARY USERS TOWARDS AUTOMATICALLY EXTRACTED COLLOCATION DATA: A USER STUDY

Eva PORI, Jaka ČIBEJ, Špela ARHAR HOLDT
Faculty of Arts, University of Ljubljana

Iztok KOSEM
Faculty of Arts, University of Ljubljana; Jožef Stefan Institute

Pori, E., Čibej, J., Kosem, I. and Arhar Holdt, Š. (2020): The attitude of dictionary users towards automatically extracted collocation data: a user study. Slovenščina 2.0, 8(2): 168–201.
DOI: https://doi.org/10.4312/slo2.0.2020.2.168-201

The paper is based on a survey conducted within the framework of the basic research project Collocations as a Basis for Language Description: Semantic and Temporal Perspectives (KOLOS; J6-8255). It presents a qualitative analysis of a user evaluation of the interface of the Collocations Dictionary of Modern Slovene (CDS). It discusses an alternative perspective (the user's point of view) on problematic aspects of individual dictionary features, which require further lexicographic analysis and discussion. The collocations user study presents a model of the process of user evaluation; its findings are significant primarily for determining problems encountered by users. They also serve as a useful basis for methodology improvements in future, comparable lexicographic user studies and analyses.

Keywords: collocations dictionary, responsive dictionary, user evaluation, attitude towards errors, dictionary interface

1 INTRODUCTION

In the digital world, a dictionary is increasingly becoming a network of dynamic shifts between different language information and resources, as well as a testing ground for various contemporary conceptual lexicographic approaches.
The concept of a “responsive dictionary”, a dictionary characterised by its capacity to respond to the dynamics of language development and to include the interested language community in the development of language resources in a methodologically transparent manner (Arhar Holdt et al., 2018), first came to fruition (both in Slovenia and internationally) with the Thesaurus of Modern Slovene.¹ The responsive dictionary was created as a reaction to the language needs and desires of the modern community of users. The innovative characteristics of the Thesaurus, such as open access, flexibility, and interconnectedness, provided an alternative to already established dictionary forms. The unique character of the Collocations Dictionary of Modern Slovene,² the second example of a responsive language resource and the topic of this paper, introduced a new dynamic in Slovene lexicography: its basic design follows the original concept of a responsive, linear (but not only linear) lexicographic structuring, bends established lexicographic surfaces and both shifts and transcends traditional lexicographic patterns.

¹ The Thesaurus of Modern Slovene was published in March 2018 and was compiled automatically. It contains 105,473 headwords and 368,117 synonyms with links to the Gigafida Corpus of Written Standard Slovene; it is freely accessible at: https://viri.cjvt.si/sopomenke; the database is freely accessible at CLARIN.SI under the CC BY-SA 4.0 licence: Krek, Simon; et al., 2018, Thesaurus of Modern Slovene 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1166.

² The Collocations Dictionary of Modern Slovene was published in October 2018 and is based on automatically extracted data. It contains 35,989 headwords, 7,717,561 collocations, and 36,736,168 examples from the Gigafida Corpus of Written Standard Slovene; it is freely accessible at: https://viri.cjvt.si/kolokacije; the database is freely accessible at CLARIN.SI under the CC BY-SA 4.0 licence: Kosem, Iztok et al., 2019, Collocations Dictionary of Modern Slovene CSD 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1250.

In addition to coming up with an alternative dictionary form, modern lexicography has increasingly recognised the undeniable value of dictionary users. Despite the growing interest of international lexicographers in user studies, in Slovenia the field remains understudied and overlooked. This is why the present study examines the role of user reception and contribution to the upgrades and improvements of dictionaries. The idea of a responsive dictionary recognises the user as an active co-creator of (digital) language resources, as well as a critical evaluator of the features offered. The results of an open discussion between linguists and users represent a useful starting point for further analysis of the design of dictionaries, and, in the present case, of the general role of the collocations dictionary as a responsive dictionary within the field of lexicography.

The present study focuses on the users’ attitudes towards automatically extracted collocation data, especially in relation to specific features introduced into lexicography by responsive dictionaries. In their initial phase, responsive dictionaries are automatically compiled and relatively quickly published for public use; alongside linguists, the language community then gradually helps improve and clean the data. The Collocations Dictionary of Modern Slovene was also immediately made available to the public, i.e. in the initial, unprocessed stage containing noise or errors. The design of the dictionary interface, however, featured options to eliminate these shortcomings (data evaluation and cleaning), information about the linguistic completeness of the entry, and other similar features (Kosem et al., 2018c).
The present study was interested in specific groups of users and their attitudes towards the present state of the dictionary, their opinion on its responsiveness (which includes automatic compilation, gradual upgrades, and user involvement), and their response to particular types of existing errors in the data. The user evaluation is intended to serve as a basis for identifying problematic areas, as well as less problematic areas in need of improvement, and will play a key role in the improvement of the collocations dictionary interface.

The paper begins by presenting the method of user evaluation of the Collocations Dictionary of Modern Slovene 1.0. This is followed by an analysis of the three thematic segments of the user evaluation, i.e. the three-part design of the evaluation interview. A representative case (proper nouns) demonstrates the user perspective on (non-)problematic features of the data and the dictionary interface. The conclusion summarizes the key findings of the study and examines the suitability of the applied method as a model for user evaluation in similar lexicographic user studies.

2 METHODOLOGY

2.1 Research Framework

In lexicography, user research has a tradition reaching back to the 1960s (e.g. Barnhart, 1962; Householder, 1967), but the research area was firmly established later, in the 1980s and 1990s (e.g. Tomaszczyk, 1979; Hartman, 1987; Atkins, 1998; Nesi, 2000). The emergence of the digital medium in the 2000s offered a vast array of new methodological possibilities (e.g. Bergenholtz and Johnsen, 2013; Müller-Spitzer, 2014; Lew and De Schryver, 2014). More recently, existing approaches were also critically evaluated and surpassed (Bogaards, 2003; Tarp, 2009; Lew, 2015; Kosem et al., 2018a). Despite growing opportunities for user involvement, Slovene lexicography has been relatively slow in developing an interest in user studies.
This is why, as mentioned in previous research (Rozman, 2004; Stabej, 2009; Logar, 2009; Gorjanc, 2017), Slovene lexicography has a glaring lack of data in relation to user habits, needs, capacities, and preferences. Over the past few years, important steps have been taken, such as the development of a user typology (Arhar Holdt et al., 2016), the research of user needs in relation to selected language problems (Čibej et al., 2016; Arhar Holdt et al., 2017), the participation in an international study on user attitudes to general monolingual dictionaries (Kosem et al., 2018a, 2018b), and the development of methodologies for user inclusion and tracking within the framework of a responsive dictionary (Arhar Holdt et al., 2018).

The present study contributes to the available array of tried and tested methodologies (a comprehensive overview of existing methodologies is provided in Welker, 2013a, 2013b) with the addition of user evaluation based on the guided think-aloud method. Think-aloud protocols have been described by Tarp (2009, p. 287) as follows:

The informants are invited to freely express which reflections and problems they have during the consultation process [while working with a specific dictionary (author’s note)]. These “thoughts” are tape-recorded and subsequently transcribed and written down in protocol form. [...] [This method] gives the researcher an idea of the users' way of working as well as what is happening during the process, what users are looking for, what they think they are looking for, and which problems they face when trying to find and interpret the relevant data. A number of research projects performed with this method have provided valuable results, among others Wingate (2002) who did research into the usefulness of various types of definitions in learners' dictionaries, and Thumb (2004) who focused on the users' different look-up strategies and the problems they faced during the process.
We used the basic idea of the method, but adapted it to serve the purposes of a straightforward evaluative approach: the participants were presented with the dictionary; while they were using it, an interviewer was actively involved, suggesting queries and guiding the “thinking” with a set of prepared questions. Both the audio and the participants’ interaction with the screen were recorded. However, only the audio was transcribed and analyzed (as the “protocol” itself was guided and thus comparable).

2.1 Research Goals and Sample Structure

The primary aim of the study was to determine the participants’ opinion on the advantages and disadvantages of the Collocations Dictionary of Modern Slovene and responsive dictionaries in general, and to find ways of improving its user-friendliness. It was our intention to examine whether adult speakers of Slovene, particularly those with a linguistic background or keen linguistic sensibility, know how to use, read and interpret the Collocations Dictionary of Modern Slovene, despite the fact that the dictionary featured raw, automatically extracted data. Our focus was on determining the participants’ attitudes towards:

• automatic data compilation and errors;
• continuous dictionary upgrades and updates;
• the possibility of user inclusion or contribution;
• innovative interface functions.

Following the typology of potential dictionary users (Arhar Holdt et al., 2016), the study included four distinct target groups of participants: translators and proofreaders; teachers of Slovene as a first language; teachers of Slovene as a second or foreign language; and lexicographers. The selected sample covers different scenarios of potential use, which allows the combined feedback on the dictionary to be perceived as more representative. Teachers were included to evaluate the didactic value of the dictionary, primarily its usefulness for teaching vocabulary to students. Translators can benefit significantly from knowing what collocations and colligations are typical for a given word, while proofreaders need straightforward normative information to support their decisions. Finally, the group of lexicographers was included to identify whether and how their views differ from the opinions of actual dictionary users, e.g. whether, as the creators of the dictionary, they perceive its pros and cons similarly to other groups, and whether they propose similar steps for further development as other groups.³

³ Students of Slovene as an L1 and learners of Slovene as an L2 did not participate in this step of the study. We chose to focus on adult professional users to make the best of the time and resources available within the project. Compared to the selected user groups, students are more easily accessible, and after the project, the study can be continued to include both them and other potentially relevant user groups.

Table 1: Structure of the participant sample

| Group | Affiliated institutions | Region | Age | Professional experience |
|---|---|---|---|---|
| 10 teachers of Slovene as L1 | SŠ Ravne na Koroškem; II. gimnazija v MB; Ekonomska šola (+gimnazija) Ljubljana | Ljubljanska; Podravska; Koroška; Gorenjska | 30–50 | 10–30 years |
| 10 teachers of Slovene as L2 / foreign language | Centre for Slovene as a Second/Foreign Language (Faculty of Arts, University of Ljubljana) | Hungary; Czech Republic; Štajerska; Ljubljanska; Primorska | 30–50 | 10–30 years |
| 10 translators / language editors (proofreaders) | SLG Celje; self-employed; independent cultural employee | Primorska; Dolenjska; Savinjska; Gorenjska; Ljubljanska | 30–50 | 10–30 years |
| 10 lexicographers | CJVT UL; FDV UL; FF UL; self-employed | Ljubljanska; Štajerska | 30–50 | 10–20 years |

The study included 40 participants. As seen in Table 1, the participants were primarily between 30 and 50 years of age, with 10–30 years of work experience; they originated from different Slovene regions or, in the case of teachers of Slovene as a second or foreign language, from abroad. The call for participation was circulated widely through various means of communication (such as mailing lists). The participants responded voluntarily, which needs to be taken into account in the interpretation of the results: the sample consists of participants who are relatively familiar with innovative, digital, and responsive language and dictionary resources, as they use them in their everyday work.

2.2 Evaluation Interview: Design

The evaluation interview was carefully planned and pre-tested on a group of researchers, i.e. linguists and research colleagues assuming the roles of interviewees. Our method was selected in order to enable identification of relevant data communicated in various ways by the interviewee, with minimal interviewer influence; its aim was to detect problems encountered by the interviewee while attempting to complete a specific task: working with a dictionary, on particular dictionary entries. To facilitate internal processing and analysis of acquired data, the participants were guaranteed full anonymity and asked for prior written consent for the recording of their screen and voice.

The approximately 30-minute evaluation interview was based on a prepared three-part questionnaire (Appendix 1). During the first part of the session, the participants were asked, while thinking aloud, to click randomly in the dictionary and to query entries of their own choice. In this way, they could familiarize themselves with the Collocations Dictionary and form a first impression. At the same time, they were encouraged to spontaneously express their thoughts, feelings, and emotions and report whether they encountered, sensed or noticed any problems.
Attention was primarily focused on the participant’s capacity to recognize the range of functions and their possible combinations provided by the Collocations Dictionary (visual information on entry completeness, sense menus, various filters, such as the frequency filter (showing either only rare or only frequent words) or alphabetical ordering; collocate clustering, information on collocation relevance, examples of use, links to the Gigafida corpus and other dictionaries, etc.). In this way, we primarily examined attitudes towards the functionality, intuitiveness, and user-friendliness of the dictionary.

The second segment of the interview involved working with specific headwords; the participants were guided and tested to determine whether they recognized the (non-)problematic nature of particular entries. We were interested in their ability to interpret raw data, the number of problems or errors detected, the nature of these errors, and the level of distraction posed by the errors. The evaluation included three types of dictionary entries; prior to conducting the interviews, we created a list of existing data errors for each entry and thus anticipated the participants’ potential observations.

a) An example of a non-problematic and lexicographically fully examined entry, albeit highly polysemous and thus collocationally diverse: the noun belina 'whiteness'.
b) An example of an entry with only a few potentially problematic collocates: the noun pivo 'beer'.
c) Two examples of more problematic entries, with the difficulties expressed either on the level of collocation structure or headword: the noun klop, where most of the collocates are erroneous due to homonymy (klóp 'bench', klòp 'tick'); and the verb usesti (se) 'to sit (oneself) down', which appears in inadequate structures due to the absence of the reflexive pronoun se.
Table 2: A list of identified errors for the noun headword pivo on the levels of collocates or headwords, syntactic structures and collocations

| Problem | Example in Slovene | Translated example |
|---|---|---|
| Errors on the level of collocates or headwords | | |
| The collocate was incorrectly lemmatized. | plata piva instead of plato piva | ‘plate of beer [cans]’ instead of ‘box (lit. plateau) of beer [cans]’ |
| The collocate or headword should be in a specific inflected form (such as plural or comparative). | drag od piva instead of dražji od piva | ‘[expensive] than beer’ instead of ‘[more expensive] than beer’ |
| The collocation did not include the verb morpheme si/se. | nacejati s pivom instead of nacejati se s pivom | ‘to guzzle beer’ [missing se morpheme] |
| Errors on the level of syntactic structures | | |
| The collocate was tagged with an incorrect part-of-speech. | pivo pite instead of pivo piti | ‘beer of pie’ instead of ‘to drink beer’ |
| The verb collocate should appear in the negative form. | piti piva instead of ne piti piva | ‘to drink beer’ instead of ‘to not drink beer’ [missing negative particle] |
| Errors on the level of collocations | | |
| The collocation is nonsensical as it makes no sense if taken out of context or without additional elements. | pivo k ustom instead of dvigniti kozarec piva k ustom | ‘beer to the mouth’ instead of ‘[to raise a glass of] beer to the mouth’ |
| The headword appears next to a syntactic structure in the genitive plural or is a plural noun; the collocation makes no sense without an additional, quantitative element. | pivo po tolarja instead of pivo po 300 tolarjev | ‘beer for tolar’ instead of ‘beer for 300 tolars’ |

The third and final segment of the interview examined the participant’s opinion on the general usefulness of the dictionary, its digital form (continuous upgrades) and their assessment of its look.
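The list of anticipated errors for pivo (Table 2) lends itself to a simple structured representation. The following is a minimal sketch only, not the procedure used in the study: the class name `AnticipatedError`, the helper `coverage`, the shortened problem labels, and the example participant are all assumptions introduced for illustration. It shows how such a checklist could be compared against the collocations a participant pointed out during the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnticipatedError:
    """One row of an error checklist (hypothetical structure)."""
    level: str     # "collocate/headword", "syntactic structure", or "collocation"
    problem: str   # shortened paraphrase of the problem description in Table 2
    observed: str  # form shown in the dictionary
    expected: str  # form given as correct in Table 2


# Examples taken from Table 2; the short problem labels are paraphrases.
PIVO_ERRORS = [
    AnticipatedError("collocate/headword", "incorrect lemmatization", "plata piva", "plato piva"),
    AnticipatedError("collocate/headword", "missing comparative form", "drag od piva", "dražji od piva"),
    AnticipatedError("collocate/headword", "missing se/si morpheme", "nacejati s pivom", "nacejati se s pivom"),
    AnticipatedError("syntactic structure", "incorrect part-of-speech tag", "pivo pite", "pivo piti"),
    AnticipatedError("syntactic structure", "missing negative form", "piti piva", "ne piti piva"),
    AnticipatedError("collocation", "missing context elements", "pivo k ustom", "dvigniti kozarec piva k ustom"),
    AnticipatedError("collocation", "missing quantitative element", "pivo po tolarja", "pivo po 300 tolarjev"),
]


def coverage(anticipated, noticed):
    """Split anticipated errors into those a participant noticed and those missed."""
    noticed_set = set(noticed)
    found = [e for e in anticipated if e.observed in noticed_set]
    missed = [e for e in anticipated if e.observed not in noticed_set]
    return found, missed


# Invented participant who pointed out two collocations (illustrative data only).
found, missed = coverage(PIVO_ERRORS, ["pivo pite", "pivo po tolarja"])
print(len(found), "noticed;", len(missed), "missed")
```

Keeping the observed and expected forms side by side mirrors the "X instead of Y" layout of Table 2, so the same records could serve both as an interviewer's checklist and as input to a later tally.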
2.3 Transcription and Annotation

The interviews were annotated on the basis of transcriptions of the audio recordings, which were produced by four students of linguistics. The transcription followed a set of clear guidelines; one of the key guidelines was that the transcription should not be reduced to summarizing, but should instead record the conversations as faithfully as possible, with linguistic adaptation and standardization only permissible on the morphological level.

The annotation process followed the general thematic structure of the questionnaire (Appendix 1). A set of annotation guidelines was prepared, containing a list of available tags, their descriptions, and several examples from the transcriptions. Four annotators were familiarized with the guidelines and assigned 10 transcriptions each. The annotation was made in a local installation of Taguette (Rampin et al., 2019), an open-source online platform for collaborative text annotation (Figure 1). Taguette is an example of computer-assisted qualitative data analysis software (CAQDAS), the aim of which is to facilitate a systematic analysis of unstructured or semi-structured data, particularly transcriptions of interviews. It enables multiple annotators to collaboratively annotate each transcription. Relevant text segments are marked either top-down (i.e. the annotators are presented with a set of tags to use during annotation) or bottom-up (i.e. the annotators mark relevant information with their own tags, which can be easily grouped in the end to achieve the final annotation scheme). There are two main advantages of this approach to qualitative data analysis: a) tagging the transcriptions can provide a quantifiable overview of the data (e.g.
the frequency of the tags reveals the most frequently discussed topics, issues, and recurring patterns in the analyzed texts); and b) Taguette is designed in a way that allows segments related to a specific feature to be exported to a separate file, essentially combining all related segments from different transcriptions into a single document. This allows for a more thorough analysis of a specific issue across all participants or participant groups.

Because the interviews in our research were semi-structured and focused on specific features of the Collocations Dictionary of Modern Slovene, we elected to follow a top-down approach and prepared a limited tagset for the annotators to use. The higher the frequency of the annotation, the more prevalent or topical the discussed argument in the user group. On the other hand, less frequently annotated topics might indicate that the user either has not noticed a feature or found it less important compared to others.

Figure 1: A screenshot of the Taguette annotation platform.

2.4 Annotation Results

The annotation typology (shown in Table 3, along with the total frequency of each tag) consists of four main categories⁴ with multiple subcategories. The table also presents the general attitude towards a specific feature, indicating whether the participating evaluators expressed more arguments pro or contra. These labels are discussed in more detail in Section 3.
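The two uses of tagged transcriptions described in Section 2.3 (a frequency overview per tag, and a collection of all segments carrying a given tag across transcriptions) can be sketched in a few lines of Python. This is an illustrative sketch only: it does not use Taguette or reproduce its export format, and the record layout, the function names `tag_frequencies` and `segments_by_tag`, and the sample segments are assumptions made up for the example. Only the tag names are borrowed from the annotation typology.

```python
from collections import Counter, defaultdict

# Invented annotated segments: (transcription id, participant group, tag, text).
segments = [
    ("T01", "teacher_L1", "User votes", "I have mixed feelings about voting."),
    ("T01", "teacher_L1", "Corpus examples", "The examples are the best thing."),
    ("T02", "translator", "Errors (homonyms)", "Most collocates for klop are wrong."),
    ("T03", "lexicographer", "Entry phase indicator", "The pyramid icon is great."),
    ("T03", "lexicographer", "User votes", "Voting is a welcome feature."),
]


def tag_frequencies(records):
    """Count how many annotated segments carry each tag."""
    return Counter(tag for _, _, tag, _ in records)


def segments_by_tag(records):
    """Collect segments per tag across all transcriptions."""
    grouped = defaultdict(list)
    for transcription, group, tag, text in records:
        grouped[tag].append((transcription, group, text))
    return dict(grouped)


freq = tag_frequencies(segments)
by_tag = segments_by_tag(segments)

# Print tags from most to least frequently annotated.
for tag, count in freq.most_common():
    print(f"{tag}: {count}")
```

Because each record keeps the participant group, the same grouping could be extended to compare how often a topic was raised by, say, translators versus lexicographers.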
Table 3: Frequency of annotations by thematic blocks of the interview

| Category | Description | Frequency | General attitude |
|---|---|---|---|
| General features | | | |
| Automatic compilation | Segments related to the participants’ opinion on the fact that the dictionary was compiled automatically | 27 | PRO |
| Dictionary usefulness | Segments related to the usefulness of the dictionary | 112 | PRO |
| Look and design | Segments related to the overall look and design of the dictionary | 37 | PRO |
| Digital form | Segments discussing the fact that the dictionary is digital-only | 69 | PRO |
| Interface | | | |
| Entry phase indicator | Segments discussing the phase indicator pyramid symbol in the dictionary | 69 | PRO |
| Sense indicators | Segments discussing the menu that enables the semantic disambiguation of collocates | 43 | PRO |
| Three dot icon | Segments discussing the three-dot icon that leads to the list of all collocations with a specific syntactic structure | 32 | PRO |
| Filter (frequency) | Segments discussing the function that allows the collocates to be filtered by corpus frequency | 43 | PRO |
| Filter (alphabetical) | Segments discussing the function that allows the collocates to be sorted alphabetically | 14 | PRO |
| Filter (relevance) | Segments discussing the function that allows the collocates to be sorted by relevance | 4 | PRO |
| Colour scale for relevance | Segments discussing the fact that collocates are colour-coded by relevance | 56 | PRO |
| Collocate clusters | Segments discussing the function to display automatically generated collocate clusters | 39 | PRO |
| Links to Gigafida | Segments discussing links to the Gigafida corpus of Slovene | 39 | PRO |
| Other links | Segments discussing other links in the dictionary | 14 | PRO |
| Corpus examples | Segments discussing corpus examples included in the dictionary | 44 | PRO |
| Other resources | Segments discussing other resources | 12 | PRO |
| Navigation menu | Segments discussing the navigation menu that allows the user to filter collocations by syntactic structure | 82 | PRO |
| User votes | Segments discussing the option for users to up- or downvote collocations | 78 | PRO/CONTRA |
| Noise in dictionary data | | | |
| Errors (definite form of adjectives) | Segments discussing the lack of definite forms in adjectival collocations | 6 | PRO/CONTRA |
| Errors (homonyms) | Segments discussing errors with homonymous headwords | 63 | CONTRA |
| Errors (proper nouns) | Segments discussing proper nouns included in the dictionary | 62 | PRO/CONTRA |
| Errors (prepositions) | Segments discussing errors with prepositions | 5 | PRO |
| Errors (comparative form of adjectives) | Segments discussing the lack of obligatory comparative forms of adjectives | 13 | PRO/CONTRA |
| Errors (reflexive pronoun) | Segments discussing the lack of the reflexive pronoun in collocations containing inherently reflexive verbs | 61 | PRO/CONTRA |
| Errors (missing collocation element) | Segments discussing the lack of additional collocation elements in multi-word collocations | 59 | PRO/CONTRA |
| Errors (negative form) | Segments discussing the lack of negative forms in collocations that require the presence of a negative particle | 17 | PRO |
| Errors (other) | Segments discussing other errors related to noise found in the dictionary | 136 | PRO/CONTRA |
| Participant suggestions | Different participant suggestions regarding the potential improvements of the dictionary | 215 | |

⁴ The fourth category, Participant suggestions, was included in the typology as a catch-all category for any user suggestions that did not fit in any of the other (more fine-grained) categories. These segments were also annotated in the transcriptions.

3 DATA ANALYSIS OVERVIEW

The initial overview and analysis of categorized opinions included all the structural and thematic segments covered by the evaluation interview (Appendix 1): examining the intuitiveness of the dictionary interface, the participants' attitudes towards errors and selected general features of the dictionary. All the assessed categories mentioned above were divided into groups according to the predominant opinion on their adequacy (the category is marked by PRO) or inadequacy (the category is marked by CONTRA) (Table 3).⁵ We were interested in determining the areas in which the participants agreed or disagreed. This data is relevant for identifying problematic and less problematic categories, and for further improvements of the dictionary interface.

⁵ Due to time and resource constraints, we leave the exact distribution of PRO and CONTRA opinions for a future paper on this subject, in which we also intend to analyze the distribution of annotations between users and user groups.

An example of an opinion⁶ marked by PRO:

[1] “Fantastic! In my opinion, digitalization is the only way of coming up with useful dictionaries.” [teacher of Slovene as a second/foreign language, on the digitalization in lexicography]

An example of an opinion marked by CONTRA:

[2] “I’m put off by mistakes, because I find this slows down my work considerably.” [translator, on automatic noise in dictionary data]

⁶ In order to facilitate reading, all the participant statements were edited to conform to standards of written language. Where the provided context makes it difficult to discern what the statement (or part of the statement) refers to, an explanation or the concrete referent was added in square brackets: [ ].

3.1 Evaluating Features of the User Interface

The first part of the interview involved the participant exploring the dictionary features in a free and unstructured manner. The aim was to evaluate the intuitiveness of the user interface, e.g. the entry phase indicator (pyramid icon), the presence or absence of sense indicators (sense menus), the three-dot icon for accessing specific syntactic structures, etc.

As shown in Table 3, the participants from all groups described all the selected features as positive (PRO): they rated them as excellent, highly useful, functional and intuitively designed dictionary elements. The participants highlighted the clarity of use and the practicality of individual filters, the inclusion of sense indicators, visual indicators of entry completeness, and especially the links to corpus examples, i.e. the use of collocations in actual language use:

[3] “These examples, to me, they’re the best thing about this, because I really, really missed them, yes. There’s very few of them in SSKJ [the General Monolingual Dictionary of Standard Slovene], but here you can really… In fact, a single entry gives you a lot of information. That’s great, you can really find whatever it is that you need, a really useful thing, this.” [teacher of Slovene as a first language, on the relevance of corpus examples]

[4] “Straight away, I find this pyramid icon great. But I would have a pyramid, from the outset, where all these lines would be thicker, stronger.” [lexicographer, on the entry phase indicator]

[5] “I find this great. This thing where everything is sorted according to meaning... Especially for our foreign learners, so they can limit themselves to this, to this single meaning.” [teacher of Slovene as a second/foreign language, on sense menus in the dictionary]

None of the participants expressed arguments against any of the features. However, we have identified a common suggestion (across all participant groups) for improvement relating to the visual upgrade of the pyramid icon, i.e. the icon should be more noticeable and its function clarified.
Divergent opinions (PRO/CONTRA) were noted with regard to the possibility of user involvement. All the participants see the option of up- or downvoting the collocations as a useful and welcome feature; proofreaders and translators, however, pointed out that they often lack time for doing so, whereas the teachers expressed concern about the feature being used by non-competent users:

[6] “I have very mixed feelings about this. If the idea is that this is only intended for more advanced users, then this is a great option. But if I think of showing this to the children in primary school and then they would click away and play a little, I think they could really spoil this situation here.” [teacher of Slovene as a first language, on the dictionary's voting feature]

[7] “Yes, I definitely find this great. I often notice these mistakes in a lot of places, and others notice them, too, when I’m reading online news, and I notice things being misspelled. But I can’t be bothered to register only to bring attention to the mistake. I mean, if I could do it, I suppose I would, sometimes. So I think it’s great that this here is made in such a way that the user can immediately point out a mistake.” [teacher of Slovene as a second/foreign language, on the convenience of not having to register to provide user votes]

3.2 Evaluating Data Error Distraction

The second part of the interview, which focused on examining the participants' attitudes to various types of errors, demonstrated that the participants, judging by their response to test entries and their self-reports on previous, oftentimes daily dictionary use, mostly do not seem to notice them. In fact, they seemed to first become aware of the errors only during their participation in the user study, after being guided in their work on specific entries (belina, pivo, klop and usesti (se)), i.e.
after being systematically queried whether they noticed any errors and asked how disruptive they found them.7

Prompted by the interviewer, the participants evaluated specific types of errors, such as the absence of the reflexive pronoun se in the verb headword, errors due to homonymy, the inclusion of proper nouns in the dictionary, etc. As seen in Table 3, the most distracting type of error is caused by homonymy, and it was also the type that the participants mostly detected independently. In the headword klop, homonymy results in most of the collocates being wrong (greti klôpa 'to keep a tick warm' instead of greti klóp 'to keep a bench warm', guliti klôpa 'to wear out a tick' instead of guliti klóp 'to wear out a bench', sesti v klôpu 'to sit on a tick' instead of sesti v klopí 'to sit on a bench').8

The participants also had mixed opinions (PRO/CONTRA) on the inclusion of proper nouns in the dictionary. Due to the diversity of opinions on this issue and some very interesting results, we examine it in more detail in Section 4. The participants marked all the other shortcomings (i.e. types of errors) with CONTRA, and mostly did not notice them independently during their work with dictionary entries, as mentioned above:

7 It should be noted that the above was not true for the group of lexicographers: unlike the other participants, who encountered such errors for the first time, the lexicographers were well acquainted with the dictionary. The group included many of the original authors involved in the various stages of building the collocations dictionary (data processing, user interface design, and other development processes).

8 Homonymy-related problems can occur because of incorrect morphosyntactic tagging and/or problems in post-processing. One particular issue of corpus data is that lemmas are form-based, so differently pronounced headwords with the same form will be combined under the same lemma.
The problems become particularly noticeable when such a word (as a headword or a collocate) features in the grammatical structure in a case that is not nominative.

[8] “I don’t know, I wasn’t really distracted... If you hadn’t told me, I wouldn't even have noticed. I think that as soon as I saw it, I somehow already imagined the correct meaning and then got the meanings of the sort I was thinking about.” [teacher of Slovene as a first language, on the]

[9] “These are mistakes of the kind where the petty Slovene mind, which would rather criticise than help or praise, could say: there, I knew it, I found a mistake right away.” [translator, on dictionary errors]

[10] “Because, for instance, we’ve been using it [the Collocations Dictionary] now [in class], we’ve had a look at quite a number of things, at least those that were in the texts, and we haven’t found a single mistake, not a single problematic thing. So, I think, well, you really have to try hard to find a page where something bothers you. To the point that you find the page useless.” [teacher of Slovene as a second/foreign language, on the scarcity of errors in the Collocations Dictionary]

[11] “Because the user knows in advance [to expect mistakes], I don’t think it’s a problem, no. Because then, even someone who is learning Slovene, they know not to trust it blindly. So I think that even in this stage, this phase, this resource is really valuable.” [teacher of Slovene as a second/foreign language, on the usefulness of the Collocations Dictionary]

3.3 Evaluating General Features of the Dictionary

In the final part of the interview, the participants evaluated the general features of the collocations dictionary, such as its automatic compilation, digital-only form, and look/design.

As shown in Table 3, all the above features were positively evaluated by all the participant groups. The reasons given were largely the same across groups.
The participants find the Collocations Dictionary a clear and coherent resource with relatively easily recognizable functions; translators and proof-readers see it as an invaluable resource; the teachers consider it extremely useful (both for the preparation of didactic exercises and for classroom use, e.g. to check the adequacy of phrases, find expressions typical of newspapers, works of fiction, etc.); its strengths are its authenticity, the interconnectedness of its language data, and its relative ease of use in comparison to corpora. Its look and the distribution and density of data are clear and user-friendly, whereas its digital-only form, which enables continuous upgrades and updates, is functional, indispensable and a necessary precondition for work in modern times.

[12] “I believe that these two dictionaries [the Thesaurus of Modern Slovene and The Collocations Dictionary of Modern Slovene] are the best thing that has happened to Slovene in the past few years, I really do. And the people are infinitely, truly grateful, for having these resources.” [proof-reader and translator, on their attitude toward responsive dictionaries]

[13] “So I really enjoyed it today when we could show this to the foreign learners: 'Here, this is the entire selection [of collocates]. There are some things that are not in accordance with the orthography manual, and a newspaper proof-reader might correct a lot of things, but you encounter all of this in everyday language. Everything you see here is real-life language.' So it’s great that these dictionaries exist and offer so many options. Because this is what foreigners often experience: 'Well, I heard someone say this on the street, but where can I check if it's OK?' And then, with Fran [Slovene dictionary portal] or, I don’t know, the orthography manual, well, there’s nothing there. For a foreign learner there’s not enough headwords in there.
It’s much easier to browse through this than it is directly through corpora. I find this dictionary much more user-friendly than corpora.” [teacher of Slovene as a second/foreign language, on the usefulness of the Collocations Dictionary for foreign learners]

[14] “It's nice and user-friendly, because it’s so clean and clear and there’s enough space, the page isn’t crowded. Yes, I like it and those shades of grey aren’t too conspicuous, it’s clear, well, I like it. Here, the titles are nicely listed, so you know what you’re looking for, down here you get the collocations, great. So I find it ... Well, I’d just like to say well done, really, great.” [teacher of Slovene as a second/foreign language, on the user-friendliness of the Collocations Dictionary]

[15] “I don't find the fact that it’s in digital-only form a disadvantage at all. It’s an advantage, really, because it takes less time to access it and precisely because you can correct it, update it, improve it. Because if this wasn’t the case, then you could wait forever for such a dictionary, and in the meantime expressions go out of use, or maybe not out of use, but new things come along, the language develops and so the dictionary would be left behind.” [translator, on the advantages of a digital-only dictionary form]

3.4 Participants' Improvement Suggestions

While evaluating specific interface features, the participants also suggested several improvements on their own initiative. The suggested improvements included adding information on the collocate or collocation frequency, the option to export data, and the addition of accents and pronunciation to headwords (especially homonymous headwords).
Most suggestions concerned the option to click on the headword in order to return to the initial page and the visual upgrade of specific interface elements, such as adding a color scheme or color code to the frequency filter and making the pyramid icon more graphically pronounced (by enlarging it, using intense colors or stripes, or including a short headline or description).

4 QUALITATIVE CASE ANALYSIS: PROPER NOUNS

In this section, we describe a qualitative analysis of the participants' attitude towards the inclusion of proper nouns. The Collocations Dictionary of Modern Slovene 1.0 includes proper nouns as collocates, but not as headwords.9

While the Collocations Dictionary was under development, lexicographic discussions frequently highlighted the problematic nature of proper nouns. Because they refer to a single, specific referent, they are semantically specific and often bring into question the relevance of the dictionary entry. A typical example is a headword that requires a long sequence of collocates of the same type, e.g. geographical proper nouns: prestolnica [Slovenije, Štajerske, Rusije] 'the capital of [Slovenia, Styria, Russia]', bivati v [Sloveniji, Rusiji, Ukrajini] 'to live in [Slovenia, Russia, Ukraine]', or adjectives derived from proper nouns: [slovenski, angleški, nemški, češki] jezik '[Slovene, English, German, Czech] language', etc. Aside from data overload, the inclusion of proper nouns may also cause difficulties by adding potentially recognizable personal names (personal data), trademarks, etc. On the other hand, their complete exclusion may lead to omitting an important segment of vocabulary which, statistically speaking, conforms to collocation criteria (type, frequency, occurrence).
The complexity of this issue and its possible solutions were reflected in the results of the participants' evaluation. Most participants supported the inclusion of proper nouns in the dictionary (see Table 3). However, all the participant groups identified reasons both for and against the inclusion. This was especially pronounced in the group of lexicographers, where all the participants listed reasons both for and against the inclusion. Table 4 gives an overview of the opinions discussed above within individual groups.

9 However, it should be noted that the Collocations Dictionary does include headwords derived from proper nouns which, in Slovene, begin with lower-case initials (as opposed to many foreign languages in which the opposite is often the case). The dictionary thus contains e.g. adjectives derived from proper nouns, such as slovenski 'Slovene', angleški 'English', nemški 'German', etc.

Table 4: An overview of participant attitudes (PRO, CONTRA, PRO/CONTRA) towards inclusion of proper nouns across individual groups

Group                         PRO   CONTRA   PRO/CONTRA
Teachers of Slovene as L1       9        0            1
Teachers of Slovene as L2       9        1            0
Translators, proof-readers      6        3            1
Lexicographers                  0        0           10

4.1 Attitude of Teachers of Slovene as a First Language

The majority of teachers of Slovene as a first language (Table 4) had a positive attitude towards the inclusion of proper nouns, especially for the following reasons:

• the students find them more illustrative and concrete;
• they pique the interest of students and promote intellectual and cognitive processes;
• their specificity is attractive and intuitive, which is reflected in increased study motivation of the student and, consequently, in a more flexible understanding and adequate language use.
While giving a positive evaluation of the inclusion of proper nouns because of their ability to illustrate and convey a more specific example of language use, one of the teachers expressed doubts regarding the benefits of including trademarks (e.g. Laško pivo, a Slovene beer brand) and questioned their contribution towards understanding word use.

4.2 Attitude of Teachers of Slovene as a Second/Foreign Language

Almost all teachers of Slovene as a second language (Table 4) find the inclusion of proper nouns important because they give useful information on the morphological characteristics of a particular part-of-speech category, such as declension patterns or the use of prepositions with proper nouns (a frequent problem for foreign learners, e.g. potovati na [Hrvaško, Kitajsko] 'to travel to [Croatia, China]', but potovati v [Evropo, Azerbajdžan] 'to travel to [Europe, Azerbaijan]'). One suggestion was to exclude specific types of proper nouns, such as personal names and surnames.

As seen in Table 4, only one of the teachers was opposed to the inclusion of proper nouns. The teacher pointed out several proper nouns incorrectly spelled with a lower-case initial letter (večernji list 'evening newspaper' instead of Večernji list 'Evening Newspaper'; smučati v dolomitih 'to ski in the dolomites' instead of smučati v Dolomitih 'to ski in the Dolomites'), which might cause difficulties for students trying to learn the language. An incorrectly spelled proper noun may mislead a foreign learner who is incapable of recognizing or disambiguating language mistakes; it can provide misleading information on orthography and the role of particular part-of-speech categories and their inflections in phrases and syntactic structures.
The above examples may misinform the learner about the proper form and use of the deadverbial adjective (večernji instead of večerni) or the correct use of the common noun (Dolomiti as the Italian mountain range instead of dolomiti as a mineral).

4.3 Attitude of Proof-Readers and Translators

Six out of 10 participating proof-readers and translators gave reasons in favour of the inclusion of proper nouns (Table 4). Much like the teachers of Slovene as a first language, they recognised the quality of intuitiveness arising from the concreteness of proper nouns: the collocation klop Reala 'the bench of Real [Madrid]' or klop Liverpoola 'the bench of Liverpool' may be more illustrative and meaningful than klop prvoligaša 'first league bench', where the lack of context may make it difficult to determine that this is a football club.

On the other hand, a smaller number of proof-readers and translators (3 out of 10) argued against the inclusion, especially in relation to trademarks (e.g. Illy kava 'Illy coffee', Laško pivo 'Laško beer'), since they find this degree of specificity meaningless and unnecessary. Furthermore, one of the participants had a mixed opinion, since they believe that the decision regarding the inclusion of proper nouns in the dictionary depends primarily on the type of proper noun and the relevance of the information conveyed by the proper noun.

4.4 Attitude of Lexicographers

As already mentioned above, all the participating lexicographers expressed arguments both for and against the inclusion (Table 4), which is to be expected considering the fact that they see the dictionary not only from the perspective of the user, but also as content developers and originators of lexicographic concepts.
The arguments for the inclusion were related to semantically relevant proper nouns; the participants stressed that not all proper nouns are equally semantically relevant (kranjski Janez 'John Doe' – Janez Novak; delati se Francoza 'lit. to pretend to be a Frenchman, meaning to feign ignorance' – Francoz 'Frenchman'). Proper nouns were also considered a valuable source of information on the most typical ways of addressing people, with the caveat that the specific personal name in and of itself is not that relevant (dragi Janez 'dear Janez' – dragi + [personal name]); the key information here is the discourse category.

The arguments against the inclusion were related to longer sequences of collocates of the same type, since this type of information is distracting and does not enhance user experience. This is the case for the selected entries klop and pivo, which contain a long sequence of adjectives derived from proper nouns: [češko, belgijsko, angleško, dansko] pivo '[Czech, Belgian, English, Danish] beer' or of geographical proper nouns (e.g. names of cities): klop [Celja, Maribora, Kopra, Gorice] 'the bench of [Celje, Maribor, Koper, Gorica]'.

4.5 Participants' Suggestions for Dictionary Improvements

The participants suggested two solutions on the topic of inclusion and presentation of proper nouns in the dictionary.

The proof-readers and translators suggested introducing a special button for hiding the proper noun candidates; this would let them choose whether to display proper nouns and thus make querying the dictionary more efficient. Their work involves a wide range of text types, vocabulary and topics that require intensive linguistic research, and time is one of its key components, which is why this group believes that the dictionary should adjust to the needs, wishes, and expectations of its target users as much as possible.
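The hiding option suggested by the proof-readers and translators can be illustrated with a minimal sketch. It is not a description of how the Collocations Dictionary is implemented: the `Collocate` record, the `is_proper` flag and the `visible_collocates` function are hypothetical names introduced only for illustration, and the sketch assumes that each collocate already carries reliable information on whether it is a proper noun. The example data are taken from the klop collocates discussed in Section 4.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collocate:
    form: str        # collocate as displayed in the entry
    is_proper: bool  # hypothetical flag; assumed to be supplied by the data


def visible_collocates(collocates, hide_proper_nouns=False):
    """Return the collocates to display, optionally hiding proper nouns."""
    if not hide_proper_nouns:
        return list(collocates)
    return [c for c in collocates if not c.is_proper]


# Illustrative data only, based on the klop examples in Section 4.3.
klop = [
    Collocate("Reala", True),
    Collocate("Liverpoola", True),
    Collocate("prvoligaša", False),
]

# Default view: proper nouns stay in the list.
print([c.form for c in visible_collocates(klop)])
# With the option switched on, only common-noun collocates remain.
print([c.form for c in visible_collocates(klop, hide_proper_nouns=True)])
```

Because the proper nouns are filtered at display time rather than deleted, user groups that find them illustrative would keep the full list, while users who are pressed for time could switch them off.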
Lexicographers proposed grouping collocates belonging to the same semantic type under a semantic label (e.g. football, hockey, basketball > sport; dog, cat, hamster > (domestic) animal). This would improve the visibility of the collocational behaviour of the word and make it easier to browse through (long) lists of collocates.

5 DISCUSSION AND METHOD ASSESSMENT

The user evaluation of The Collocations Dictionary of Modern Slovene 1.0 identified the participants' attitudes towards its features, which were grouped into three discrete segments in the research interview. The user evaluation was, to a great degree, positive. In the first segment of the interview, the participants evaluated as positive (i.e. relevant for the dictionary and useful) all the features that they independently recognized. In the guided part of the interview (during which they worked with selected entries), the participants expressed reservations about some (but not necessarily all) data errors, especially mistakes arising as the result of homonymy and ambiguous word inflections. Opinions also differed with regard to the (non-)inclusion of proper nouns (as seen in Section 4). The third and final segment of the interview asked the participants to evaluate general dictionary features; here, too, their opinion was unanimously positive.

The analysis of the participants' attitudes towards errors has demonstrated that even in their initial stage (during which they still contain mistakes), responsive dictionaries represent an invaluable tool; this was a common opinion across all participant groups taking part in the study. In order to understand this degree of positive or permissive attitudes towards data errors, we need to keep in mind that before the publication of the Collocations Dictionary of Modern Slovene, collocation data for Slovene had not been readily available.
To a great extent, the participants’ enthusiasm is thus a reflection of the newly opened possibilities offered by the dictionary; it is, therefore, safe to conclude that the participants prefer easy accessibility over fully clean data. The evaluation further demonstrated that: a) it is vital that dictionary users are alerted to the presence of errors with the pyramid icon, which indicates the phase of entry completeness; and b) given the presence of context, the possibility of accessing examples, and links to the Gigafida corpus, it is possible for the users to resolve any ambiguities.

In terms of dictionary shortcomings, special attention should be given to the most “vulnerable” user groups, i.e. teachers of Slovene as a first language and teachers of Slovene as a second/foreign language. Teachers bear the responsibility of choosing the sources used in the classroom with students who, as language learners, are somewhat less qualified to independently identify and resolve data ambiguities in the manner described above. Didactic use demands precise and unambiguous information, so that the teacher does not lose time by having to correct errors. On the other hand, the teachers themselves found the dictionary to be very useful and of great help, especially as a starting point for exercises, a tool for enriching vocabulary, for checking the correctness and adequacy of phrases, for writing fiction and poetry, for discussing collocations, using idioms, newspaper language, etc. They were excited by the authenticity of the language, the interconnectedness of different resources, and especially by the possibility to observe language as a natural phenomenon across all segments of its use.
Importantly, the study made it clear that many of the characteristics that were deemed problematic by linguists are not necessarily problematic for the users; this was seen, for instance, in the discussion of the participants' attitudes towards the inclusion of proper nouns. Contrary to our expectations, the participants found proper nouns to be interesting and illustrative despite referring to a specific referent. Whereas the lexicographers’ main concern was that the inclusion may result in overcrowding the dictionary (e.g. in cases where the headword is followed by a long, enumerating sequence of collocates of the same type), the participants found such concreteness more intuitive.

The evaluation identified areas of the dictionary and its interface which the participants find adequate and those that need to be re-examined, improved and further assessed. In this sense, the study achieved its main goal and the selected method proved to be successful. Even though collecting, recording and categorizing evaluation data is extremely time-consuming, the transcribed opinions offer insight into problems and solutions that significantly contribute to concepts proposed by dictionary developers. The evaluation study has resulted in a number of positive findings, but also revealed possibilities for improving the methodology in further, comparable studies.

One of the positive aspects of the study was its multi-stage design (i.e. interviews – transcription – annotation – analysis): on the one hand, it enabled a careful and thorough planning of the entire process of the study; on the other, it increased the time needed to realize individual tasks.
The study took place between May and September 2019, with the time span depending on several outside factors: the availability and flexibility of the participants, their willingness to co-operate, collaboration with students, and unforeseen technical difficulties. Apart from demonstrating the need to plan for a longer time span, our experience has also shown the following:

• in order to secure participation, it is very important to adopt a personal approach, including personal correspondence, willingness to record sessions in the participants’ place of work, etc.;
• collaboration with students demands careful and consistent monitoring of their work, including providing clear and understandable guidelines and a detailed examination of the transcriptions and annotations;
• a methodological process reliant on the use of recording software and equipment and the use of a digital dictionary should take into account potential technological difficulties and provide for adequate data backup.

6 CONCLUSION

The user evaluation of the Collocations Dictionary of Modern Slovene has proven to be a highly efficient way to detect (non-)problematic dictionary features and represents a solid foundation for further attempts to improve and upgrade the interface to make it more user-friendly and functional. It presents a model for evaluation and identification of user problems; the gathered results reveal areas for potential methodological improvements and are thus useful for similar lexicographic user studies and analyses.

The findings of the study indicate that the methodology of automatic extraction of lexical data has indeed reached a level at which such data can be presented to users immediately, as has often been claimed by authors such as Kilgarriff et al. (2013). Nonetheless, what the study also shows is that the presentation of such data matters, i.e.
features are needed that alert the users to the different stages of data validation and that enable data manipulation/filtering. Part of the reason for this need lies in the quantity of automatically extracted data, which always exceeds the quantity after human clean-up and selection.10

As envisaged when preparing the study, the user feedback obtained will be used in the preparation of the next version of the Collocations Dictionary of Modern Slovene. First and foremost, we need to acknowledge that no radical changes are needed; however, the aspects of data quality and quantity, as well as clarity of presentation, need to be addressed to some extent. For example, we plan to introduce additional options to filter collocates, such as an option to hide proper nouns (as opposed to removing them from the dictionary completely), hiding or downgrading semantically less relevant collocates, and viewing a selection of top collocations (or collocate clusters) regardless of their syntactic structure. In terms of visual improvements, the pyramid icon will be made more conspicuous. In cases where the distribution of collocations over syntactic structures is uneven, structures with more collocations will receive more space in the display. Moreover, an option for downloading entries will be added.

As evidenced by the results of the study, user groups differ in their attitude towards the inclusion of proper names, which makes it difficult to propose universal answers for this issue. Solutions that give the user a choice (such as on/off buttons) seem to be the way forward in such cases. Nonetheless, one feature that seemingly requires a rethink is the option of user participation; to this end, we are already testing other approaches such as gamification, which may help us clean the dictionary data even faster and less obtrusively than the existing voting method in the dictionary.
Gamification, in combination with improvements to the automatic data extraction method, will make the dictionary even more “responsive”.

10 This is also the rationale behind the pyramid icon: wider at the bottom in the initial stages, and narrower at the top when the entry is completed.

Acknowledgments

The authors acknowledge that the project Collocation as a basis for language description: semantic and temporal perspectives (J6-8255) was financially supported by the Slovenian Research Agency, and acknowledge the financial support from the Slovenian Research Agency (research core funding No. P6-0411, Language Resources and Technologies for Slovene). This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 731015. The research was conducted within the framework of the CA160105 eNetCollect COST Action. The authors would also like to thank Bojan Klemenc for his assistance in setting up the local installation of Taguette, all the users of the Collocations Dictionary of Slovene, and the annotators who participated in the transcription/annotation campaign: Jan Gajski, Tjaša Jelovšek, Saša Jenko Pahor, Manja Kraševec, Manja Ocepek, Chiara Vianello and Karolina Zgaga.

REFERENCES

Arhar Holdt, Š., Kosem, I., & Gantar, P. (2016). Dictionary user typology: the Slovenian case. In T. Margalitadze & G. Meladze (Eds.), Lexicography and linguistic diversity. Proceedings of the XVII EURALEX International Congress, 6–10 September, 2016 (pp. 179–187). Tbilisi: Ivane Javakhishvili Tbilisi State University.

Arhar Holdt, Š., Čibej, J., & Zwitter Vitez, A. (2017). Value of language-related questions and comments in digital media for lexicographical user research. International journal of lexicography, 30(3), 285–308.
Arhar Holdt, Š., Čibej, J., Dobrovoljc, K., Gantar, P., Gorjanc, V., Klemenc, B., Kosem, I., Krek, S., Laskowski, C., & Robnik Šikonja, M. (2018). Thesaurus of modern Slovene: by the community for the community. In J. Čibej et al. (Eds.), Lexicography in global contexts. Proceedings of the XVIII EURALEX International Congress, 17–21 July, 2018, Ljubljana (pp. 401–410). Ljubljana: University Press, Faculty of Arts.

Atkins, B. T. S. (Ed.). (1998). Using Dictionaries: Studies of Dictionary Use by Language Learners and Translators. Tübingen: Max Niemeyer Verlag.

Barnhart, C. L. (1962). Problems in Editing Commercial Monolingual Dictionaries. International Journal of American Linguistics, 28(2), 161–181.

Bergenholtz, H., & Johnsen, M. (2013). User Research in the Field of Electronic Dictionaries: Methods, First Results, Proposals. In R. H. Gouws, U. Heid, W. Schweickard & H. E. Wiegand (Eds.), Dictionaries. An International Encyclopedia of Lexicography: Supplementary Volume: Recent Developments with Focus on Electronic and Computational Lexicography (pp. 556–568). Berlin/New York: Walter de Gruyter.

Bogaards, P. (2003). Uses and users of dictionaries. In P. van Sterkenburg (Ed.), A practical Guide to Lexicography (pp. 26–33). Amsterdam/Philadelphia: John Benjamins.

Čibej, J., Gorjanc, V., & Popič, D. (2016). Analysing translators’ language problems (and solutions) through user-generated content. In T. Margalitadze & G. Meladze (Eds.), Lexicography and linguistic diversity. Proceedings of the XVII EURALEX International Congress, 6–10 September, 2016 (pp. 158–167). Tbilisi: Ivane Javakhishvili Tbilisi State University.

Gorjanc, V., Gantar, P., Kosem, I., & Krek, S. (Eds.). (2017). Dictionary of Modern Slovene: problems and solutions. Ljubljana: University of Ljubljana, Faculty of Arts.

Hartman, R. R. K. (1987). Four Perspectives on Dictionary Use: A Critical Review of Research Methods. In A. P.
Cowie (Ed.), The Dictionary and the Language Learner (pp. 11–28). Tübingen: Niemeyer.

Householder, F. W. (1967). Summary Report. In F. W. Householder & S. Saporta (Eds.), Problems in lexicography (pp. 279–282). Bloomington: Indiana University Publications.

Kilgarriff, A., Husak, M., & Jakubíček, M. (2013, October). Automatic collocation dictionaries. Presented at eLex 2013 conference, Tallinn, Estonia. Retrieved from https://youtu.be/b3KyhPBeoLU

Kosem, I., Lew, R., Müller-Spitzer, C., Ribeiro Silveira, M., Wolfer, S. et al. (2018a). The image of the monolingual dictionary across Europe: Results of the European survey of dictionary use and culture. International Journal of Lexicography. doi: 10.1093/ijl/ecy022

Kosem, I., Wolfer, S., Lew, R., & Müller-Spitzer, C. (2018b). Attitudes of Slovenian language users towards general monolingual dictionaries: an international perspective. Slovenščina 2.0: empirical, applied and interdisciplinary research 6(1), 90–134. Ljubljana: University Press, Faculty of Arts. Retrieved from https://revije.ff.uni-lj.si/slovenscina2/article/view/8142/8467

Kosem, I., Krek, S., Gantar, P., Arhar Holdt, Š., Čibej, J., & Laskowski, C. (2018c). Collocations dictionary of modern Slovene. In J. Čibej et al. (Eds.), Proceedings of the XVIII EURALEX International Congress, 17–21 July, 2018, Ljubljana (pp. 989–997). Ljubljana: University Press, Faculty of Arts. Retrieved from https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1

Kosem, I. et al. (2019). Collocations Dictionary of Modern Slovene KSSS 1.0. Slovenian language resource repository CLARIN.SI. Retrieved from http://hdl.handle.net/11356/1250

Lew, R., & De Schryver, G. M. (2014). Dictionary Users in the Digital Revolution. International Journal of Lexicography, 27(4), 341–359.

Lew, R. (2015). Research into the Use of Online Dictionaries.
International Journal of Lexicography, 28(2), 232–253. Logar, N. (2009). Slovenski splošni in terminološki slovarji: za koga? In M. Stabej (Ed.), Infrastruktura slovenščine in slovenistike. Obdobja 28 (pp. 225–231). Ljubljana: Znanstvena založba Filozofske fakultete. Müller-Spitzer, C. (Ed). (2014). Using Online Dictionaries. Proceedings of the XVIII EURALEX international congress. Berlin, Boston: De Gruyter Mouton. Nesi, H. (2000). The Use and Abuse of EFL Dictionaries. Tübingen: Max Nie- meyer Verlag. Rampin, R., Steeves, V., & DeMott, S. (2019). Taguette (Version 0.8). Zenodo. doi: 10.5281/zenodo.3246958 Rozman, T. (2004). Upoštevanje ciljnih uporabnikov pri izdelavi enojezičnega slovarja za tujce. Jezik in slovstvo, 49(3–4), 63–75. Stabej, M. (2009). Slovarji in govorci: kot pes in mačka? Jezik in slovstvo, 54(3–4), 115–138. Tarp, S. (2009). Reflections on Lexicographical User Research. Lexikos, 19(1), 275–296. Thumb, J. (2004). Dictionary Look-up Strategies and the Bilingualised Learner's Dictionary. Lexico-graphica (Series Maior 117). Tübingen: Max Niemeyer. Tomaszczyk, J. (1979). Dictionaries: Users and Uses. Glottodidactica 12, 103–119. Welker, H. A. (2013a). Methods in Research of Dictionary Use. In R. H. Gou- ws, U. Heid, W. Schweickard & H. E. Wiegand (Eds.), Dictionaries. An International Encyclopedia of Lexicography: Supplementary Volume: Recent Developments with Focus on Electronic and Computational Lexicography (pp. 540–547). Berlin, New York: Walter de Gruyter. 196 197 Slovenščina 2.0, 2020 (2) Welker, H. A. (2013b). Empirical Research into Dictionary Use since 1990. In R. H. Gouws, U. Heid, W. Schweickard & H. E. Wiegand (Eds.), Diction- aries. An International Encyclopedia of Lexicography: Supplementary Volume: Recent Developments with Focus on Electronic and Computa- tional Lexicography (pp. 531–540). Berlin, New York: Walter de Gruyter. Wingate, U. (2002). 
The Effectiveness of Different Learners Dictionaries: An Investigation into the Use of Dictionaries for Reading Comprehension by Intermediate Learners of German. Lexicographica (Series Maior 112). Tübingen: Max Niemeyer.

THE ATTITUDE OF DICTIONARY USERS TOWARDS AUTOMATICALLY EXTRACTED COLLOCATION DATA: A USER STUDY

The paper draws on a user study carried out within the basic research project Collocations as a Basis for Language Description: Semantic and Temporal Perspectives (KOLOS; J6-8255). It presents an analysis of the user evaluation of the interface of the Collocations Dictionary of Modern Slovene (Kolokacijski slovar sodobne slovenščine, KSSS). From a somewhat different vantage point, that of the user, it shows where the problematic points of individual dictionary categories lie and what they are; these points require further lexicographic treatment and discussion. The collocations user study presents a model of the user evaluation process. Its findings will be relevant above all for detecting the problems users encounter, but also for improving the methodology, which will be especially useful for comparable lexicographic user studies and analyses.

Keywords: collocations dictionary, responsive dictionary, user evaluation, attitude towards errors, dictionary interface

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International. https://creativecommons.org/licenses/by-sa/4.0/

APPENDIX 1: EVALUATION QUESTIONNAIRE

First segment: Free use of the dictionary

During the first interview segment, the participants are asked to browse the dictionary freely while thinking aloud. This allows them to form a first impression and get a general sense of the dictionary.

Second segment: Guided work with dictionary headwords

In the second part of the interview, the participants are guided by the interviewer to click on a number of headwords that were pre-selected according to a carefully designed set of criteria. The participant is thus familiarized with the various functions offered by the resource. The participant is presented with the following headwords:

belina 'whiteness' – a non-problematic entry that has already been finalized by lexicographers
How do you find this headword? Is it in any way problematic? Do you notice any errors? Can you identify the various functions available (e.g. the entry phase indicator, sense menus, collocate clusters), the possibility of using various filters, the option to contribute to the dictionary by rating collocations?

pivo 'beer' – an entry with potentially problematic collocates
Do you notice that the noun/adjective (collocate or headword) is not in the expected inflected form? Does this motivate you to refer to the corpus examples provided? Are you bothered by this type of error (semantic nonsense)?
A selection of the identified errors (on the levels of collocate/headword, collocation structure or collocation):
o The collocate is incorrectly lemmatized: plata piva 'plate of beer' instead of plato piva 'box of beer [cans], lit. plateau of beer cans'
o The collocate/headword should appear in a specific inflected form (e.g. comparative, plural): drag od piva 'expensive than beer' instead of dražji od piva '[more expensive] than beer'
o The headword appears next to a collocate tagged with the wrong part of speech: pivo pite 'beer of pie' instead of pivo piti 'to drink beer'
o The verb collocate of the noun headword does not appear in the negative form (as required by the genitive case of the headword): piti piva 'to drink beer' instead of ne piti piva 'to not drink beer'
o The collocation makes no sense out of context or without additional elements: pivo k ustom 'beer to the mouth' as in dvigniti kozarec piva k ustom 'to raise [a glass of] beer to the mouth'
o The headword is either a plural noun or appears next to a syntactic structure in the genitive plural; as such, the collocation makes no sense without an additional, quantitative element: pivo po tolarja 'beer for tolar' instead of pivo po 300 tolarjev 'beer for 300 tolars'

klop 'bench' or 'tick' – a homonym that has not been disambiguated in the dictionary
Do you find anything about the entry distracting? Did you identify the word as a homonym (words having the same spelling but different meanings)? Do you find the ambiguity distracting? Are you distracted by proper nouns as collocates? Do you find that there are too many errors?

usesti (se) 'to sit (oneself) down' – an inherently reflexive verb which is missing the obligatory se pronoun in the dictionary
[The participant first enters the word into the search window; the interviewer observes their reaction and then continues with the questions.]
Did you notice the absence of the se pronoun? (Or: Does the lack of reflexivity (usesti se) bother you?) Do you find that there are too many errors?

Third segment: General dictionary features

Automatic compilation
[The questions are meaningfully incorporated into the discussion about specific headwords.]
In its initial stage, this resource is compiled completely automatically. This is why, as you may have noticed, it also includes information that should not be here. Do you feel there is too much noise or that there are too many errors? Do you find this distracting? Why (not)?
This resource enables dictionary entry tracking and provides information on the phase of entry completeness, shown by clicking on the pyramid icon. Did you notice this? How do you find this?
This resource was compiled automatically and as such was made freely and openly accessible as soon as it was compiled. Do you prefer free and open resources with raw data or paid resources with clean data?
This new form of language resource allows for continuous upgrades and updates; the development team can include new collocations and headwords, the users can vote on collocation candidates, etc. Do you prefer static, unchangeable resources, or are there any advantages to a dictionary that can change over time?
Changes also mean that the dictionary is never fully complete and is continuously developing. How do you feel about that?

User inclusion
[Questions are meaningfully incorporated into the discussion about specific headwords.]
Did you notice it was possible to contribute to the dictionary as a user (i.e. up- or downvote collocates/collocations)? Do you find user involvement positive or negative?
Once the user up- or downvotes a collocation, their rating immediately appears on the page. How do you feel about this?
Do you find the resource stimulating enough to contribute to it yourself? Would you provide your votes in the dictionary? Why (not)?
What would motivate you to contribute to the compilation of the dictionary? What would additionally motivate you to do so?
Do you have any reservations about user inclusion? [The participant is given the space to respond first; they are then asked to discuss whether they see user inclusion as shifting the burden of responsibility onto the users by means of crowdsourcing; whether this constitutes taking advantage of the user; whether they are concerned about the potential lack of experience or professionalism in users; whether user judgement may in fact improve the quality of the dictionary, etc.]

Digital-only form
This resource has no printed version. Is that a problem or do you find its digital-only form an advantage?

Interface
Interface problems [The interviewer asks specific questions]
Do you find the dictionary useful? What do you like most about it? What are the main reasons you wouldn’t use this dictionary?