Domen Krvina (ORCID: 0000-0002-2276-1156)
ZRC SAZU, Fran Ramovš Institute of the Slovenian Language, Slovenia

The Growing Dictionary of the Slovenian Language (2014–) and Slovenian Neologisms: Study on Types of Data and Their Use

Slovenski jezik / Slovene Linguistic Studies 14/2022. 117–151. DOI: https://doi.org/10.3986/sjsls.14.1.05
ISSN (print edition): 1408-2616, ISSN (online edition): 1581-127
https://ojs.zrc-sazu.si/sjsls

The article presents the methods of detecting Slovenian neologisms used in compiling the Growing Dictionary of the Slovenian Language, which has been accessible since 2014 at the Fran portal, a site that integrates various dictionaries into a single whole. In the first year of compilation and for a few years afterwards, the main source of candidates was the corpus Gigafida 1.0, built in 2013. Because the corpus was not updated regularly (and no other appropriate sources were available), users’ suggestions have taken over the main role. Users submit suggestions directly on the Fran portal. Gigafida and other corpora (Janes, SlWaC) are still used for checking users’ suggestions. Given the high number of such suggestions and the growing demand for new lexical descriptions, their importance cannot be overlooked.
The neologisms collected in the dictionary exhibit a number of characteristics, a brief overview of which is provided at the end of the study.

Keywords: Neologisms, Slovene, Growing Dictionary of the Slovenian Language, Data Detection, Corpora, Users’ Propositions, Overview of Neologisms’ Characteristics

Prispevek predstavlja metode zaznavanja slovenskih neologizmov, uporabljene pri izdelavi Sprotnega slovarja slovenskega jezika, ki je od leta 2014 dostopen na slovarskem portalu Fran. Ta združuje različne slovarje v eno celoto. V prvem letu nastajanja slovarja in nekaj naslednjih je bil glavni vir kandidatov za neologizme korpus Gigafida (zaključen leta 2013). Ker se ni redno posodabljal, drugi primerni viri pa tudi niso bili na voljo, so glavno vlogo prevzeli predlogi uporabnikov. Ti lahko svoje predloge oddajajo neposredno na portalu Fran. Korpusi Gigafida in drugi (Janes, SlWaC) ohranjajo vlogo gradiva za preverjanje uporabniških predlogov. Zaradi velikega števila tovrstnih predlogov in velikega povpraševanja po novih leksikalnih opisih njihovega pomena ne le da ni mogoče zanemariti – postali so temelj opisa novejšega besedja. Kratek pregled njegovih temeljnih značilnosti je podan na koncu prispevka.

Ključne besede: novejše besedje, slovenščina, Sprotni slovar slovenskega jezika, gradivna zaznava, korpusi, predlogi uporabnikov, pregled značilnosti novejšega besedja

1 Background: Transformation of Slovenian Lexicography, the Portal Fran and the Rise of a New Type of Dictionary in 2014

Neologisms constantly appear in language: they reflect developments in lifestyles, the environment and perceptions of the world (ten Hacken 2020). In Slovene, the new lexis for the period 1991-2009 was comprehensively treated in the monograph Novejša slovenska leksika (v povezavi s spletnimi jezikovnimi viri) (Gložančev et al.
2009), mainly from a lexicological point of view, and lexicographically in the Dictionary of New Slovenian Words (2012). The neologisms presented in that dictionary span the period from 1991 to 2012, as its wordlist was compiled by comparing the Nova beseda corpus with the wordlist of SSKJ: Dictionary of the Slovenian Standard Language (1970–1991), at the time the only monolingual general explanatory dictionary compiled systematically by a team of authors adhering to unified principles.

In the following years Slovenian lexicography, after what could be called a preparatory decade, experienced some major shifts in its course, not unlike those that took place in English lexicography at the time of the COBUILD project (Sinclair et al. 1987), more than a decade before. Firstly, the corpus Gigafida 1.0, the first Slovene reference corpus to be fully equipped with formal POS tagging and at the same time accessible to the general public, built within the project Sporazumevanje v slovenskem jeziku, was compiled in 2013. Secondly, that same year, three authors published a dictionary conceptualization plan proposing to compile a new, mainly corpus-driven explanatory dictionary, planned in different phases: from the first, computer-driven phase, whose results would be only partially revised and would be available immediately, to the final phase with fully revised entries on various levels that are marked as completed (Krek et al. 2013). Thirdly, the first edition of SSKJ was updated and partially revised into SSKJ2: Dictionary of the Slovenian Standard Language, 2nd Edition using the data from the corpus Gigafida 1.0.

These events set the stage for the following developments in late 2014 and early 2015:
1. the emergence of the dictionary portal Fran at the ZRC SAZU, Fran Ramovš Institute of the Slovenian Language;
2.
the creation of the Growing Dictionary of the Slovenian Language and the publication of the first-year batch of entries;
3. the drafting of a dictionary conceptualization plan for a completely new, corpus-based dictionary, eSSKJ: Dictionary of the Slovenian Standard Language, 3rd Edition, which saw the publication of its first entries in 2016.

The main role of the portal Fran in 2014 was to bring together existing dictionaries and integrate them into a user-friendly and user-responsive website. This was done by ensuring their transition into e-form and by linking the data from various sources so that they are searchable through a single search engine (with results from all the different sources displayed at once). The portal supports a user-responsive interface. It enables both general and highly advanced, targeted searches. Even when a dictionary is singled out by the user, the search is always performed against the entire background database – these results are shown separately from the main search in the navigation panel; see figure 1 (Ahačič et al. 2015, Perdih 2018, 2020).

The other important function of the portal was to serve as a platform on which completed batches of entries in a new type of e-dictionary could be published regularly, along with some (minor) changes to those new dictionaries on the level of microstructure, if necessary. These new-type dictionaries would be called rastoči slovarji (‘growing’ dictionaries).

Figure 1: Portal Fran (Growing Dictionary of the Slovenian Language)

In October 2015, the portal adopted a policy of encouraging users to suggest ‘missing’ words and meanings as well as equivalents of loanwords as candidates for lexical description (figures 2 and 3). Especially in the case of Slovenian equivalents of loanwords, Slovenian word-formation strategies (such as sup: stojeska a board for ‘standing paddling’, plovček ‘sailing’, ‘rowing’) would play a pivotal role.
First seen as a part of the user inclusion policy, this type of encouragement quickly turned out to be an extremely important source of propositions of neologisms, stemming directly from users’ observations and answering their demand. These could be called ‘neologisms from the users’ point of view’.

Figure 2: Portal Fran: suggesting new (‘missing’) words

Figure 3: Portal Fran: suggesting (and voting for) equivalents of loanwords

2 The Growing Dictionary of the Slovenian Language, Collecting Potential Neologisms and State of the Art of Their Sources

The Growing Dictionary of the Slovenian Language, which is the central point of our study, was the first of a new type of dictionaries in the portal Fran – hence its name. Designed from the beginning as a web dictionary, the Growing Dictionary of the Slovenian Language was one of the first to make good use of the adaptable environment of the portal Fran. As it was created literally at users’ request (and catering to their needs), the editors decided that all the data should be presented as transparently and in as user-friendly a manner as possible: no abbreviations (commonly used in linguistics and easily recognizable for linguists, but not necessarily for most other dictionary users) were to be used, hints to the dictionary content and structure were to be given in small grey frames on the right, and the full list of all the word forms would be accessible by a simple click (figure 4).

Figure 4: Growing Dictionary of the Slovenian Language: interface layout

The experience accumulated in the first two years of compiling the Growing Dictionary of the Slovenian Language was positive.
This made it easier to decide that the subsequent ‘growing’ dictionaries (ePravopis: Slovenian Normative Guide (2014–), eSSKJ: Dictionary of the Slovenian Standard Language, 3rd Edition (2016–) and NESSJ – New Etymological Dictionary of Slovenian Language (2017–)) should follow the same direction. It should be noted, however, that being the first of the ‘growing’ kind, the Growing Dictionary of the Slovenian Language started more or less as a ‘dictionary on the fly’: apart from some basic principles of compilation (see below), most of its compiling criteria, especially in the first years of compilation, would be dynamic rather than static. That was also linked to the circumstances regarding the availability of appropriate (corpus) resources and their (scarce or missing) updates. Therefore, the dictionary compilation itself (as well as the assessment of it that this paper aims at) could be seen as a certain experiment, particularly in the years prior to 2018-2019. The dynamic nature also applies to the dictionary’s definition of ‘neologism’, which has been inclusive rather than exclusive, but more or less based on the three complementary approaches (see chapter 2.2) in dynamically changing proportions.

2.1 The Growing Dictionary of the Slovenian Language: Main Features and Source Limitations

The intention of the Growing Dictionary of the Slovenian Language was to continue the course of detecting and describing neologisms that the Dictionary of New Slovenian Words had started. The latter had defined a neologism in a somewhat straightforward way: the words (if the word already existed, also meanings – but this was rarer) not present in the SSKJ: Dictionary of the Slovenian Standard Language (1970–1991), but appearing in Nova beseda, one of the first Slovene corpora, would qualify as candidates for dictionary description.
Their frequency was of lesser importance, though given the scope of the corpus Nova beseda, it would be rather low in most cases. 6,000 such neologisms were described as dictionary entries (some contained several multi-word units), published in 2012.

For the present Slovenian state of the art, it is important to note that there are no corpora of new Slovenian texts that are regularly updated. In late 2021, the project SLED (Spremljevalni korpus in spremljajoči podatkovni viri, ijs.si), aimed at tracking neologisms, was announced – including a specialised corpus. However, its first version will not be available until late 2022. There are other specialised corpora of social media texts (Twitter, forums, blogs), such as Janes, built within the same-name project in 2014-2018 (cf. Fišer et al. 2018), and corpora of web texts, such as SlWaC, built in 2011 and updated in 2014 (v. 2.1) using the web crawler SpiderLing (Erjavec, Ljubešić 2014). The main reference corpus Gigafida, published in 2013, saw a modest update in 2019 with texts up to 2018 (Gigafida 2.0). The past and especially the present state of the art, therefore, presented and still presents a substantial (but not insurmountable) obstacle to obtaining a completely corpus-driven candidate list of neologisms – which would contribute to its objectiveness.

As the goal of the Growing Dictionary of the Slovenian Language was to detect and analyse potential neologisms, it would make use of any appropriate resources at hand. At first, the Gigafida 1.0 corpus seemed sufficient (see below), but with no basic research of the newest lexis after 2013, its role could not be properly evaluated – at least not in a way that would let it remain the sole (major) source. With the number of users’ propositions growing, the focus shifted to them, while Gigafida (and other corpora, as they became available) retained the role of sources used for checking such propositions.
Due to the scarce (or non-existent) corpora updates, the – ever changing and expanding – web content came to the fore. With no widespread and readily available crawling tools for Slovenian (the one used in SlWaC was the same as used for Czech), the dictionary would also not try to develop its own, partly because it would be time-consuming for a rather small dictionary outside the frames of a general analysis of new lexis after 2013. Therefore, the option yet to be explored is a (semi-)automatic way of detecting neologisms by comparing the content of all the available corpora against the expanding web content for the words not present in the corpora.

When the decision was made in 2014 to start compiling the Growing Dictionary of the Slovenian Language, the first version (1.0) of the corpus Gigafida (2013) was the largest at hand and still relatively new. Therefore, it seemed feasible to retain the definition of neologism from the Dictionary of New Slovenian Words: words (or, more rarely, meanings) not present in the latter nor in the recently updated and partially revised SSKJ2: Dictionary of the Slovenian Standard Language, 2nd Edition but appearing in Gigafida 1.0 would qualify as candidates for dictionary description. Taking into account the scope of the billion-word corpus Gigafida 1.0, additional limitations regarding the frequency and time of appearance were introduced: the frequency of the corpus lemma should be below 1,000 (and above 500), and the peak of occurrences should fall in the years 2009-2012 – the last three years covered in the corpus Gigafida 1.0. Thus, an additional frequency-time dimension (Slána 2017: 41) that corpus analysis allows for was provided – these could be called ‘neologisms from the temporal point of view’.
This procedure yielded some 500 candidates, out of which 224 (the majority of them with corpus frequency 700–500)1 were chosen and then further processed all the way to the final dictionary entries. Among various thematic fields some stood out in particular – and would mostly continue to do so in the following years (cf. also Slána 2017: 42–43):2
a. computing and technology: android, driftati ‘drive a car drifting’, inoks ‘stainless steel’, karbon ‘carbon used in bike frames’, kevlar ‘Kevlar’, multifunkcijski ‘multifunctional’, replikacija ‘replication’, večigralski ‘multi-player’, vtičnik ‘plugin’;
b. finances and economics: depozitarni ‘depository’, fiskalno ‘fiscally’, konsolidacija ‘consolidation’, prociklični ‘procyclic’, refinancirati ‘refinance’, volatilnost ‘volatility’;
c. medicine: artroskopija ‘arthroscopy’, epiduralni ‘epidural’, fibromialgija ‘fibromyalgia’, kandidiaza ‘candidiasis’, mikrocirkulacija ‘microcirculation’, obstruktiven ‘obstructive’, paradontalni ‘parodontal’;3
d. (healthy) food, leisure and lifestyle: falafel, gamber ‘prawn’, goji, makadamija ‘macadamia’, tahini; glamping, selfi ‘selfie’, selfness, skike, sup ‘SUP’, trimaran.

1 For further inclusion criteria see chapter 2.3.
2 Be aware that the words listed above would qualify as neologisms in 2014, which may not be the case anymore. They will probably sooner or later also be described in general explanatory dictionaries, such as eSSKJ.
3 In the fields of economics and especially medicine there is often a great deal of English-Slovene parallels both in form and meaning. For Russian-English comparison, see (Peredrienko and Istomina 2019).
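The initial selection described above (absence from the existing dictionaries, a Gigafida 1.0 lemma frequency between 500 and 1,000, and a peak of occurrences in 2009-2012) can be illustrated with a minimal Python sketch. The record type, the function name and the toy counts are hypothetical and serve only as illustration; the article does not describe the actual tooling or how the peak was measured, so the sketch simply assumes that the single year with the most occurrences must fall within the window.

```python
from dataclasses import dataclass

# Thresholds reported in the article for the 2014 selection.
MIN_FREQ = 500
MAX_FREQ = 1000
PEAK_YEARS = range(2009, 2013)  # 2009-2012 inclusive


@dataclass
class Lemma:
    """Hypothetical record for one corpus lemma (illustration only)."""
    form: str
    per_year: dict[int, int]  # year -> number of occurrences

    @property
    def frequency(self) -> int:
        return sum(self.per_year.values())

    @property
    def peak_year(self) -> int:
        # Assumption: the peak is the single year with the most occurrences.
        return max(self.per_year, key=self.per_year.get)


def is_candidate(lemma: Lemma, dictionary_headwords: set[str]) -> bool:
    """Apply the three criteria: not in dictionaries, frequency band, peak."""
    return (
        lemma.form not in dictionary_headwords
        and MIN_FREQ < lemma.frequency < MAX_FREQ
        and lemma.peak_year in PEAK_YEARS
    )


# Toy data with invented counts, not real Gigafida figures.
headwords = {"miza"}
lemmas = [
    Lemma("vtičnik", {2007: 50, 2010: 300, 2011: 400}),    # 750, peak 2011
    Lemma("miza", {2009: 400, 2010: 450}),                 # already described
    Lemma("starejsa_beseda", {1999: 500, 2010: 100}),      # peak too early
    Lemma("redka_beseda", {2011: 20, 2012: 60}),           # too infrequent
]
candidates = [lemma.form for lemma in lemmas if is_candidate(lemma, headwords)]
print(candidates)
```

The strict inequalities follow the wording ‘below 1,000 (and above 500)’; whether the boundary values themselves were admitted is not stated in the article.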
Some users, accustomed to the Dictionary of the Slovenian Standard Language, which was both descriptive and normative, would still expect a dictionary to mark certain words for their ‘foreign origin’ – in the Dictionary of New Slovenian Words this was done, in cases where the word retained the original written form from the donor language, by applying the label cit. (lit. ‘cited form’). Since the Growing Dictionary of the Slovenian Language was intended not to shy away from collecting many such words, it would not continue that tradition. The labels were to be used sparingly and ‘loanword’ would not automatically translate to ‘colloquial’, as was often the case in earlier dictionaries, particularly for loanwords from German (the process of labelling was not straightforward; the fact of being borrowed, especially from German, would quite commonly point to a non-formal language layer, however).4 Descriptiveness was the main goal and after two years users would embrace that fact – at least judging by their propositions, submitted (mainly) at the portal Fran.
4 For further information on neologisms and purism in other European languages see (ten Hacken and Koliopoulou 2020), (Klosa-Kückelhaus and Wolfer 2020), (Marello 2020), (Panocová 2020).

In 2015, the total number of final dictionary entries was much lower (224 > 94),5 although with some prominent additions, such as the loanwords bitcoin, karite ‘shea tree, butter’, overland, vloger, vlogerka ‘woman vlogger’ etc.6 This was mostly due to the fact that the initial supply of corpus candidates had been partially exhausted (note that until the modest above-mentioned update in 2019, the corpus remained virtually unchanged). Some uncertainties arose about how the potential neologisms with fewer than 500 occurrences should be treated: is this frequency still
relevant in a corpus exceeding one billion tokens or not (provided the peak of occurrences falls in the final years still covered in the corpus)? As it would turn out later when checking users’ propositions, this frequency not only suffices – it is rather high: as time passes, many potential neologisms may not be present in (non-updated) corpora at all. In 2015, the inflow of users’ propositions was only gaining momentum; it would increase considerably in the following years and establish itself as one of the most important methods of detecting neologisms.
5 Partially also due to the decision taken at the Fran Ramovš Institute of the Slovenian Language to expand the smaller-scope ‘growing’ dictionaries by approximately 100 entries/units per year.
6 The formation of feminine forms usually follows their neutral (grammatically ‘male’) counterparts rather quickly. For English-Slovene comparison and general information on gender of English loanwords in Slovene see (Stopar and Ilc 2019), (Sicherl 2019).

2.2 The Growing Dictionary of the Slovenian Language: Complementary Approaches to Collecting Neologisms

As pointed out above, three main approaches have been developed in the Growing Dictionary of the Slovenian Language to collect potential neologisms; they are used complementarily, depending on and in reaction to the available sources:
a. Straightforward data comparison approach: the words (or meanings) not present in the latest editions of explanatory dictionaries (if available, especially those of new words) but present in the latest version of corpora are very likely neologisms. This approach was used in the Dictionary of New Slovenian Words and retained (especially for the first two to three years) in the Growing Dictionary of the Slovenian Language.
b.
Temporal corpus analysis approach: the words with the peak of occurrences in the last years (data noise excluded) in each subsequent version of the corpus are potential neologisms for the time period covered in the corpus.
c. Neologisms from the users’ point of view: words felt as ‘new’ by users themselves – according to their daily language use and observations.7 This is perhaps the most subjective of the three, but the subjectiveness is somewhat mitigated by the sheer number of such propositions coming from various users interested in various thematic fields.
7 Direct interaction with users via collecting and answering their questions concerning mainly everyday (and often not completely expected/systemic) language use is also the mainstay of the Language Counselling service of the Fran Ramovš Institute of the Slovenian Language.

The Growing Dictionary of the Slovenian Language first combined the approaches described above in points a) and b). It found itself at a certain crossroads in the year 2015 – after the publication of the first yearly batch of entries. The upper half of the words not present in available dictionaries but present in the corpus Gigafida 1.0 with frequency 1,000-500 and the peak of occurrences in the years 2009-2012 had been exhausted. Given the fact that the corpus Gigafida 1.0 had not received any update since 2013, 2015 was absolutely the last year in which 2009-2012 as a peak of occurrences seemed convincing enough for the temporal criterion (point b above) to be still applicable. The typical frequency of the remaining candidates (extremes at both ends are not taken into account) plummeted from over 500 to 300. Fewer than 100 such words were processed all the way to the final dictionary entries – and it would be the last time corpus-only candidates made up the vast majority of the final entries; see the line ‘GF 1.0 (n/~ 500 initial)’ in figure 5.
It became clear that new ways of collecting potential neologisms were to be actively sought out. As mentioned, encouraging users to suggest ‘missing’ words and meanings (and equivalents of loanwords) as candidates for lexical description was first seen as a part of the user inclusion policy – at the time no one could predict what an important source of potential neologisms it would become. It should be noted that, faced with the entire portal Fran content – from present-day to historical as well as terminological dictionaries in a unified electronic form – users had a powerful tool to compare entries which could serve as a kind of checkpoint: anything felt as ‘new’, but already described in one of the dictionaries or other manuals at the portal Fran, would not qualify as such. Anything identified as new and not present anywhere at the portal Fran, however – and, as it could eventually turn out, not present even in the latest (2.0), let alone the first (1.0) version of the corpus Gigafida – would be a strong candidate for a potential ‘new word’ (neologism).

In 2015, however, users’ propositions were few (7 were submitted)8 and available only late in the year, and a number of other sources were selected in search of potential neologisms:
1. regular mail, telephone – usually alongside a linguistic question, answered by one of the researchers at the Institute;
2. the formalised way of answering such questions: the Institute’s Language Counselling site, which is also integrated into the portal Fran;
3. systematic reading of new, mainly web texts of different genres, which is done by students at the Faculty of Arts in Ljubljana within their seminar work;
4.
targeted reading of the latest (news) web texts, paying special attention mainly to the fields which stood out in the first-year batch of entries (computing and technology, finances and economics, medicine, food, leisure and lifestyle); this is often done alongside the work on material for other growing dictionaries (eSSKJ, ePravopis);
5. external factors, such as projects which certain researchers from the Institute have taken part or interest in – e.g. Janes, alongside its proceedings.
8 Among them was sebek, the Slovene equivalent of selfie (2015), which would eventually make it to the final entries in 2018.

These searches yielded some 20 candidates. As this was only a testing phase, they would not be processed further. A comparison with a larger number of users’ propositions was needed to better evaluate their position. These propositions came before long: 2016 saw an enormous increase in users’ propositions submitted at the portal Fran (7 > 180).

Figure 5 shows how the proportions of neologism candidates from the approaches a)–c) have changed over time: from the domination of the straightforward data comparison along with temporal corpus analysis in 2014-2015 (the line ‘GF 1.0 (n/~ 500 initial)’, the line ‘published (entries)’ and the line ‘sum of the candidates’ all follow the same curve) to the steep increase of the role of neologisms from the users’ point of view (with temporal corpus analysis, when applicable, remaining an important part of entry processing) from 2016 onwards. While the line ‘sum of the candidates’ represents the sum of candidates from the initial approach a) + b) plus all the candidates from the approach c) and other sources, the line ‘sum of the propositions’ unites only the latter: users’ propositions + other sources, listed in points 1-5 above. The content united under this line is shown in detail in figure 6.
Figure 5: Data acquisition vs final entries

Figure 6: Proposition types

After 2015, a substantial number of propositions came from the sources listed in points 1-5 above, especially in the years 2016-2019. An even larger number of users’ propositions enabled their proper comparison with propositions from other sources (points 1-5 above), which could not have been done in 2015. One of the other sources was the Language Counselling site, shown separately in figure 6; most propositions were obtained either directly or indirectly from the questions related to lexicology, lexicography or word formation. The majority of propositions, however, stemmed from the process of compiling the dictionary eSSKJ and partially the normative guide ePravopis.

The year 2018 was somewhat exceptional – the number of propositions from other sources, which had always been lower than the number submitted at Fran by users, converged with the latter. This was mainly due to some researchers taking part or interest in the project Janes, which was concluded in 2018, and its final proceedings. The project Janes, especially the corpus of social media (Twitter, forums, blogs) posts, contributed substantially to the content of the Growing Dictionary of the Slovenian Language – and not only in 2018. Due to its specialized nature, this corpus cannot substitute for the Gigafida corpus as an important tool in processing candidates (see the following chapter). However, together with web texts, the corpus proved very useful – particularly when the proposed candidates are nearly (frequency ≤ 8) or fully absent from the corpus Gigafida.
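The distinction between full absence, near absence (frequency ≤ 8) and presence in a corpus can be expressed as a small classification. The following Python sketch is illustrative only: the function names and the toy frequencies are invented, and only the threshold of 8 is taken from the text.

```python
from collections import Counter

NEAR_ABSENCE_MAX = 8  # frequency <= 8 counts as near absence in the article


def presence(freq: int) -> str:
    """Classify a corpus frequency as absent, nearly absent or present."""
    if freq == 0:
        return "absent"
    if freq <= NEAR_ABSENCE_MAX:
        return "nearly absent"
    return "present"


def nearly_or_fully_absent_share(freqs: list[int]) -> float:
    """Share of entries that are fully or nearly absent from the corpus."""
    if not freqs:
        raise ValueError("at least one entry is required")
    counts = Counter(presence(f) for f in freqs)
    return (counts["absent"] + counts["nearly absent"]) / len(freqs)


# Invented corpus frequencies for eight hypothetical entries of one year.
toy_frequencies = [0, 0, 3, 8, 9, 40, 120, 0]
print(Counter(presence(f) for f in toy_frequencies))
print(nearly_or_fully_absent_share(toy_frequencies))  # 5 of 8 entries
```

Computed per year, such a share corresponds to the kind of proportion the article reports for entries that are (nearly) absent from Gigafida.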
The combined use of three approaches (which also applies, to a certain degree, to point 3 above, done by students, and especially to point 4, with linguists taking a role similar to that of general language users but with a clear goal in mind) certainly allows for a greater degree of flexibility. The listed approaches are complementary – they help alleviate limitations that would arise when sticking disproportionately to only one of them (say, only corpus data without taking into account users’ observations, or taking the latter for granted without checking them thoroughly in corpora and other available sources). Thus, it makes sense that all of them should be used not only in collecting potential neologism candidates but also when processing them in the preparatory phase and then, if they pass the initial test, all the way to the final dictionary entries.

2.3 The Growing Dictionary of the Slovenian Language: Dictionary Inclusion Criteria

What criteria must or should a neologism candidate fulfill to be included in the Growing Dictionary of the Slovenian Language? Reliance on corpus data alone was good enough only in 2014, when the corpus Gigafida 1.0 was still relatively new – which allowed the frequency below 1,000 and above 500, alongside the requirement for the peak of occurrences in 2009-2012, to function well enough. After 2015 – a transient and, in regard to inclusion criteria, somewhat ‘unsure’ year (which resulted in the lowest number of entries ever published) – 2016 saw a rise in the number of users’ propositions beyond expectations. The number of propositions from other sources (see points 1-5 in chapter 2.2) was substantial as well. This required careful consideration of which neologisms should be included immediately and which should be put aside for possible inclusion later on. One could argue that a large number of users’ propositions alone is enough to lessen their subjectivity.
Be that as it may, a decision was made that they should, without any exceptions, undergo a process of verification in all the available corpora (not only Gigafida 1.0) and, if the search yielded no results, also beyond corpora in web texts. From 2016-2017, web material and/or the corpus of web texts SlWaC as well as the corpus of academic texts KAS (in cases of determinologization), and from 2018 onwards also the corpus Janes, started being used much more frequently than before. Attestation of a neologism candidate with frequency ≥ 10 in any of the listed corpora was preferred. However, should a candidate not be present in any of them, web texts still represented a sufficient last resort – although processing the data can hardly be as orderly as it is when using corpora.

The non-included propositions were usually those not present in any of the corpora Gigafida, Janes or SlWaC and at the same time barely present (or even absent) in the web texts. Meanwhile, absence from the corpora alone – especially from Gigafida (1.0) and from 2017 onwards – did not prevent inclusion. Non-included propositions are stored in the database, and they undergo a yearly check – when their presence becomes noticeable in various sources (at least in web texts), their inclusion can be reconsidered.

When a certain candidate is included, word formation also comes into play in the search for potential neologisms pertaining to parts of speech different from that of the proposed candidate – this is particularly true of Slovene, as well as other Slavic languages, which are known for their rich word formation. All word-formation candidates are subjected to the checking procedure described above; they are counted among ‘other’ propositions.
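The verification procedure described above can be sketched as a simple cascade: corpora first, web texts as a last resort, and deferral with a yearly re-check otherwise. The Python sketch below is a minimal illustration under stated assumptions: the function and type names are hypothetical, the frequencies and hit counts are invented, and `WEB_MIN_HITS` is a placeholder, since the article does not quantify what ‘barely present’ in web texts means. Only the preferred corpus frequency of 10 comes from the text.

```python
from enum import Enum

CORPUS_MIN_FREQ = 10  # preferred corpus frequency stated in the article
WEB_MIN_HITS = 5      # hypothetical cut-off for 'barely present' on the web


class Decision(Enum):
    INCLUDE_CORPUS = "include (attested in a corpus)"
    INCLUDE_WEB = "include (attested in web texts only)"
    DEFER = "store and re-check next year"


def verify(corpus_freqs: dict[str, int], web_hits: int) -> Decision:
    """Check a proposition against corpora first, then against web texts."""
    # Preferred case: frequency >= 10 in at least one available corpus.
    if any(freq >= CORPUS_MIN_FREQ for freq in corpus_freqs.values()):
        return Decision.INCLUDE_CORPUS
    # Last resort: sufficient presence in web texts (assumed threshold).
    if web_hits >= WEB_MIN_HITS:
        return Decision.INCLUDE_WEB
    # Otherwise the proposition is kept for the yearly re-check.
    return Decision.DEFER


# Invented figures for three hypothetical propositions.
print(verify({"Gigafida": 0, "Janes": 14, "SlWaC": 2}, web_hits=0))
print(verify({"Gigafida": 2, "Janes": 0, "SlWaC": 0}, web_hits=30))
print(verify({"Gigafida": 0, "Janes": 0, "SlWaC": 0}, web_hits=1))
```

Returning a deferral rather than a rejection mirrors the statement that non-included propositions are stored and reconsidered once their presence becomes noticeable.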
Figure 7: Neologism candidates vs published entries

As figure 7 shows, of the total of all the candidates from all the sources (represented by the line Sum_candidates), on average roughly a half made it to the final entries each year – with the exception of the first year, when only corpus candidates were available. From 2019, the line Prop_sum equals all candidates, as the candidates from the initial Fran-Gigafida 1.0 alignment lost most of their initial relevance and stopped being used as a source – the typical Gigafida 1.0 frequency of lemmas that became published dictionary entries fell from the initial 500 in 2014 to 40 in 2017. That, alongside a more streamlined process of checking users’ and ‘other’ propositions (Prop_sum), caused the share of candidates that became published entries to begin a moderate but steady rise towards 70%. For the process of checking the propositions to retain its relevance, a new specialized corpus, such as the announced SLED, is highly desirable.

The following chapter, which will serve as a kind of discussion entry point, will reveal the inefficiency of non-updated or scarcely updated reference corpora – as is the case in Slovenian – as the main source for neologism candidates not long after their compilation. Due to the sources being limited to the Slovenian corpora Gigafida 1.0 and 2.0, the results obtained apply only to them, and thus cannot be generalized without taking into account the specific Slovenian situation described in the opening chapters. Further research and comparison of data from various languages may very well lead to different conclusions.
3 The feasibility of using (only reference) corpora for acquiring and/or processing the data
With regard to the corpus Gigafida 1.0, figures 8 and 9 show the absence or near absence of each year’s (2014–2021) entries of the Growing Dictionary of the Slovenian Language; from 2019 also in the updated version of the corpus (2.0). Near absence means a frequency ≤ 8, which makes appropriate processing of an entry relying solely on the limited corpus material very inconvenient and impractical, if not outright impossible. It has to be kept in mind, though, that because the corpus Gigafida was not updated until 2019, most of the dictionary entries from 2016 onwards stemmed from users’ propositions. No large-scale analysis of the corpus Gigafida itself in terms of potential neologisms has been done. As shown in figure 9, the update was rather modest and did not make a notable difference, at least as far as the neologisms described in the Growing Dictionary of the Slovenian Language are concerned. There was some shift⁹ from complete absence in 1.0 (2019: 21 (1.0) > 11 (2.0); 2020: 50 > 44; 2021: 71 > 42) to near absence in 2.0 (2019: 46 (1.0) < 54 (2.0); 2020: 38 < 39; 2021: 36 < 50), but the change was not substantial.
Due to the initial Fran-Gigafida 1.0 alignment input, absence or near absence from the corpus was nonexistent or negligible at first, but it started gaining momentum with users’ propositions and steady work on the material for both eSSKJ and ePravopis from 2016 onwards (‘other’ propositions). Starting with 2018, the (nearly) absent entries represented at least a half of each year’s entries, reaching 56–66% in 2020–2021 (both values show the impact of the new corona lexis). If such trends continue, one could even argue that (near) absence from the corpus Gigafida 1.0 should become one of the criteria for inclusion of neologism candidates into dictionary entries, discussed in chapter 2.3.
Not something the Growing Dictionary of the Slovenian Language would seriously consider, of course.
Figure 8: Presence of entries in corpus Gigafida 1.0
Figure 9: Presence of entries in corpus Gigafida 1.0 vs 2.0
⁹ With the corpus update there was also a shift in the years of peak occurrences (2009–2012 → 2015–2018) to look for when processing neologism candidates.
The (nearly) absent entries are mainly, but not exclusively, loanwords (sometimes also in the form of calques) and new derivatives, cf. (ten Hacken 2020), for example:
(2016) helikopterski starši ‘helicopter parents’,¹⁰ plačilomat ‘self-service payment machine’, vejpanje ‘vaping’, vejper ‘vaper’; antropocen ‘Anthropocene’, brezpilotnik ‘pilotless plane’, dismorfofobija ‘BDD’, camu camu, mangostin, zagonsko podjetje ‘start-up’;
(2017) emodži ‘emoji’, hipsterka ‘hipster woman’, ključnik ‘hashtag’, kriptovaluta ‘cryptocurrency’, pajkanje ‘web crawling’, zipline; antifeministka ‘antifeminist woman’, beachvolley, čustvenček ‘emoji’, hashtag, memorizirati ‘memorise’, netiketa ‘netiquette’, skrolati ‘to scroll’, smejko ‘emoji :-)’, tvitniti ‘to tweet’, vitaminoza ‘vitaminosis’;
(2018) bestička ‘female best friend’, coworking, fixie, geolov ‘geocaching’, hejterka ‘hater woman’, influencer, influencerka, kretalec ‘user of sign language’, retvit, retvitati/-niti ‘to retweet/(pf)’, sebek ‘selfie’, selfiestick, supati ‘to sail on SUP’, tekstanje ‘texting’, vlogati ‘to publish on vlog’; backpackerka ‘backpacker woman’, bestič ‘best friend’, chefinja ‘female chef’, klikanost ‘number of web clicks within an interval’, mikroplastika ‘microplastic’, snorkljati ‘to snorkel’, sovrtičkar ‘kindergarten peer’, streamati ‘to stream’, trolati ‘to troll’, vstavljanka ‘toy with insertable parts’, youtubati ‘to publish on YouTube’;
(2019) časosled ‘timeline’, dojenčkati ‘to care for one’s own baby’, fejmič ‘famous person’, hejtanje ‘hating’, jajcemat ‘self-service egg machine’, mikrozelenjava ‘microgreen’, odslediti ‘to unfollow’, prokrastinirati ‘to procrastinate’, risoroman ‘graphic novel’, spletinar ‘webinar’, vejpati ‘to vape’; antidementiv ‘anti-dementia’, gentrificirati ‘to gentrify’, hendlanje ‘handling’, hrčkar ‘hoarder’, hrčkati ‘to hoard’, izsočiti ‘to extract juice’, jogistka ‘yogi woman’, kontrolfrik ‘control freak’, koruptibilen ‘corruptible’, nadkul ‘very cool’, napsihirati ‘to depress (pf)’, polajkati ‘to like on web (pf)’, polinkati ‘to link (pf)’, predtestirati ‘pretest’, rimoklepač ‘rapper’, shendlati ‘to manage’, takitos ‘taquito’, webinar;
(2020) alfakoronavirus, antikoronski ‘anti-corona’, brain freeze, halving, korona(čas, -humor, -kriza, -paket, -panika, …) ‘corona-(time, humour, crisis, package, panic)’, plavajoča licenca ‘floating licence’, megapaket ‘mega-package’, odločbodajalec ‘decree-issuer’, ničti pacient ‘patient zero’, po(st)koronski ‘post-corona’, prekuževanje ‘infecting in order to build up immunity’, trikini; asimptomatično ‘asymptomatically’, bankster, brezsimptomen ‘asymptomatic’, brezstično ‘contactlessly’, hekaton ‘hackathon’, čredna imunost ‘herd immunity’, kohortna izolacija ‘cohort isolation’, megazakon ‘mega law-package’, novookužen ‘newly infected’, samoizolirati se ‘to impose self-quarantine’;
(2021) anticepilec, antivakser, anticepilski, antivakserski (adj.) ‘anti-vaxxer’, antivakserka/proticepilka ‘woman anti-vaxxer’, astroturfing, butaj ‘butai’, debelostnik ‘overweight person’, gerontocid, glinarjenje ‘working with clay’, hribarjenje ‘mountain hiking’, hudi ‘hoodie’, infodemija, instagramerka, kriptorudar ‘cryptocurrency miner’, kriptorudarjenje ‘cryptocurrency mining’, kriptorudarski (adj.), lockdown, nevrorazličnost ‘neurodivergence’, odrast ‘degrowth’, pokovidni/postkovidni (adj.), poobjavljati ‘to retweet’, prebolelost ‘recovery from illness’, predkoronski/predkovidni (adj.), procepilec ‘pro-vaxxer’, procepilski (adj.), protiukrepni (adj.) ‘being/working against the measures’, razogljičiti ‘decarbonise’, senicid, tiktoker, tiktokerka etc.
¹⁰ Multi-word units are also included: one or more of them are listed under one of the components representing an entry. For further information on the typology and treatment of multi-word lexical units in general monolingual explanatory Slavic dictionaries see (Perdih and Ledinek 2019).
The (basic) meaning of many of these lexemes can be guessed even by a non-Slovenian speaker. They are certainly not ‘exotic’, yet none of them would have made it into the entries relying only on approaches (a) and (b); it was users’ propositions, approach (c), that proved crucial for their inclusion.
The typical frequency of the entries present both in the dictionary and (as lemmas) in the corpus Gigafida 1.0, shown in figure 10, further explains the diminishing role of the non-updated corpus as a reliable main source of neologism candidates without the aid of users’ propositions. In 2014 and for some years afterwards the corpus played a very important role: when the substantial number of candidates at a certain frequency was exhausted, the next effective number usually turned out to be at approximately half the previous frequency (500 > 300 > 160). Those frequencies allowed for quite a comfortable analysis of data, typically using Sketch Engine, which would yield reliable collocations (common in the first two to three years, rare afterwards), distinct meanings etc.
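The presence categories behind figures 8 and 9 (absent, nearly absent with a frequency ≤ 8, present) can be sketched as a small classification. This is only an illustration under stated assumptions: the function names and the sample frequencies are hypothetical, and only the threshold of 8 comes from the text.

```python
from collections import Counter

# "Near absence" threshold used in the text (frequency <= 8).
NEAR_ABSENCE_MAX = 8


def presence(freq: int) -> str:
    """Classify one entry by its lemma frequency in a corpus."""
    if freq == 0:
        return "absent"
    if freq <= NEAR_ABSENCE_MAX:
        return "nearly absent"
    return "present"


def nearly_absent_share(freqs: list) -> float:
    """Share of a year's entries that are absent or nearly absent."""
    if not freqs:
        raise ValueError("no entries for this year")
    counts = Counter(presence(f) for f in freqs)
    return (counts["absent"] + counts["nearly absent"]) / len(freqs)


# Invented frequencies for five entries of one (hypothetical) year.
sample_year = [0, 3, 8, 9, 120]
share = nearly_absent_share(sample_year)
```

For the invented sample, three of the five entries fall at or below the threshold, so the share is 0.6; the yearly shares reported in the text (for example 56–66% in 2020–2021) would be obtained the same way from the real entry frequencies.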
Figure 10: Typical frequency of entries also present as lemmas in corpus Gigafida 1.0
The change after the pivotal year 2016 was quite pronounced: the typical frequency of entries also present as lemmas in the corpus Gigafida 1.0 dropped to 60 (and later even lower). The work turned from something resembling the compilation of a general explanatory dictionary into ‘trudging’ through the material in search of examples that would reliably confirm the detected potential meanings.¹¹ With the ultimate goal of detecting (which was very much provided for by users’ propositions) and analysing potential neologisms at any cost, all the available corpora and the (ever changing and expanding) web content came into play. One option yet to be explored is a (semi-)automatic way of detecting neologisms by comparing the content of all the available corpora against the expanding web content for words not present in the corpora, with a targeted search aimed mainly, but not exclusively, at the thematic fields that stand out among the entries of the Growing Dictionary of the Slovenian Language in the period 2014–2021. In this regard, the announced project (and specialized corpus) SLED is also expected to prove extremely useful.
¹¹ Figure 10 also shows that the number of new meanings (or narrower/wider scope of a meaning or multi-word expressions) in lexis already described in general explanatory dictionaries, such as SSKJ2, is relatively low compared to the number of completely new words. Certain external events, such as the present corona crisis, seem to augment that potential (see the years 2020–2021).
4 An overview of the basic characteristics of Slovenian neologisms in the period 2014–2021
This topic would require a separate paper,¹² therefore only a brief introduction will be provided. Since 2012, research on neologisms has been limited to certain linguistic phenomena on various levels, such as word formation (Gložančev 2012), (Voršič 2015), (Štumberger 2015); semantics (Štumberger 2015a), (Zatorska 2016), (Fišer and Ljubešić 2018); or varieties (Michelizza 2015), (Michelizza and Žagar Karer 2018), (Zwitter Vitez and Fišer 2018). Most of these studies, except for those that examine (colloquial) online language and are based mainly on the comparison of the corpus Janes with other corpora, are based on the material contained in the Dictionary of New Slovenian Words. No comprehensive research on the newer material has been done yet, apart from a preliminary report, based on the entries of the Growing Dictionary of the Slovenian Language, as part of a lexicologically oriented project proposal. Therefore, for the time being only some basic insight into certain questions that have arisen at different levels of linguistic description can be provided.
¹² A general overview of the topic, albeit with the focus on corona lexis (and its word formation), is given in (Krvina 2021).
Phonetic and morphological features:
a. Existence of variants, sometimes with different stylistic value, in both spelling and pronunciation (dred/dread, selfi/selfie, snorkljati/šnorkljati, zero waste [ˈziːɾɔu ˈuɛːist] : [ˈzeːɾɔ ˈvɛːist]);
b. Types of nouns in which the accusative takes (also) animate forms, as is typical of many Slavic languages (narediti selfi/selfija ‘to make a selfie’, dobiti všeček/všečka ‘to get a like’);
c. Types of words which also act as a type of adjective and form multi-word units in which the first element stays undeclined (backpacker turist, korona kriza) and their relation to their potential competing adjectival derivative (backpackerski, koronski).
Word-formation features:
a. Types of word derivatives from loanwords: verbs (skrolati, supati, tekstati), their gerunds (skrolanje, supanje, tekstanje), animate agents (supar, suparka); +/– existence of the word-formation base as an independent loanword (e.g. *skrol; skrolati, skrolanje);
b. Fully borrowed nouns (ending in -er) vs derivatives (with the suffix -ar) from verbs with the loanword as the base (vejper : [sup-a-ti] sup-ar); occurrence of variants within the same word-formation base (youtuber : [youtub-a-ti] youtub-ar);
c. Formation of verbs from loanwords as bases; the relationship between the suffixes -a- and -(iz)ira- (rent-a-ti, retvit-a-ti, stream-a-ti, vlog-a-ti : anonim-izira-ti, mentor-ira-ti);
d. Types, frequency and ways of derivation of feminine forms from nouns in comparison to the neutral/masculine form (backpackerka, bestica, chefinja, influencerka); the ratio of the respective suffixes -ka, -ica, -inja.
Semantic features:
a. Types and frequency of thematic fields predominantly contributing to the neologisms, mostly loanwords (computing and technology, finances and economics, medicine, (healthy) food, leisure and lifestyle);
b. Types and frequency of motivation for semantic shifts in already existing words, usually via metaphor/metonymy or by expanding/narrowing/swapping the area of their use (ambasador ‘of a country’ : ‘of an activity’, dopeči ‘to bake to the end’ : ‘using a special procedure in the shop’, sledilec ‘person following the track’ : ‘following the ideas, ideology; internet follower’, sodelo ‘cooperation in general’ : ‘a special type of cooperation – co-working’, vplivnež ‘influential in general’ : ‘influential in social media; influencer’);
c. Types and frequency of synonyms, especially in the relationship between loanwords and derivatives from non-loanwords (hashtag : ključnik, influencer : vplivnež, selfie : sebek).
5 Further discussion and conclusions
The analysis was concerned with types of data and their use in the Growing Dictionary of the Slovenian Language (2014–), especially with regard to collecting potential neologisms (often called ‘neologism candidates’) and processing them in the available corpus material and also beyond it, in web texts, particularly when corpus analysis produces no results. Three complementary approaches to collecting and processing potential neologisms, used in the Growing Dictionary of the Slovenian Language, were presented:
(a) straightforward data comparison (words not present in the latest editions of explanatory dictionaries but present in the latest version of the corpora);
(b) temporal corpus analysis (words with the peak of occurrences in the last years in each subsequent version of the corpus);
(c) neologisms from the users’ point of view (propositions submitted by users at the dictionary portal Fran).
The study has shown that the inclusion of users (even when initially viewed as a part of user inclusion policy rather than a way to obtain meaningful data) is an important part of the methodology. Users’ propositions can draw attention to neologisms or other meaningful phenomena that could remain nearly or completely undetected when relying solely on (especially non-updated) corpus data, even those of a reference corpus such as Gigafida. Corpus use is still indispensable, particularly in general data processing: it represents the most systematic and statistically reliable way of analysis. The methodology of an individual dictionary should define the role and share of the corpus data according to the intended goals, and it should be dynamic rather than static.
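Approaches (a) and (b) lend themselves to a short illustration. The sketch below is hypothetical and not the procedure actually used for the dictionary: the function names, the sample lemmas and frequencies, and the four-year `window` are assumptions (the window loosely follows the four-year peak intervals mentioned in footnote 9).

```python
def dictionary_gap(corpus_lemmas: set, dictionary_lemmas: set) -> set:
    """Approach (a): lemmas attested in the corpus but not in the dictionaries."""
    return corpus_lemmas - dictionary_lemmas


def peaks_recently(yearly_freq: dict, last_year: int, window: int = 4) -> bool:
    """Approach (b): does the peak of occurrences fall in the last years covered?

    `yearly_freq` maps a year to the lemma's frequency in that year;
    `window` (an assumed value) is the number of final corpus years
    that count as 'recent'.
    """
    if not yearly_freq:
        return False
    peak_year = max(yearly_freq, key=yearly_freq.get)
    return peak_year > last_year - window


# Invented data for illustration only.
corpus = {"hiša", "kriptovaluta", "ključnik"}
dictionaries = {"hiša"}
candidates = dictionary_gap(corpus, dictionaries)

recent = peaks_recently({2008: 2, 2011: 40, 2012: 35}, last_year=2012)
old = peaks_recently({2001: 50, 2012: 3}, last_year=2012)
```

Approach (c) has no counterpart here, since users’ propositions arrive from outside the corpus; in a combined workflow they would simply be added to the candidate set before the verification step.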
As the growing type of dictionary has become commonplace in Slovenian lexicography in the last couple of years,¹³ users are increasingly included in the compilation process in one way or another: usually by suggesting new entries, additions or corrections (this is the type of proposition the portal Fran encourages), sometimes also in the editing process itself (for the Collocations Dictionary of Modern Slovene). Our study is based on the data obtained from the Growing Dictionary of the Slovenian Language, which from 2016 onwards relies heavily on users’ propositions, and on the non-updated or scarcely updated corpus Gigafida. It suggests that dictionaries of (predominantly) neologisms in particular should try to secure a steady inflow of users’ propositions (preferably in a standardized electronic form that is easy to process and track) within about 2–3 years after the start of compilation. This is especially true if regularly updated corpora are not available. Such propositions can include those obtained from advanced users, such as other researchers, especially those working on the material of a general explanatory dictionary. The candidates will most likely be those on the fringes of general lexis,¹⁴ low in frequency (≤ 50) and possibly with occurrences mostly in the last years covered by the reference corpus; relying on the corpus data alone might therefore not provide a sufficient result.
¹³ Apart from the Growing Dictionary of the Slovenian Language there are eSSKJ (2016–), ePravopis (2014–) as well as the Collocations Dictionary of Modern Slovene (2018–) and the Thesaurus of Modern Slovene (2018–). All of them are corpus-based or corpus-driven and use semi-automated (mainly in the form of word sketches) or fully automated corpus data processing.
Our study has also shown that the role a corpus can play in detecting and processing neologisms, especially if it is not regularly updated, may well depend on its ‘age’: it seems to be much more efficient in a period not exceeding 3 years after the completion of the work on the corpus. After that, the corpus data alone become less dependable if not regularly updated (and even then, since a major overhaul of corpus data is rarely possible; cf. the case of Gigafida 1.0 vs 2.0), at least according to our study. A combined strategy, such as uniting approaches (a)–(c) in the Growing Dictionary of the Slovenian Language from 2015 onwards, can often be the most effective solution for a satisfactory degree of responsiveness. It also allows the advantages of one approach to mitigate the shortcomings of another. If a new, or an old but completely overhauled, appropriate corpus appears, its share in the detection and processing of the candidates is expected to rise according to its content. This applies even more to specialised neologism corpora, such as the recently announced SLED (the first for Slovenian), whose role in the description of potential neologisms only further research will be able to evaluate. For the peak of candidates’ occurrences, the last years covered in the corpus are always preferred (with data noise excluded).
¹⁴ The newest terminological lexis is also regularly monitored and Slovenian equivalents of foreign terms are suggested/evaluated, especially within the framework of Terminological counselling, which the Fran Ramovš Institute of the Slovenian Language provides as well. Due to the large quantity of specialized lexis entering general lexis via the process of determinologization, the results of the Terminological counselling activity often prove valuable for the Growing Dictionary of the Slovenian Language.
Apart from using the most appropriate available corpora, the scope of analysis should be widened by including web texts and taking into account all the available statistics. To further increase the chances of detecting possible candidates, a strategy of targeted reading of both corpora and web texts seems viable. Search parameters, which would also enable a potential (semi-)automated search, should be defined in advance: thematic fields in general (those standing out in the first years of compilation may serve as a suggestion; others are not excluded), text types, the time interval (recent years), the derivative of the frequency of a candidate’s occurrence within it, etc.
To sum up: the responsiveness of a (contemporary) dictionary should rest on a wide array of available data and approaches and try to combine them into an effective whole, mainly by allowing for flexibility in the shares of the individual approaches according to the situation at hand. This is especially true of dictionaries of neologisms, since even at the beginning of their compilation most available corpora with sufficiently wide content are likely to be somewhat old. Thus, the efficiency of corpora in detecting and, due to the low frequencies, also processing neologisms will diminish over time. The share of users’ propositions should preferably remain fairly high throughout the process. For this to be possible, a kind of reference point should be established and maintained, such as the well-rounded portal Fran, the site of the Society for Danish Language and Literature or the site of the Wielki słownik języka polskiego, to name only a few. Targeted reading, complemented by the possible development and implementation of (semi-)automated search and extraction of potential neologism candidates, using parameters set in advance according to the already processed data (in the database of existing entries), should also not be neglected.
Alongside recent and future developments in automated targeted search, and despite a number of drawbacks (Kerremans et al. 2012), (Slána 2017), (Waszink 2019), it may well become one of the most important sources of candidates.
As for research on the characteristics of neologisms, only a brief overview for Slovenian is given at the end of the study. Even the preliminary research, based on the entries in the database of the Growing Dictionary of the Slovenian Language, has revealed a number of subjects on various linguistic levels (phonetics, word formation and semantics in particular) that are worth further, comprehensive research. Some subjects are shared between (related) languages: in Slavic languages, for instance, types of nouns in which the accusative takes (also) animate forms, or types of words which can be written as one word, in which case they can be interpreted as a compound, or separately, in which case multi-word units arise. For more effective research, it is preferable for the (dictionary) database to be structured in a standard processable format (e.g. TEI) if the (dictionary and corpus) databases of different languages are to be compared.
All the above considered, the types of data and their use in the Growing Dictionary of the Slovenian Language have proven to be a worthwhile subject of study, with regard both to detecting, collecting, processing (checking in corpora and beyond) and describing neologisms and to examining their characteristics on various levels of linguistic description. Further research should focus on generalising the findings tied to the data of the Growing Dictionary of the Slovenian Language presented in this paper by comparing them to other (especially Slavic) languages in a wider scope and in greater detail.
References
Dictionaries
Krvina, Domen (ed.). Sprotni slovar slovenskega jezika 2014– [Growing Dictionary of the Slovenian Language 2014–].
Available at: https://www.fran.si/132/sprotni-sprotni-slovar-slovenskega-jezika.
Slovar novejšega besedja slovenskega jezika [Dictionary of New Slovenian Words]. 2013. Available at: https://www.fran.si/131/snb-slovar-novejsega-besedja.
Slovar slovenskega knjižnega jezika [Dictionary of the Slovenian Standard Language]. Available at: https://www.fran.si/130/sskj-slovar-slovenskega-knjiznega-jezika.
Slovar slovenskega knjižnega jezika, druga, dopolnjena in deloma prenovljena izdaja [Dictionary of the Slovenian Standard Language, 2nd Edition]. 2014. Available at: https://www.fran.si/133/sskj2-slovar-slovenskega-knjiznega-jezika-2.
eSSKJ: Slovar slovenskega knjižnega jezika 2016– [eSSKJ: Dictionary of the Slovenian Standard Language, 3rd Edition]. Available at: https://www.fran.si/201/esskj-slovar-slovenskega-knjiznega-jezika.
ePravopis: Slovar slovenskega pravopisa 2014– [ePravopis – Slovenian Normative Guide]. Available at: https://www.fran.si/135/epravopis-slovenski-pravopis.
Furlan, M. (ed.). Novi etimološki slovar slovenskega jezika 2017– [New Etymological Dictionary of Slovenian Language]. Available at: https://www.fran.si/207/nessj-novi-etimoloski-slovar-slovenskega-jezika.
Kolokacije 1.0: Kolokacijski slovar sodobne slovenščine [Collocations Dictionary of Slovene]. Available at: https://viri.cjvt.si/kolokacije.
Sopomenke 1.0: Slovar sopomenk sodobne slovenščine [Thesaurus of Modern Slovene]. Available at: https://viri.cjvt.si/sopomenke.
Society for Danish Language and Literature. Available at: https://dsl.dk/.
Wielki słownik języka polskiego. Available at: https://www.wsjp.pl/.
Corpora
Gigafida 1.0. Available at: http://www.gigafida.net/.
Gigafida 2.0: Korpus pisne standardne slovenščine. Available at: viri.cjvt.si/gigafida.
Janes. Available at: https://www.clarin.si/noske/run.cgi/corp_info?corpname=janes.
slWaC. Available at: https://www.clarin.si/noske/run.cgi/corp_info?corpname=slwac.
KAS. Available at: https://www.clarin.si/noske/run.cgi/corp_info?corpname=kas.
Language Counselling at ZRC SAZU. Available at: https://svetovalnica.zrc-sazu.si/.
Sporazumevanje v slovenskem jeziku. Available at: http://www.slovenscina.eu/.
Other literature
Ahačič, Kozma, Ledinek, Nina, Perdih, Andrej. 2015. Fran: The next generation Slovenian dictionary portal. In: K. Gajdošová, A. Žáková (ed.). Natural language processing, corpus linguistics, lexicography: proceedings. Eighth International Conference: Bratislava. 21–22.
Erjavec, Tomaž, Ljubešić, Nikola. 2014. The slWaC 2.0 corpus of the Slovene web. In: T. Erjavec, J. Žganec Gros (ed.). Jezikovne tehnologije: zbornik 17. mednarodne multikonference Informacijska družba. Ljubljana: Institut Jožef Stefan. 19–24.
Fišer, Darja (ed.). 2018. Viri, orodja in metode za analizo spletne slovenščine. Ljubljana: Znanstvena založba Filozofske fakultete. DOI: https://doi.org/10.4312/9789610600701
Fišer, Darja, Ljubešić, Nikola. 2018. Tviti kot leksikografski vir za analizo pomenskih premikov v slovenščini. Viri, orodja in metode za analizo spletne slovenščine. Ljubljana: Znanstvena založba Filozofske fakultete. 198–226.
Gložančev, Alenka, Jakopin, Primož, Michelizza, Mija, Uršič, Lučka, Žele, Andreja. 2009. Novejša slovenska leksika (v povezavi s spletnimi jezikovnimi viri). Ljubljana: Založba ZRC, ZRC SAZU.
Gložančev, Alenka. 2012. Novejša slovenska leksika v luči obravnave samostalniških zloženk v Slovenskem pravopisu 2001. Pravopisna stikanja: razprave o pravopisnih vprašanjih. Ljubljana: Založba ZRC. 125–139.
ten Hacken, Pius. 2020. Norms, New Words, and Empirical Reality. International Journal of Lexicography 33/2. 135–149. DOI: https://doi.org/10.1093/ijl/ecaa005
ten Hacken, Pius, Koliopoulou, Maria. 2020. Dictionaries, Neologisms, and Linguistic Purism. International Journal of Lexicography 33/2. 127–134. DOI: https://doi.org/10.1093/ijl/ecaa011
Kerremans, D., Stegmayr, S., Schmid, H.-J. 2012. The NeoCrawler: Identifying and Retrieving Neologisms from the Internet and Monitoring Ongoing Change. In: K. Allan, J. Robinson (eds.). Current methods in historical semantics. De Gruyter Mouton. 59–96.
Klosa-Kückelhaus, Annette, Wolfer, Sascha. 2020. Considerations on the Acceptance of German neologisms from the 1990s. International Journal of Lexicography 33/2. 150–167. DOI: https://doi.org/10.1093/ijl/ecz033
Krek, Simon, Kosem, Iztok, Gantar, Polona. 2013. Predlog za izdelavo Slovarja sodobnega slovenskega jezika. Accessed on 1–20 January 2022. Available at: http://www.sssj.si/.
Krvina, Domen. 2021. Sprotni slovar slovenskega jezika, covid-19 in z njim povezano (novejše) besedje. In: S. Ristić, I. Lazić Konjik, N. Ivanović (ed.). Lexicography and lexicology in the light of current issues. Beograd: Serbian language institute of SASA.
Marello, Carla. 2020. New Words and New Forms of Linguistic Purism in the 21st Century: The Italian Debate. International Journal of Lexicography 33/2. 168–186. DOI: https://doi.org/10.1093/ijl/ecz034
Michelizza, Mija. 2015. Spletna besedila in jezik na spletu. Primer blogov in Wikipedije v slovenščini. Ljubljana: Založba ZRC, ZRC SAZU.
Michelizza, Mija, Žagar Karer, Mojca. 2018. Internetna leksika v slovenščini. Jezikoslovni zapiski 24/1. 79–92.
Panocová, Renáta. 2020. Attitudes towards Anglicisms in Contemporary Standard Slovak. International Journal of Lexicography 33/2. 187–202. DOI: https://doi.org/10.1093/ijl/ecaa006
Perdih, Andrej. 2018. Dictionary portal Fran: current state and future developments. In: B. Niševa (ed.). Slovanská lexikografie počátkem 21. století: sborník příspěvků z mezinárodní konference. Vyd. 1. Praha: Slovanský ústav AV ČR. 57–65.
Perdih, Andrej. 2020. Portal Fran: od začetkov do danes. Rasprave Instituta za hrvatski jezik i jezikoslovlje 46/2. 997–1018.
Perdih, Andrej, Ledinek, Nina. 2019. Multi-word Lexical Units in General Monolingual Explanatory Dictionaries of Slavic languages. Slovenski jezik / Slovene Linguistic Studies 12. 113–134. DOI: https://ojs.zrc-sazu.si/sjsls/article/view/7629
Peredrienko, Tat’jana, Istomina, Ekaterina. 2019. Lexical Parallels in the Academic Vocabulary of Russian and English. Slavistična revija 67/4. 605–614. Available at: https://srl.si/ojs/srl/article/view/2019-4-1-5.
Sicherl, Eva. 2019. Določitev spola anglizmov v slovenščini. Slavistična revija 67/2. 343–352. Available at: https://srl.si/ojs/srl/article/view/2019-2-1-22.
Sinclair, John McHardy (ed.). 1987. Looking Up: An Account of the COBUILD Project in Lexical Computing and the Development of the Collins COBUILD English Language Dictionary. Collins ELT.
Slána, Jakub. 2017. K (polo)automatické excerpci neologismů. Jazykovědné aktuality 54/3-4. 34–46. Jazykovědné sdružení České republiky.
Stopar, Andrej, Ilc, Gašper. 2019. Stilistična (ne)zaznamovanost moških in ženskih poimenovalnih parov za poklice v angleščini in slovenščini. Slavistična revija 67/2. 333–342. Available at: https://srl.si/ojs/srl/article/view/2019-2-1-21.
Štumberger, Saška. 2015a. Besedotvorje novejše slovenske leksike: medponskoobrazilne zloženke. Zbornik prispevkov s simpozija Škrabčevi dnevi 8 (2013). Nova Gorica: Založba Univerze. 155–163.
Štumberger, Saška. 2015b. Leksikološka opredelitev novejše leksike in terminološka raba v slovenskem jezikoslovju. Slavistična revija 63/2. 249–259. Available at: https://srl.si/ojs/srl/article/view/COBISS_ID-57985122.
Voršič, Ines. 2015. Tvorjenke s pomenom nosilnika lastnosti v novejšem slovenskem besedju. Slavia Centralis 8/1. 119–134.
Waszink, Vivien. 2019. Using Neoloog to detect and describe neologisms in online dictionaries. Abstracts_IDS. Instituut voor de Nederlandse Taal.
Zatorska, Agnieszka. 2016. Czasowniki w nowszej leksyce słoweńskiej. Rozprawy komisji językowej 62. 229–239.
Zwitter Vitez, Ana, Fišer, Darja. 2018.
Govorne prvine v nestandardni spletni slovenščini. Viri, orodja in metode za analizo spletne slovenščine. Ljubljana: Znanstvena založba Filozofske fakultete. 254–272.
Received January 2022, accepted March 2022.
Acknowledgments
The publication of this article was made possible by the programme Slovenski jezik v sinhronem in diahronem razvoju (P6-0038 (A)), which is financially supported by the Slovenian Research Agency. The author would like to thank Mitja Trojar for language editing and advice on terminological issues.
Summary
The Growing Dictionary of the Slovenian Language (2014–) and Slovenian Neologisms: Study on Types of Data and Their Use
The article examines types of data and their use in the Growing Dictionary of the Slovenian Language, which is integrated into Fran, a well-established portal for dictionaries and other language resources run by the ZRC SAZU Fran Ramovš Institute of the Slovenian Language. The data in question are mainly input data in the process of analysing and selecting material for dictionary entries; the dictionary is a so-called growing dictionary, which means that new entries are published every year. Most entries relate to neologisms; less commonly, there are new meanings of existing words. In the first year of compiling the dictionary (2014) and for the following few years, it was possible to rely on the Gigafida 1.0 corpus, built in 2013 (updated to 2.0 in 2019), for entry candidates; subsequently, with the “ageing” of the corpus, the main role has been assumed by user suggestions. Users can submit suggestions directly on the Fran portal (“suggest a new word”), which, with its extensiveness, serves as an important point of reference: if users feel that something is new, and it cannot be found on the Fran portal and is not an archaism, it is most likely a neologism.
User suggestions are checked against all available resources (the Gigafida, Janes, SlWaC and KAS corpora, and the web); the minimal criterion for inclusion in the dictionary is an adequate number of occurrences in web texts that are sufficiently diverse in terms of sources and temporally recent. “Other” suggestions for candidates originate in the lexicographic work on other growing dictionaries (especially eSSKJ), in a seminar that is part of the lexicology and lexicography lectures at the University of Ljubljana, Faculty of Arts, and partly in the Institute’s Language Counselling service; these are far less numerous than user suggestions. About 50% of all suggestions are included in the dictionary every year (those not included are re-analysed the following year); in 2020–2021, this share rose to nearly 70% owing to the appearance of COVID-related words.

One of the main highlights of this analysis is that user engagement during the compilation of the dictionary is extremely important. Dictionaries consisting mostly of neologisms (new words), in particular, cannot rely only on corpus materials in detecting potential candidates for inclusion; if the corpus in question is a general (reference) one, it becomes outdated fairly quickly where neologisms are concerned. With Gigafida 1.0, which, compared against SSKJ2 and SNB, was a major starting source for the Growing Dictionary of the Slovenian Language, the share of yearly entries that do not appear (or hardly appear) in the corpus (f ≤ 8) rose from 0% in 2014 all the way to 66% in 2021 (exceeding 50% since 2018). The update of the corpus (2.0) in 2019 has improved the situation to some extent (instead of total absence, entries now occur with a frequency of f ≤ 8), but not dramatically. A corpus being built within the SLED project is expected to be significantly more useful, and our analysis shows that this work is justified.
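The yearly share of entries that are (almost) absent from the corpus can be illustrated with a short calculation. The following is only a minimal sketch: the function name `low_frequency_share`, the record layout (publication year, headword, corpus frequency) and the toy data are assumptions made for illustration and are not taken from the dictionary's actual workflow; only the threshold f ≤ 8 comes from the text.

```python
from collections import defaultdict

# Threshold from the summary: entries with corpus frequency f <= 8
# count as absent or hardly present in the corpus.
LOW_FREQ_THRESHOLD = 8


def low_frequency_share(entries, threshold=LOW_FREQ_THRESHOLD):
    """Return {year: share of that year's entries with frequency <= threshold}.

    `entries` is an iterable of (year, headword, corpus_frequency) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    low = defaultdict(int)
    for year, _headword, freq in entries:
        totals[year] += 1
        if freq <= threshold:
            low[year] += 1
    return {year: low[year] / totals[year] for year in sorted(totals)}


# Invented toy data, not real dictionary entries or corpus counts.
sample = [
    (2014, "word_a", 120),
    (2014, "word_b", 45),
    (2021, "word_c", 0),
    (2021, "word_d", 3),
    (2021, "word_e", 57),
]

shares = low_frequency_share(sample)
for year, share in shares.items():
    print(f"{year}: {share:.0%} of entries with f <= {LOW_FREQ_THRESHOLD}")
```

Run on real data, one record per published entry with its frequency in the chosen corpus version, the same function would produce one share per publication year, which could then be compared across the two corpus versions.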
As for future research building on the analysis in this article, it seems sensible to generalise the findings relating to the Growing Dictionary of the Slovenian Language by comparing them with similar analyses of (dictionaries of) new words, especially in other Slavic languages.

Sprotni slovar slovenskega jezika (2014–) in slovensko novejše besedje: analiza tipov podatkov in njihove uporabe

Prispevek obravnava tipe podatkov in njihovo uporabo v Sprotnem slovarju slovenskega jezika, ki je integriran v uveljavljen slovarsko-jezikovni portal Fran Inštituta za slovenski jezik Frana Ramovša ZRC SAZU. Gre zlasti za vhodne podatke v procesu analize in izbora podatkov za slovarske iztočnice; slovar je t. i. rastoči slovar, kar pomeni, da se nove iztočnice objavljajo vsako leto. Med iztočnicami prevladujejo neologizmi, v manjši meri novi pomeni že obstoječih besed. V prvem letu nastajanja (2014) in še nekaj naslednjih se je bilo mogoče pri kandidatih za iztočnice nasloniti na leta 2013 končani korpus Gigafida 1.0 (posodobitev 2.0 leta 2019), pozneje pa so s »staranjem« korpusa glavno vlogo prevzeli predlogi uporabnikov. Uporabniki predloge oddajajo neposredno na portalu Fran (»predlagaj novo besedo«), ki s svojo obsežnostjo služi kot pomembna primerjalna točka: kar uporabniki čutijo kot novo, pa tega ni na portalu Fran in ni arhaizem, je precej verjetno neologizem. Uporabniški predlogi so pregledani v vseh virih, ki so na voljo (korpusi Gigafida, Janes, SlWaC, KAS, splet), pri čemer minimum za uslovarjenje predstavlja zadostna, po virih dovolj pestra in časovno novejša pojavnost v spletnih besedilih.
»Drugi« predlogi za kandidate prihajajo iz slovarskega dela za preostale rastoče slovarje (zlasti eSSKJ), seminarja v okviru predavanj iz leksikologije in leksikografije na FF UL, delno tudi iz inštitutske Jezikovne svetovalnice; po številu jih je precej manj kot predlogov uporabnikov, skupaj pa tvorijo vsoto vseh predlogov. Letno je v povprečju uslovarjenih okoli 50 % te vsote (neuslovarjeni predlogi so znova analizirani naslednje leto); v letih 2020–2021 se je ta delež povzpel proti 70 %, k čemur je prispeval pojav koronabesedja.

Kot enega glavnih poudarkov analize lahko izpostavimo, da je angažiranje uporabnikov v procesu nastajanja slovarja izjemno pomembno. Zlasti slovarji pretežno neologizmov (»novejšega besedja«) se, posebej pri zaznavi potencialnih kandidatov za uslovarjenje, ne morejo naslanjati zgolj na korpusno gradivo; če gre za splošni (referenčni) korpus, ta z vidika neologizmov precej hitro zastari. V primeru Gigafide 1.0, ki je pri Sprotnem slovarju slovenskega jezika ob sopostavitvi s SSKJ2 in SNB predstavljala pomemben izhodiščni vir, se je delež vsakoletnih iztočnic, ki v korpusu (skoraj) niso prisotne (f ≤ 8), od 0 % leta 2014 povečal vse do 66 % leta 2021 (in presegal 50 % od leta 2018 dalje). Posodobitev korpusa (2.0) v letu 2019 je stanje nekoliko izboljšala (namesto polne neprisotnosti prisotnost pod f ≤ 8), vendar ne izrazito. Precej večjo uporabnost je pričakovati od korpusa v okviru projekta SLED, na utemeljenost katerega kaže tudi naša analiza.

Kar se tiče prihodnjih raziskav kot nadgradnje analize v tem prispevku, se zdi smiselno ugotovitve, vezane na Sprotni slovar slovenskega jezika, posplošiti ob primerjavi s podobnimi analizami (slovarjev) novejšega besedja, zlasti v drugih slovanskih jezikih.