Domen Krvina  
ORCID: 0000-0002-2276-1156  
 
The Growing Dictionary of the Slovenian Language 
(2014-) and Slovenian Neologisms: Study on Types of 
Data and Their Use  
 
Slovenski jezik / Slovene Linguistic Studies 14/2022. 117–151. 
 
DOI: https://doi.org/10.3986/sjsls.14.1.05 
 
 
 
 
 
Print ISSN: 1408-2616, online ISSN: 1581-127 
 
https://ojs.zrc-sazu.si/sjsls
Domen Krvina (ORCID: 0000-0002-2276-1156)
ZRC SAZU, Fran Ramovš Institute of the Slovenian Language, Slovenia
The Growing Dictionary of the Slovenian 
Language (2014-) and Slovenian Neologisms: 
Study on Types of Data and Their Use
The article presents the methods of detecting Slovenian neologisms used 
in compiling the Growing Dictionary of the Slovenian Language, which has 
been accessible since 2014 on the Fran portal, a site that integrates 
various dictionaries into a single whole. In the first year of compilation 
and for a few years afterwards, the main source of candidates was the 
Gigafida 1.0 corpus, built in 2013. Because the corpus was not updated 
regularly (and other appropriate sources were unavailable), users’ 
suggestions have taken over the main role. Users submit suggestions 
directly on the Fran portal. Gigafida and other corpora (Janes, SlWaC) 
are still used for checking users’ suggestions. Given the high number of 
such suggestions and the growing demand for new lexical descriptions, their 
importance cannot be overlooked. The neologisms collected in the dictionary 
exhibit a number of characteristics, a brief overview of which is provided 
at the end of the study.
Keywords: Neologisms, Slovene, Growing Dictionary of the Slovenian 
Language, Data Detection, Corpora, Users’ Propositions, Overview of 
Neologisms’ Characteristics
Prispevek predstavlja metode zaznavanja slovenskih neologizmov, 
uporabljene pri izdelavi Sprotnega slovarja slovenskega jezika, ki je od leta 
2014 dostopen na slovarskem portalu Fran. Ta združuje 
različne slovarje v eno celoto. V prvem letu nastajanja slovarja in nekaj 
naslednjih je bil glavni vir kandidatov za neologizme korpus Gigafida 
(zaključen leta 2013). Ker se ni redno posodabljal, drugi primerni viri pa 
tudi niso bili na voljo, so glavno vlogo prevzeli predlogi uporabnikov. 
Ti lahko svoje predloge oddajajo neposredno na portalu Fran. Korpusi 
Gigafida in drugi (Janes, SlWaC) ohranjajo vlogo gradiva za preverjanje 
uporabniških predlogov. Zaradi velikega števila tovrstnih predlogov in 
velikega povpraševanja po novih leksikalnih opisih njihovega pomena ne 
le da ni mogoče zanemariti – postali so temelj opisa novejšega besedja. 
Kratek pregled njegovih temeljnih značilnosti je podan na koncu prispevka.
Ključne besede: novejše besedje, slovenščina, Sprotni slovar slovenskega 
jezika, gradivna zaznava, korpusi, predlogi uporabnikov, pregled 
značilnosti novejšega besedja
1 Background: transformation of Slovenian lexicography, the 
portal Fran and the rise of a new type of dictionary in 2014
Neologisms constantly appear in language: they reflect developments 
in lifestyles, the environment and perceptions of the world (ten Hacken 
2020). In Slovene, the new lexis of the period 1991–2009 was treated 
comprehensively, mainly from a lexicological point of view, in the 
monograph Novejša slovenska leksika (v povezavi s spletnimi jezikovnimi 
viri) (Gložančev et al. 2009), and lexicographically in the Dictionary 
of New Slovenian Words (2012). The neologisms presented in that 
dictionary span the years 1991 to 2012, as its wordlist was compiled by 
comparing the Nova beseda corpus with the wordlist of SSKJ: Dictionary 
of the Slovenian Standard Language (1970–1991), at the time the only 
monolingual general explanatory dictionary compiled systematically by a 
team of authors adhering to unified principles.
In the following years Slovenian lexicography, after what could be 
called a preparatory decade, experienced some major shifts in its 
course, not unlike those that took place in English lexicography at 
the time of the COBUILD project (Sinclair et al. 1987), more than 
a decade before. Firstly, the Gigafida 1.0 corpus, the first Slovene 
reference corpus to be fully equipped with formal POS tagging and 
at the same time accessible to the general public, was compiled in 
2013 within the project Sporazumevanje v slovenskem jeziku. Secondly, 
that same year, three authors published a dictionary conceptualization 
plan proposing a new, mainly corpus-driven explanatory dictionary to 
be compiled in phases: from the first, computer-driven phase, whose 
results would be only partially revised and would be available 
immediately, to the final phase, with entries fully revised on various 
levels and marked as completed (Krek et al. 2013). Thirdly, the 
first edition of SSKJ was updated and partially revised into SSKJ2: 
Dictionary of the Slovenian Standard Language, 2nd Edition using 
data from the Gigafida 1.0 corpus.
These events set the stage for the following developments in 
late 2014 and early 2015:
1. the emergence of the dictionary portal Fran at the ZRC SAZU, 
Fran Ramovš Institute of the Slovenian Language; 
2. the creation of the Growing Dictionary of the Slovenian Language 
and the publication of the first-year batch of entries; 
3. the drafting of a dictionary conceptualization plan for a completely 
new, corpus-based dictionary, eSSKJ: Dictionary of the Slovenian 
Standard Language, 3rd Edition, which saw the publication of its 
first entries in 2016.
The main role of the portal Fran in 2014 was to bring together 
existing dictionaries and integrate them into a user-friendly and 
user-responsive website: by ensuring their transition into e-form 
and by linking the data from various sources so that they are 
searchable through a single search engine (with results from all the 
different sources displayed at once). The portal supports a 
user-responsive interface and enables both general and highly 
advanced, targeted searches. Even when the user singles out one 
dictionary, the search is always performed against the entire 
background database, and these results are shown separately from the 
main search in the navigation panel; see figure 1 (Ahačič et al. 2015, 
Perdih 2018, 2020). The other important function of the portal was 
to serve as a platform on which completed batches of entries in a 
new type of e-dictionary could be published regularly, along with 
(minor) changes to the microstructure of those new dictionaries 
where necessary. These new-type dictionaries 
would be called rastoči slovarji (‘growing’ dictionaries).
Figure 1: Portal Fran (Growing Dictionary of the Slovenian Language)
In October 2015, the portal adopted a policy of encouraging users 
to suggest ‘missing’ words and meanings as well as equivalents 
of loanwords as candidates for lexical description (figures 2 and 
3). Especially in the case of Slovenian equivalents of loanwords, 
Slovenian word-formation strategies (such as sup: stojeska a board 
for ‘standing paddling’, plovček ‘sailing’, ‘rowing’) would play a 
pivotal role. First seen as part of a user-inclusion policy, this 
encouragement quickly turned out to be an extremely important 
source of proposed neologisms, stemming directly from 
users’ observations and answering their demand. These could be 
called ‘neologisms from the users’ point of view’.
Figure 2: Portal Fran: suggesting new (‘missing’) words
Figure 3: Portal Fran: suggesting (and voting for) equivalents of loanwords
2 The Growing Dictionary of the Slovenian Language, collecting 
potential neologisms and state-of-art of their sources
The Growing Dictionary of the Slovenian Language, which is the central 
point of our study, was the first of a new type of dictionary on 
the portal Fran, hence its name. Designed from the beginning as a 
web dictionary, the Growing Dictionary of the Slovenian Language was 
one of the first to make good use of the adaptable environment of the 
portal Fran. As it was created literally at users’ request (and caters 
to their needs), the editors decided that all the data should be 
presented as transparently and in as user-friendly a way as possible: 
no abbreviations (commonly used in linguistics and easily recognizable 
for linguists, but not necessarily for most other dictionary users) 
were to be used, hints to the dictionary content and structure were to 
be given in small grey frames on the right, and the full list of all 
the word forms would be accessible by a simple click (figure 4).
Figure 4: Growing Dictionary of the Slovenian Language: interface layout
The experience accumulated in the first two years of compiling the 
Growing Dictionary of the Slovenian Language was positive, which made 
it easier for the subsequent ‘growing’ dictionaries (ePravopis: 
Slovenian Normative Guide (2014–), eSSKJ: Dictionary of the Slovenian 
Standard Language, 3rd Edition (2016–) and NESSJ – New Etymological 
Dictionary of Slovenian Language (2017–)) to follow the same direction. 
It should be noted, however, that as the first of the 
‘growing’ kind, the Growing Dictionary of the Slovenian Language 
started more or less as a ‘dictionary on the fly’: apart from some 
basic principles of compilation (see below), most of its compiling 
criteria, especially in the first years of compiling, were dynamic 
rather than static. This was also linked to the availability of 
appropriate (corpus) resources and their (scarce or missing) updates. 
Therefore, the compilation of the dictionary (as well as the 
assessment of it that this paper aims at) could be seen as something 
of an experiment, particularly in the years prior to 2018–2019. The 
dynamic nature also applies to the dictionary’s definition of 
‘neologism’, which has been inclusive rather than exclusive, but more 
or less based on the three complementary approaches (see chapter 2.2) 
in dynamically changing proportions.
2.1 The Growing Dictionary of the Slovenian Language: main features and 
source limitations
The intention of the Growing Dictionary of the Slovenian Language 
was to continue the detection and description of neologisms that 
the Dictionary of New Slovenian Words had started. The latter had 
defined a neologism in a fairly straightforward way: words (and, more 
rarely, new meanings of words that already existed) not 
present in the SSKJ: Dictionary of the Slovenian Standard Language 
(1970–1991) but appearing in Nova beseda, one of the first Slovene 
corpora, would qualify as candidates for dictionary description. Their 
frequency was of lesser importance, though given the scope of the 
Nova beseda corpus, it would be rather low in most cases. In total, 
6,000 such neologisms were described as dictionary entries (some 
containing several multi-word units), published in 2012.
For the present Slovenian state of the art, it is important to note 
that there are no regularly updated corpora of new Slovenian texts. 
In late 2021, the project SLED (Spremljevalni korpus in spremljajoči 
podatkovni viri), aimed at tracking neologisms and including a 
specialised corpus, was announced. However, its first version 
will not be available until late 2022. There are other specialised 
corpora: corpora of social media texts (Twitter, forums, blogs), such 
as Janes, built within the project of the same name in 2014–2018 
(cf. Fišer et al. 2018), and corpora of web texts, such as SlWaC, 
built in 2011 and updated in 2014 (v. 2.1) using the web crawler 
SpiderLing (Erjavec, Ljubešić 2014). The main reference corpus 
Gigafida, published in 2013, received a modest update in 2019 with 
texts up to 2018 (Gigafida 2.0). The past and especially present 
state of affairs therefore presented, and still presents, a 
substantial (but not insurmountable) obstacle to obtaining a 
completely corpus-driven candidate list of neologisms, 
which would contribute to its objectiveness.
As the goal of the Growing Dictionary of the Slovenian Language was 
to detect and analyse potential neologisms, it would make use of 
any appropriate resources at hand. At first, the Gigafida 1.0 corpus 
seemed sufficient (see below), but with no basic research on the 
newest lexis after 2013, its role could not be properly evaluated, at 
least not in a way that would let it remain the sole (major) source. 
With the number of users’ propositions growing, the focus shifted to 
them, while Gigafida (and other corpora, as they became available) 
retained the role of sources used for checking such propositions. Due 
to the scarce (or non-existent) corpus updates, the ever-changing and 
expanding web content came to the fore. With no widespread and 
readily available crawling tools for Slovenian (the one used for SlWaC 
was the same as that used for Czech), the dictionary did not try 
to develop its own either, partly because doing so would be too 
time-consuming for a rather small dictionary outside the framework of 
a general analysis of new lexis after 2013. Therefore, the option yet 
to be explored is a (semi-)automatic way of detecting neologisms by 
comparing the content of all the available corpora against the 
expanding web content and extracting the words not present in the corpora.
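
The (semi-)automatic comparison mentioned above has not been implemented for the dictionary; what follows is only a minimal sketch of how it might look, written in Python. The function name web_only_words, the min_count threshold and the toy word lists are assumptions made for illustration. The sketch compares lowercase surface forms, whereas a real pipeline would first need to lemmatize the Slovenian web text.

```python
import re
from collections import Counter


def web_only_words(web_texts, corpus_lemmas, min_count=2):
    """Return words seen in web texts but absent from every corpus.

    web_texts: iterable of raw text strings (hypothetical web crawl)
    corpus_lemmas: mapping corpus name -> set of known lowercase lemmas
    min_count: assumed noise threshold, not a value taken from the article
    """
    # Union of everything that is already attested in any corpus.
    known = set().union(*corpus_lemmas.values())
    counts = Counter()
    for text in web_texts:
        # Letters only (Unicode-aware, so č, š, ž are kept); digits are dropped.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return {w: n for w, n in counts.items() if n >= min_count and w not in known}


corpora = {"Gigafida": {"nov", "slovar", "in"}, "Janes": {"selfi"}}
web = ["Nov slovar in glamping", "glamping in selfi", "sup"]
print(web_only_words(web, corpora))  # {'glamping': 2}
```

The output of such a comparison would still be a list of candidates only; each word would have to pass the manual checks described in the following chapters.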
When the decision was made in 2014 to start compiling the Growing 
Dictionary of the Slovenian Language, the first version (1.0) of the 
Gigafida corpus (2013) was the largest at hand and still relatively new. 
Therefore, it seemed feasible to retain the definition of neologism from 
the Dictionary of New Slovenian Words: words (or, more rarely, meanings) 
present neither in the latter nor in the recently updated and partially 
revised SSKJ2: Dictionary of the Slovenian Standard Language, 2nd Edition 
but appearing in Gigafida 1.0 would qualify as candidates for dictionary 
description. Given the scope of the billion-word Gigafida 1.0 corpus, 
additional limitations regarding frequency and time of appearance were 
introduced: the frequency of the corpus lemma should be below 1,000 
(and above 500), and the peak of occurrences should fall in the 
years 2009–2012, the last three years covered in the corpus Gigafida 
1.0. Thus, an additional frequency-time dimension (Slána 2017: 41) 
that corpus analysis allows for was provided; these could be called 
‘neologisms from the temporal point of view’. 
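
The frequency-time criteria described above can be sketched as a simple filter. This is an illustrative sketch only, not the tool actually used by the editors: the names LemmaStats and is_candidate, the per-year layout of the counts and the treatment of the boundary values (500 included, 1,000 excluded) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LemmaStats:
    lemma: str
    yearly: dict  # year -> number of occurrences in the corpus


def is_candidate(stats, dictionary_lemmas, low=500, high=1000,
                 peak_years=range(2009, 2013)):
    """Apply the three criteria: not in dictionaries, frequency band, late peak."""
    # a) Straightforward data comparison: already described words are excluded.
    if stats.lemma in dictionary_lemmas:
        return False
    # Frequency band; inclusive lower bound is an assumption of this sketch.
    total = sum(stats.yearly.values())
    if not (low <= total < high):
        return False
    # b) Temporal criterion: the busiest year must fall in 2009-2012.
    peak_year = max(stats.yearly, key=stats.yearly.get)
    return peak_year in peak_years


described = {"internet"}
late = LemmaStats("glamping", {2008: 50, 2010: 200, 2011: 350})
early = LemmaStats("disketa", {2003: 400, 2010: 200})
print(is_candidate(late, described), is_candidate(early, described))  # True False
```

The counts in the example are invented; only the thresholds and the 2009–2012 window come from the criteria stated above.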
This procedure yielded some 500 candidates, out of which 224 (the 
majority of them with corpus frequency 700–500)1 were chosen and 
then further processed all the way to the final dictionary entries. 
Among various thematic fields some stood out in particular – and 
would mostly continue to do so in the following years (cf. also Slána 
2017: 42–43):2 
a. computing and technology: android, driftati ‘drive a car drifting’, 
inoks ‘stainless steel’, karbon ‘carbon used in bike frames’, kevlar 
‘Kevlar’, multifunkcijski ‘multifunctional’, replikacija ‘replication’, 
večigralski ‘multi-player’, vtičnik ‘plugin’;
b. finances and economics: depozitarni ‘depository’, fiskalno 
‘fiscally’, konsolidacija ‘consolidation’, prociklični ‘procyclical’, 
refinancirati ‘refinance’, volatilnost ‘volatility’; 
c. medicine: artroskopija ‘arthroscopy’, epiduralni ‘epidural’, 
fibromialgija ‘fibromyalgia’, kandidiaza ‘candidiasis’, 
mikrocirkulacija ‘microcirculation’, obstruktiven ‘obstructive’, 
paradontalni ‘periodontal’;3
d. (healthy) food, leisure and lifestyle: falafel, gamber ‘prawn’, 
goji, makadamija ‘macadamia’, tahini; glamping, selfi ‘selfie’, 
selfness, skike, sup ‘SUP’, trimaran. 
1 For further inclusion criteria see chapter 2.3.
2 Note that the words listed above would qualify as neologisms in 2014, which may 
no longer be the case. Sooner or later they will probably also be described in 
general explanatory dictionaries, such as eSSKJ.
3 In the fields of economics and especially medicine there are often many 
English-Slovene parallels in both form and meaning. For a Russian-English comparison, 
see (Peredrienko and Istomina 2019).
Some users, accustomed to the Dictionary of the Slovenian Standard 
Language, which was both descriptive and normative, would still 
expect a dictionary to mark certain words for their ‘foreign origin’. 
In the Dictionary of New Slovenian Words this was done by applying the 
label cit. (lit. ‘cited form’) in cases where the word retained its 
original written form from the donor language. Since the Growing 
Dictionary of the Slovenian Language was intended not to shy away from 
collecting many such words, it would not continue that tradition. Labels 
were to be used sparingly, and ‘loanword’ would not automatically 
translate to ‘colloquial’, as was often the case in earlier dictionaries, 
particularly with loanwords from German (the process of labelling was not 
straightforward; being borrowed, especially from German, would however 
quite commonly point to a non-formal language layer).4 
Descriptiveness was the main goal, and after two years users would 
embrace that fact, at least judging by their propositions, submitted 
(mainly) at the portal Fran.
In 2015, the total number of final dictionary entries was much lower (224 > 
94),5 although with some prominent additions, such as the loanwords bitcoin, 
karite ‘shea tree, butter’, overland, vloger, vlogerka ‘woman vlogger’ etc.6 
This was mostly because the initial supply of corpus candidates 
had been partially exhausted (note that until the modest update of 2019 
mentioned above, the corpus remained virtually unchanged). 
Some uncertainties arose about how potential neologisms with 
fewer than 500 occurrences should be treated: is this frequency still 
relevant in a corpus exceeding one billion tokens (provided the 
peak of occurrences falls in the final years covered in the corpus)? As it 
would turn out later when checking users’ propositions, this frequency 
not only suffices, it is rather high: as time passes, many potential 
neologisms may not be present in (non-updated) corpora at all. In 2015, 
the inflow of users’ propositions was only gaining momentum; it would 
increase considerably in the following years and maintain its position as 
one of the most important methods of detecting neologisms.
4 For further information on neologisms and purism in other European languages see 
(ten Hacken and Koliopoulou 2020), (Klosa-Kückelhaus and Wolfer 2020), (Marello 
2020), (Panocová 2020).
5 Partially also due to the decision taken at the Fran Ramovš Institute of the Slovenian 
Language to expand the smaller-scope ‘growing’ dictionaries by approximately 100 
entries/units per year.
6 The formation of feminine forms usually follows their neutral (grammatically ‘male’) 
counterparts rather quickly. For an English-Slovene comparison and general information 
on the gender of English loanwords in Slovene see (Stopar and Ilc 2019), (Sicherl 2019).
2.2 The Growing Dictionary of the Slovenian Language: complementary approaches 
to collecting neologisms
As pointed out above, three main approaches to collecting potential 
neologisms have been developed in the Growing Dictionary of the Slovenian 
Language and used complementarily, depending on and in reaction to the 
available sources:
a. Straightforward data comparison approach: words (or 
meanings) not present in the latest editions of explanatory dictionaries 
(especially those of new words, if available) but present in the 
latest version of corpora are very likely neologisms. This approach 
was used in the Dictionary of New Slovenian Words and retained 
(especially for the first two to three years) in the Growing Dictionary 
of the Slovenian Language.
b. Temporal corpus analysis approach: words whose occurrences peak 
in the last years (data noise excluded) covered by each 
subsequent version of the corpus are potential neologisms for 
the time period covered in the corpus.
c. Neologisms from the users’ point of view: words felt as ‘new’ 
by users themselves, according to their daily language use and 
observations.7 This is perhaps the most subjective of the three, but the 
subjectiveness is somewhat mitigated by the sheer number of 
such propositions coming from various users interested in various 
thematic fields.
7 Direct interaction with users, via collecting and answering their questions concerning 
mainly everyday (and often not completely expected/systemic) language use, is 
also the mainstay of the Language Counselling of the Fran Ramovš Institute of the 
Slovenian Language.
The Growing Dictionary of the Slovenian Language first combined the 
approaches described above in points a) and b). It found itself 
at a certain crossroads in 2015, after the publication of 
the first yearly batch of entries. The upper half of the words not present 
in available dictionaries but present in the corpus Gigafida 1.0 with 
a frequency of 1,000–500 and a peak of occurrences in the years 2009–2012 
had been exhausted. Given that the corpus Gigafida 1.0 
had not received any update since 2013, 2015 was absolutely the last 
year in which a 2009–2012 peak of occurrences seemed convincing 
enough for the temporal criterion (point b above) to remain applicable. 
The typical frequency of the candidates (extremes at both ends not taken 
into account) plummeted from over 500 to 300. Fewer than 100 such 
words were processed all the way to the final dictionary entries, and it 
would be the last time corpus-only candidates made up the vast majority 
of the final entries; see the line ‘GF 1.0 (n/~ 500 initial)’ in figure 5. It 
became clear that new ways of collecting potential neologisms had 
to be actively sought.
As mentioned, encouraging users to suggest ‘missing’ words and 
meanings (and equivalents of loanwords) as candidates for lexical 
description was first seen as part of a user-inclusion policy; at the 
time no one could predict what an important source of 
potential neologisms it would become. It should be noted that, with 
the entire content of the portal Fran available (from present-day to 
historical as well as terminological dictionaries in a unified electronic 
form), users had a powerful tool for comparing entries, which could serve 
as a kind of checkpoint: anything felt as ‘new’ but already described in 
one of the dictionaries or other manuals at the portal Fran would not 
qualify as such. Anything identified as new and not present anywhere at 
the portal Fran, however, and, as it could eventually turn out, not 
present even in the latest (2.0), let alone the first (1.0) version of 
the corpus Gigafida, would qualify strongly as a potential ‘new word’ 
(neologism).
In 2015, however, users’ propositions were few (7 were submitted)8 
and available only late in the year, so a number of other sources were 
selected in search of potential neologisms:
1. regular mail, telephone – usually alongside a linguistic question, 
answered by one of the researchers at the Institute;
2. the formalised way of answering such questions: the Institute’s 
Language Counselling site, 
which is also integrated into the portal Fran;
3. systematic reading of new, mainly web texts of different genres, 
done by students at the Faculty of Arts in Ljubljana as part of 
their seminar work;
4. targeted reading of the latest (news) web texts, paying special 
attention mainly to the fields which stood out in the first-year batch 
of entries (computing and technology, finances and economics, 
medicine, food, leisure and lifestyle); this is often done alongside the 
work on material for other growing dictionaries (eSSKJ, ePravopis);
5. external factors, such as projects which certain researchers from 
the Institute have taken part or an interest in, e.g. Janes and 
its proceedings.
These searches yielded some 20 candidates. As this was only a testing 
phase, they were not processed further: a comparison with a larger 
number of users’ propositions was needed to evaluate their 
position better. These propositions came before long: 2016 saw an enormous 
increase in users’ propositions submitted at the portal Fran (7 > 180).
Figure 5 shows how the proportions of neologism candidates from 
approaches a)–c) have changed over time: from the domination 
of straightforward data comparison along with temporal corpus 
analysis in 2014–2015 (the line ‘GF 1.0 (n/~ 500 initial)’, the line 
‘published (entries)’ and the line ‘sum of the candidates’ all 
follow the same curve) to the steep increase in the role of neologisms 
from the users’ point of view from 2016 onwards (with temporal corpus 
analysis, when applicable, remaining an important part of entry 
processing). While the line ‘sum of the candidates’ represents the 
sum of the candidates from the initial approach a) + b) plus all the 
candidates from approach c) and other sources, the line ‘sum of 
the propositions’ unites only the latter: users’ propositions + other 
sources, listed in points 1–5 above. The content united under this line 
is shown in detail in figure 6.
8 Among them was sebek, a Slovene equivalent of selfie (2015), which would eventually 
make it to the final entries in 2018.
Figure 5: Data acquisition vs final entries
Figure 6: Proposition types
After 2015, a substantial number of propositions came from the 
sources listed in points 1–5 above, especially in the years 2016–2019. 
An even larger number of users’ propositions made it possible to 
compare them properly with propositions from other sources (points 1–5 
above), which could not have been done in 2015. One of the other 
sources was the Language Counselling site, shown separately in 
figure 6; most propositions from it were obtained either directly or 
indirectly from questions related to lexicology, lexicography or word 
formation. The majority 
of propositions, however, stemmed from the process of compiling 
the dictionary eSSKJ and, in part, the normative guide ePravopis.
The year 2018 was somewhat exceptional: the number of propositions 
from other sources, which had always been lower than the number 
submitted at Fran by users, converged with the latter. This was mainly 
due to some researchers taking part or an interest in the project Janes, 
which was concluded in 2018, and its final proceedings. The project 
Janes, especially its corpus of social media posts (Twitter, forums, 
blogs), contributed 
substantially to the content of the Growing Dictionary of the Slovenian 
Language, and not only in 2018. Due to its specialized nature, 
this corpus cannot substitute for the Gigafida corpus as an important 
tool in processing candidates (see the following chapter). However, 
together with web texts, the corpus has proved very useful, particularly 
when the proposed candidates are nearly (frequency ≤ 8) or fully 
absent from the corpus Gigafida.
The combined use of the three approaches (which also applies, to a certain 
degree, to point 3 above, done by students, and especially to 
point 4, with linguists taking a role similar to that of general language 
users but with a clear goal in mind) certainly allows for a greater degree 
of flexibility. The listed approaches are complementary: they help 
alleviate the limitations that would arise from relying disproportionately 
on only one of them (say, using only corpus data without taking into account 
users’ observations, or taking the latter for granted without checking 
them thoroughly in corpora and other available sources). Thus, it 
makes sense that all of them should be used not only in collecting 
potential neologism candidates but also when processing them in 
the preparatory phase and then, if they pass the initial test, all the 
way to the final dictionary entries.
2.3 The Growing Dictionary of the Slovenian Language: dictionary inclusion 
criteria
What criteria must or should a neologism candidate fulfill to be 
included in the Growing Dictionary of the Slovenian Language? Reliance 
on corpus data alone was good enough only in 2014, when the corpus 
Gigafida 1.0 was still relatively new, which allowed the frequency range 
below 1,000 and above 500, alongside the requirement for peak 
occurrences in 2009–2012, to function well enough. After 2015, a 
transitional year that was somewhat ‘unsure’ with regard to inclusion 
criteria (which resulted in the lowest number of entries ever published), 
2016 saw the number of users’ propositions rise beyond expectations. 
The number of propositions from other sources (see points 1–5 
in chapter 2.2) was substantial as well.
This required careful consideration of which neologisms should be 
included immediately and which should be put aside for possible 
inclusion later on. One could argue that a large number of users’ 
propositions is in itself enough to lessen their subjectivity. Be that as 
it may, a decision was made that they should, without exception, undergo 
verification in all the available corpora (not only Gigafida 1.0) and, 
if the search yielded no results, also beyond corpora in web texts. From 
2016–2017, web material and/or the corpus of web texts SlWaC as well 
as the corpus of academic texts KAS (in cases of determinologization), and 
from 2018 onwards also the corpus Janes, started being used much 
more frequently than before. Attestation of a neologism candidate, 
with frequency ≥ 10, in any of the listed corpora was preferred. 
However, should a candidate not be present in any of them, web 
texts still represented a sufficient last resort, although the data 
can hardly be processed in as orderly a way as with corpora. 
The non-included propositions were usually those not present in any 
of the corpora Gigafida, Janes or SlWaC and at the same time barely 
present in (or even absent from) web texts. Meanwhile, absence 
from the corpora alone, especially from Gigafida (1.0) and from 
2017 onwards, did not prevent inclusion.
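
The verification order described above can be summarised as a small decision function. It is a hedged sketch rather than the editors' actual procedure: the function name, the return labels and especially the min_web_hits threshold are hypothetical, since no numeric value is given for ‘barely present’ in web texts. Only the preferred corpus frequency of 10 comes from the criteria above.

```python
def inclusion_decision(corpus_freqs, web_hits, preferred_freq=10, min_web_hits=5):
    """Suggest what to do with a proposed word.

    corpus_freqs: mapping corpus name -> frequency of the candidate
                  (e.g. Gigafida, Janes, SlWaC, KAS)
    web_hits: number of attestations found in web texts
    min_web_hits: assumed cut-off for 'barely present'; not from the article
    """
    # Step 1: attestation in any corpus with frequency >= 10 is preferred.
    best = max(corpus_freqs.values(), default=0)
    if best >= preferred_freq:
        return "include: corpus-attested"
    # Step 2: web texts serve as the last resort.
    if web_hits >= min_web_hits:
        return "include: web-attested"
    # Step 3: otherwise the proposition is kept for later reconsideration.
    return "store for yearly re-check"


print(inclusion_decision({"Gigafida": 0, "Janes": 14, "SlWaC": 2}, web_hits=0))
print(inclusion_decision({"Gigafida": 0, "Janes": 0}, web_hits=40))
print(inclusion_decision({}, web_hits=1))
```

In practice the decision is an editorial judgement; the function only makes the order of the checks explicit.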
Non-included propositions are stored in the database and 
undergo a yearly check: when their presence becomes noticeable 
in various sources (at least in web texts), their inclusion can be 
reconsidered. When a certain candidate is included, word formation 
also comes into play in the search for potential neologisms belonging 
to parts of speech different from that of the proposed candidate; 
this is particularly true of Slovene, as of other Slavic languages, 
which are known for their rich word formation. All word-formation 
candidates are subjected to the checking procedure described above; 
they are counted among ‘other’ propositions.
figure 7: Neologism candidates vs published entries
As figure 7 shows, of all the candidates from all the sources 
(represented by the line Sum_candidates), on average roughly half 
made it into the final entries each year – with the exception of the 
first year, when only the corpus candidates were available. From 
2019, the line Prop_sum equals all candidates, as the candidates 
from the initial Fran-Gigafida 1.0 alignment lost most of their 
initial relevance and stopped being used as a source: the typical 
frequency of the lemmas in the corpus Gigafida 1.0 that were also 
published as dictionary entries fell from the initial 500 in 2014 
to 40 in 2017. That, alongside a more streamlined process of 
checking users’ and ‘other’ propositions (Prop_sum), caused the 
ratio of published entries to all candidates to begin a moderate 
but steady rise towards 70%. For the process of checking the 
propositions to retain its relevance, a new specialized corpus, 
such as the announced SLED, is highly desirable. The following 
chapter, which serves as an entry point to the discussion, shows 
how inefficient reference corpora that are not updated or only 
scarcely updated – as is the case in Slovenian – become as the main 
source of neologism candidates not long after their compilation. 
Because the sources are limited to the Slovenian corpora Gigafida 
1.0 and 2.0, the results apply only to them and cannot be 
generalized without taking into account the specific Slovenian 
situation described in the opening chapters. Further research and 
comparison of data from various languages may well lead to 
different conclusions.
3 The feasibility of using (only reference) corpora 
for acquiring and/or processing the data
For the corpus Gigafida 1.0, figures 8 and 9 show the absence or 
near absence of each year’s (2014-2021) entries of the Growing 
Dictionary of the Slovenian Language; from 2019, they show it also 
for the updated version of the corpus (2.0). Near absence means a 
frequency ≤ 8, which makes appropriate processing of an entry 
relying solely on the limited corpus material very inconvenient and 
impractical, if not outright impossible. It has to be kept in mind, 
though, that because the corpus Gigafida was not updated until 2019, 
most of the dictionary entries from 2016 onwards stemmed from users’ 
propositions. No large-scale analysis of the corpus Gigafida itself 
in terms of potential neologisms has been done. As shown in figure 9, 
the update was rather modest – not making a notable difference, at 
least as far as the neologisms described in the Growing Dictionary 
of the Slovenian Language are concerned. There was some shift9 from 
complete absence in 1.0 (2019: 21 (1.0) > 11 (2.0); 2020: 50 > 44; 
2021: 71 > 42) to near absence in 2.0 (2019: 46 (1.0) < 54 (2.0); 
2020: 38 < 39; 2021: 36 < 50), but the change was not substantial. 
Due to the initial Fran-Gigafida 1.0 alignment input, absence or 
near absence from the corpus was nonexistent or negligible at first, 
but started gaining momentum with users’ propositions and steady 
work on the material for both eSSKJ and ePravopis from 2016 onwards 
(‘other’ propositions). Starting with 2018, the sum of (nearly) 
absent entries represented at least half of each year’s entries, 
reaching 56-66% in 2020-2021 (both values show the impact of new 
corona lexis). If such trends continue, one could even argue that 
(near) absence from the corpus Gigafida 1.0 should become one of the 
criteria for including neologism candidates as dictionary entries, 
discussed in chapter 2.3 – not something the Growing Dictionary of 
the Slovenian Language would seriously consider, of course.
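The absence classes used in this chapter can be expressed as a simple frequency binning. The sketch below assumes a plain list of per-entry corpus frequencies; the function names and the sample values are hypothetical and are not data from the dictionary.

```python
from typing import Sequence

NEAR_ABSENCE_MAX = 8  # "frequency <= 8" counts as near absence in the text


def absence_class(freq: int) -> str:
    """Bin one entry by its lemma frequency in the corpus."""
    if freq == 0:
        return "absent"
    if freq <= NEAR_ABSENCE_MAX:
        return "nearly absent"
    return "present"


def nearly_absent_share(freqs: Sequence[int]) -> float:
    """Share of a year's entries that are absent or nearly absent."""
    if not freqs:
        return 0.0
    hits = sum(1 for f in freqs if absence_class(f) != "present")
    return hits / len(freqs)


# Hypothetical corpus frequencies for one year's entries (not real data).
sample = [0, 0, 3, 8, 9, 40, 160]
print(f"{nearly_absent_share(sample):.0%}")  # prints 57%
```

Applied to each year's entries, a share computed this way corresponds to the "sum of (nearly) absent entries" reported above.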
figure 8: Presence of entries in corpus Gigafida 1.0
9 With the corpus update there was also a shift in the years of peak occurrences 
(2009-2012 → 2015-2018) to look for when processing neologism candidates.
figure 9: Presence of entries in corpus Gigafida 1.0 vs 2.0
The (nearly) absent entries are mainly, but not exclusively, loanwords 
(sometimes also in the form of calques) and new derivatives, cf. (ten 
Hacken 2020), for example:
(2016) helikopterski starši ‘helicopter parents’,10 plačilomat ‘self-
service payment machine’, vejpanje ‘vaping’, vejper ‘vaper’; antropocen 
‘Anthropocene’, brezpilotnik ‘pilotless plane’, dismorfofobija ‘BDD’, 
camu camu, mangostin, zagonsko podjetje ‘start-up’;
(2017) emodži ‘emoji’, hipsterka ‘hipster woman’, ključnik ‘hashtag’, 
kriptovaluta ‘cryptocurrency’, pajkanje ‘web crawling’, zipline; 
antifeministka ‘antifeminist woman’, beachvolley, čustvenček ‘emoji’, 
hashtag, memorizirati ‘memorise’, netiketa ‘netiquette’, skrolati ‘to 
scroll’, smejko ‘emoji :-)’, tvitniti ‘to tweet’, vitaminoza ‘vitaminosis’;
(2018) bestička ‘female best friend’, coworking, fixie, geolov ‘geocaching’, 
hejterka ‘hater woman’, influencer, influencerka, kretalec ‘user of sign 
language’, retvit, retvitati/-niti ‘to retweet/(pf)’, sebek ‘selfie’, selfiestick, 
supati ‘to sail on SUP’, tekstanje ‘texting’, vlogati ‘to publish on vlog’; 
backpackerka ‘backpacker woman’, bestič ‘best friend’, chefinja ‘female 
chef’, klikanost ‘number of web clicks within an interval’, mikroplastika 
‘microplastic’, snorkljati ‘to snorkel’, sovrtičkar ‘kindergarten peer’, 
streamati ‘to stream’, trolati ‘to troll’, vstavljanka ‘toy with insertable 
parts’, youtubati ‘to publish on YouTube’;
10 Multi-word units are also included – one or more of them are listed under one of 
the components representing an entry. For further information on typology and 
treatment of multi-word lexical units in general monolingual explanatory Slavic 
dictionaries see (Perdih and Ledinek 2019).
(2019) časosled ‘timeline’, dojenčkati ‘to care for one’s own baby’, 
fejmič ‘famous person’, hejtanje ‘hating’, jajcemat ‘self-service 
egg machine’, mikrozelenjava ‘microgreen’, odslediti ‘to unfollow’, 
prokrastinirati ‘to procrastinate’, risoroman ‘graphic novel’, spletinar 
‘webinar’, vejpati ‘to vape’; antidementiv ‘anti-dementia’, gentrificirati 
‘to gentrify’, hendlanje ‘handling’, hrčkar ‘hoarder’, hrčkati ‘to hoard’, 
izsočiti ‘to extract juice’, jogistka ‘yogi woman’, kontrolfrik ‘control 
freak’, koruptibilen ‘corruptible’, nadkul ‘very cool’, napsihirati ‘to 
depress (pf)’, polajkati ‘to like on web (pf)’, polinkati ‘to link (pf)’, 
predtestirati ‘pretest’, rimoklepač ‘rapper’, shendlati ‘to manage’, 
takitos ‘taquito’, webinar;
(2020) alfakoronavirus, antikoronski ‘anti-corona’, brain freeze, halving, 
korona(čas, -humor, -kriza, -paket, -panika, …) ‘corona-(time, humour, 
crisis, package, panic)’, plavajoča licenca ‘floating licence’, megapaket 
‘mega-package’, odločbodajalec ‘decree-issuer’, ničti pacient ‘patient 
zero’, po(st)koronski ‘post-corona’, prekuževanje ‘infecting in order 
to build up immunity’, trikini; asimptomatično ‘asymptomatically’, 
bankster, brezsimptomen ‘asymptomatic’, brezstično ‘contactlessly’, 
hekaton ‘hackathon’, čredna imunost ‘herd immunity’, kohortna 
izolacija ‘cohort isolation’, megazakon ‘mega law-package’, novookužen 
‘newly infected’, samoizolirati se ‘to impose self-quarantine’;
(2021) anticepilec, antivakser, anticepilski, antivakserski (adj.) ‘anti-vaxxer’, 
antivakserka/proticepilka ‘woman anti-vaxxer’, astroturfing, 
butaj ‘butai’, debelostnik ‘overweight person’, gerontocid, glinarjenje 
‘working with clay’, hribarjenje ‘mountain hiking’, hudi ‘hoodie’, 
infodemija, instagramerka, kriptorudar ‘cryptocurrency miner’, 
kriptorudarjenje ‘cryptocurrency mining’, kriptorudarski (adj.), 
lockdown, nevrorazličnost ‘neurodivergence’, odrast ‘degrowth’, 
pokovidni/postkovidni (adj.), poobjavljati ‘to retweet’, prebolelost ‘recovery 
from illness’, predkoronski/predkovidni (adj.), procepilec ‘pro-vaxxer’, 
procepilski (adj.), protiukrepni (adj.) ‘being/working against the 
measures’, razogljičiti ‘decarbonise’, senicid, tiktoker, tiktokerka etc.
The (basic) meaning of many of these lexemes can be guessed 
even by a non-Slovenian speaker. They are certainly not ‘exotic’, 
yet none of them would have made it into the entries had the 
dictionary relied only on approaches (a) and (b) – it was users’ 
propositions (approach (c)) that proved crucial for their inclusion. 
The typical frequency of the entries present both in the dictionary 
and (as lemmas) in the corpus Gigafida 1.0, shown in figure 10, 
further explains why a non-updated corpus becomes less and less 
reliable as the main source of neologism candidates without the aid 
of users’ propositions. In 2014 and for some years afterwards the 
corpus played a very important role: when the substantial number of 
candidates at a certain frequency was exhausted, the next productive 
frequency usually turned out to be approximately half the previous 
one (500 > 300 > 160). Those frequencies allowed for quite 
comfortable analysis of the data, typically using Sketch Engine, 
which would yield reliable collocations (common in the first two to 
three years, rare afterwards), distinct meanings etc.
figure 10: Typical frequency of entries also present as lemmas in corpus 
Gigafida 1.0
The change after the pivotal year 2016 was quite pronounced: 
the typical frequency of entries also present as lemmas in the 
corpus Gigafida 1.0 dropped to 60 (and later even lower). 
The work changed from something resembling the compilation of a 
general explanatory dictionary to ‘trudging’ through the material in 
search of examples that would reliably confirm the detected potential 
meanings.11 With the ultimate goal of detecting potential neologisms 
(which was largely taken care of by users’ propositions) and analysing 
them at any cost, all the available corpora and the (ever changing and 
expanding) web content came into play. One option yet to be explored 
is a (semi-)automatic way of detecting neologisms by comparing the 
content of all the available corpora against the expanding web 
content and extracting the words not present in the corpora – with 
targeted search aimed mainly, but not exclusively, at the thematic 
fields that stand out among the entries of the Growing Dictionary of 
the Slovenian Language in the period 2014-2021. In this regard, the 
announced project (and specialized corpus) SLED is also expected to 
prove extremely useful.
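The comparison of corpus content against web content mentioned here as an unexplored option might, in its simplest form, look like the sketch below. It is not a description of any existing tool: the function name, the `min_hits` threshold and the input layout are assumptions, and it compares raw word forms rather than lemmas, which a real pipeline for an inflected language such as Slovenian would need to address.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def web_only_candidates(web_texts: Iterable[str],
                        corpus_lemmas: Dict[str, Set[str]],
                        min_hits: int = 3) -> List[Tuple[str, int]]:
    """List words seen at least min_hits times on the web but in no corpus.

    corpus_lemmas maps a corpus name to the set of lemmas it contains.
    min_hits is a hypothetical noise threshold.
    """
    counts: Counter = Counter()
    for text in web_texts:
        # Letters only (Unicode-aware), lower-cased; no lemmatisation.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    known: Set[str] = set().union(*corpus_lemmas.values())
    hits = [(w, n) for w, n in counts.items() if n >= min_hits and w not in known]
    # Most frequent first, ties broken alphabetically.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

Restricting `web_texts` to sources from the thematic fields that stand out in the dictionary would correspond to the targeted search described above.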
4 An overview of the basic characteristics of Slovenian 
neologisms in the period 2014–2021
This topic would require a study in a separate paper,12 therefore 
only a brief introduction will be provided. Since 2012, research 
on neologisms has been limited to certain linguistic phenomena 
on various levels – such as word formation (Gložančev 2012), (Voršič 
2015), (Štumberger 2015); semantics (Štumberger 2015a), (Zatorska 
2016), (Fišer and Ljubešić 2018) – or varieties (Michelizza 2015), 
(Michelizza and Žagar Karer 2018), (Zwitter Vitez and Fišer 2018). Most 
of these studies, except for those that examine (colloquial) online 
language and rely mainly on comparing the corpus Janes with other 
corpora, are based on the material contained in the Dictionary 
of New Slovenian Words. No comprehensive research on the newer 
material has been done yet, apart from a preliminary report, based 
on the entries of the Growing Dictionary of the Slovenian Language, as 
part of a lexicologically oriented project proposal. Therefore, for 
the time being only some basic insight can be provided into certain 
questions that have arisen at different levels of linguistic description.
11 figure 10 also shows that the number of new meanings (or narrower/wider scope 
of a meaning or multi-word expressions) in lexis already described in general 
explanatory dictionaries, such as SSKJ2, is relatively low compared to the number of 
completely new words. Certain external events, such as the present corona crisis, seem 
to augment that potential (see the years 2020-2021).
12 A general overview of the topic, albeit with the focus on corona lexis (and its word 
formation), is given in (Krvina 2021).
Phonetic and morphological features:
a. Existence of variants, sometimes with different stylistic value, 
in both spelling and pronunciation (dred/dread, selfi/selfie, 
snorkljati/šnorkljati, zero waste [ˈziːɾɔu ˈuɛːist] : [ˈzeːɾɔ ˈvɛːist]);
b. Types of nouns in which the accusative takes (also) animate 
forms, as is typical of many Slavic languages (narediti selfi/selfija 
‘to make a selfie’, dobiti všeček/všečka ‘to get a like’);
c. Types of words which also act as a type of adjective and form 
multi-word units in which the first element stays undeclined 
(backpacker turist, korona kriza) and their relation to their 
potential competing adjectival derivative (backpackerski, 
koronski).
Word-formation features:
a. Types of word derivatives from loanwords: verbs (skrolati, supati, 
tekstati), their gerunds (skrolanje, supanje, tekstanje), animate 
agents (supar, suparka); +/– existence of word-formation basis 
as an independent loanword (e. g. *skrol; skrolati, skrolanje);
b.  Fully borrowed nouns (ending in -er) vs derivatives (with suffix 
-ar) from the verbs with the loanword as the basis (vejper : 
[sup-a-ti] sup-ar); occurrence of variants within the same word-
formation basis (youtuber : [youtub-a-ti] youtub-ar);
c.  Formation of verbs from loanwords as bases; the relationship 
between the suffix -a- and -(iz)ira- (rent-a-ti, retvit-a-ti, stream-
a-ti, vlog-a-ti : anonim-izira-ti, mentor-ira-ti);
d. Types, frequency and ways of derivation of feminine forms 
from nouns in comparison to the neutral/masculine form 
(backpackerka, bestica, chefinja, influencerka); the ratio of the 
respective suffixes -ka, -ica, -inja.
Semantic features:
a. Types and frequency of thematic fields predominantly 
contributing to the neologisms, mostly loanwords (computing 
and technology, finances and economics, medicine, (healthy) 
food, leisure and lifestyle); 
b. Types and frequency of motivation for semantic shifts in 
already existing words, usually via metaphor/metonymy 
or by expanding/narrowing/swapping the area of their use 
(ambasador ‘of a country’ : ‘of an activity’, dopeči ‘to bake to 
the end’ : ‘using special procedure in the shop’, sledilec ‘person 
following the track’ : ‘following the ideas, ideology; internet 
follower’, sodelo ‘cooperation in general’ : ‘a special type of 
cooperation – co-working’, vplivnež ‘influential in general’ : 
‘influential in social media; influencer’);
c.  Types and frequency of synonyms, especially in the 
relationship between loanwords and derivatives from non-loanwords 
(hashtag : ključnik, influencer : vplivnež, selfie : sebek).
5 Further discussion and conclusions
The analysis was concerned with types of data and their use in the 
Growing Dictionary of the Slovenian Language (2014-), especially with 
regard to collecting potential neologisms (often called ‘neologism 
candidates’) and processing them in the available corpus material 
and also beyond it – in web texts, particularly when corpus analysis 
produces no results. Three complementary approaches to collecting and 
processing potential neologisms, all used in the Growing Dictionary 
of the Slovenian Language, were presented: 
(a) straightforward data comparison (the words not present in the 
latest editions of explanatory dictionaries but present in the 
latest version of the corpora); 
(b) temporal corpus analysis (the words with the peak of occurrences 
in the last years in each subsequent version of the corpus); 
(c) neologisms from the users’ point of view (propositions submitted 
by users at the dictionary portal Fran).
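Approaches (a) and (b) are essentially a set difference and a peak-year filter, which the following sketch illustrates. The function names, the `window` parameter and the data layout are assumptions made for illustration, not the dictionary's actual tooling; approach (c) needs no computation, since users' propositions are simply added to the candidate pool.

```python
from typing import Dict, Set


def approach_a(corpus_lemmas: Set[str], dictionary_headwords: Set[str]) -> Set[str]:
    """(a) Lemmas attested in the corpus but missing from the dictionaries."""
    return corpus_lemmas - dictionary_headwords


def approach_b(yearly_freq: Dict[str, Dict[int, int]], last_year: int,
               window: int = 3) -> Set[str]:
    """(b) Lemmas whose peak year lies in the last `window` years of the corpus.

    yearly_freq maps lemma -> {year: frequency}; window is a hypothetical setting.
    """
    recent = range(last_year - window + 1, last_year + 1)
    selected = set()
    for lemma, by_year in yearly_freq.items():
        if not by_year:
            continue
        peak_year = max(by_year, key=by_year.get)
        if peak_year in recent:
            selected.add(lemma)
    return selected


def all_candidates(a: Set[str], b: Set[str], user_propositions: Set[str]) -> Set[str]:
    """Union of approaches (a), (b) and (c)."""
    return a | b | user_propositions
```

Every candidate collected this way would still have to pass the verification in corpora and web texts described in the earlier chapters.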
The study has shown that the inclusion of users (even when initially 
viewed as part of a user inclusion policy rather than as a way to 
obtain meaningful data) is an important part of the methodology. 
Users’ propositions can draw attention to neologisms or other 
meaningful phenomena that could remain nearly or completely 
undetected if one relied solely on corpus data (especially when not 
updated) – even that of a reference corpus such as Gigafida. Corpus 
use is still indispensable, particularly in general data processing: 
it represents the most systematic and statistically reliable way of 
analysis. The methodology of an individual dictionary should define 
the role and share of the corpus data according to the intended 
goals – and it should be dynamic rather than static.
As the growing type of dictionary has become commonplace in 
Slovenian lexicography in the last couple of years,13 users are 
increasingly included in the compiling process in one way or another: 
usually by suggesting new entries, additions or corrections (this is 
the type of proposition the portal Fran encourages), sometimes 
also in the editing process itself (for the Collocations Dictionary 
of Modern Slovene). Our study – based on the data obtained from the 
Growing Dictionary of the Slovenian Language, which from 2016 onwards 
relies heavily on users’ propositions, and on the non-updated or 
scarcely updated corpus Gigafida – suggests that dictionaries of 
(predominantly) neologisms in particular should try to secure a steady 
inflow of users’ propositions (preferably in a standardized electronic 
form that is easy to process and track) within about 2-3 years of the 
start of compiling. This is especially true if regularly updated 
corpora are not available. Such propositions can include those 
obtained from advanced users, such as other researchers, especially 
those working on the material of a general explanatory dictionary. The 
candidates will most likely be on the fringes of general lexis,14 low 
in frequency (≤ 50) and possibly with occurrences mostly in the last 
years covered by the reference corpus – thus relying on the corpus 
data alone might not provide a sufficient result.
13 Apart from the Growing Dictionary of the Slovenian Language there are eSSKJ 
(2016-), ePravopis (2014-) as well as the Collocations Dictionary of Modern Slovene 
(2018-) and Thesaurus of Modern Slovene (2018-). All of them are corpus-based or 
corpus-driven and use semi- (mainly in form of word sketches) or fully automated 
corpus data processing.
Our study has also shown that the role a corpus could play in 
detecting and processing neologisms, especially if it is not regularly 
updated, may well depend on its ‘age’ – it seems to be much more 
efficient in a period not exceeding 3 years from the completion of 
work on the corpus. After that, the corpus data alone becomes less 
dependable if not regularly updated (and even if it is, since a major 
overhaul of corpus data is rarely possible; cf. the case of Gigafida 
1.0 vs 2.0), at least according to our study. A combined strategy – 
such as uniting approaches (a)-(c) in the Growing Dictionary of the 
Slovenian Language from 2015 onwards – can often be the most effective 
way to reach a satisfactory degree of responsiveness. It also allows 
the advantages of one approach to mitigate the shortcomings of 
another. If a new appropriate corpus appears, or an old one is 
completely overhauled, its share in the detection and processing of 
candidates is expected to rise according to its content – more so for 
specialised neologism corpora, such as the recently announced SLED 
(the first of its kind for Slovenian), whose role in the description 
of potential neologisms only further research will be able to 
evaluate. For the peak of candidates’ occurrences, the last years 
covered in the corpus are always preferred (with data noise excluded).
14 The newest terminological lexis is also regularly monitored and Slovenian 
equivalents of foreign terms are suggested/evaluated, especially within the framework 
of Terminological counselling, which the Fran Ramovš Institute of the Slovenian 
Language also provides. Due to the large quantity of specialized lexis entering 
general lexis via the process of determinologization, the results of Terminological 
counselling often prove valuable for the Growing Dictionary of the Slovenian Language.
Apart from using the most appropriate available corpora, the 
scope of analysis should be widened by including web texts and 
taking into account all the available statistics. To further increase 
the chances of detecting possible candidates, a strategy of targeted 
reading of both corpora and web texts seems viable. Search 
parameters, which would also enable a potential (semi-)automated 
search, should be defined in advance: thematic fields in general 
(those standing out in the first years of compiling may serve as a 
suggestion; others are not excluded), text types, time interval 
(recent years), the rate of change (derivative) of the frequency of a 
candidate’s occurrence within that interval etc.
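A predefined parameter set of this kind, together with the frequency derivative, could be sketched as follows. The class and field names, the default years and the text types are hypothetical; the thematic fields are those named in chapter 4, and the "derivative" is taken here to be the simple year-on-year difference in frequency, which is one possible reading of the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SearchParams:
    """Search parameters fixed before a targeted (semi-)automated search."""
    thematic_fields: Tuple[str, ...] = (
        "computing and technology", "finances and economics", "medicine",
        "food", "leisure and lifestyle",
    )
    text_types: Tuple[str, ...] = ("news", "blogs")  # hypothetical values
    first_year: int = 2019                           # hypothetical interval
    last_year: int = 2021


def frequency_derivative(by_year: Dict[int, int], params: SearchParams) -> List[int]:
    """Year-on-year change in a candidate's frequency within the interval."""
    years = range(params.first_year, params.last_year + 1)
    freqs = [by_year.get(year, 0) for year in years]
    return [later - earlier for earlier, later in zip(freqs, freqs[1:])]


print(frequency_derivative({2019: 2, 2020: 10, 2021: 25}, SearchParams()))  # [8, 15]
```

Consistently positive differences in the most recent years would mark a word as a stronger candidate than one with the same total frequency spread evenly over time.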
To sum up: the responsiveness of a (contemporary) dictionary should 
rest on a wide array of available data and approaches and try to 
combine them into an effective single whole, mainly by allowing the 
shares of the individual approaches to vary according to the 
situation at hand. This is especially true of dictionaries of 
neologisms, since even at the beginning of their compilation most 
available corpora with sufficiently wide content are likely to be 
somewhat old. Thus, the efficiency of corpora in detecting and – due 
to the low frequencies – also processing neologisms will diminish over 
time. The share of users’ propositions should preferably remain fairly 
high throughout the process. For this to be possible, a kind of 
reference point should be created and maintained – such as the 
well-rounded portal Fran, the site of the Society for Danish Language 
and Literature or the site of the Wielki słownik języka polskiego, to 
name only a few. Targeted reading, complemented by the possible 
development and implementation of (semi-)automated search and 
extraction of potential neologism candidates, using parameters set in 
advance according to the already processed data (in the database of 
existing entries), should also not be neglected. Alongside recent and 
future developments in automated targeted search – despite a number of 
drawbacks (Kerremans et al. 2012), (Slána 2017), (Waszink 2019) – it 
may well become one of the most important sources of candidates.
As for research on the characteristics of neologisms, only a brief 
overview for Slovenian is given at the end of the study. Even 
the preliminary research, based on the entries in the database of the 
Growing Dictionary of the Slovenian Language, has revealed a number 
of subjects on various linguistic levels (phonetics, word formation 
and semantics in particular) that are worth further, comprehensive 
research. Some subjects are shared between (related) languages: in 
Slavic languages, for instance, types of nouns in which the accusative 
takes (also) animate forms; types of words which can be written as 
one word, in which case they can be interpreted as a compound, or 
separately, in which case a multi-word unit arises. For more effective 
research, it is preferable for the (dictionary) database to be structured 
in a standard processable format (e. g. TEI) – if the (dictionary and 
corpus) databases of different languages are to be compared.
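To illustrate what a standard processable format means in practice, the sketch below serialises one entry using element names from the TEI dictionary module (`entry`, `form`, `orth`, `gramGrp`, `pos`, `sense`, `def`). It is a bare fragment under stated assumptions: it does not reflect the actual schema of the Growing Dictionary of the Slovenian Language, it omits the TEI namespace and header, and the helper name `tei_entry` is hypothetical.

```python
import xml.etree.ElementTree as ET


def tei_entry(lemma: str, pos: str, definition: str) -> str:
    """Serialise one dictionary entry as a minimal TEI-style fragment."""
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = lemma
    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = pos
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = definition
    return ET.tostring(entry, encoding="unicode")


# Example with a word and gloss taken from the list in chapter 3.
print(tei_entry("sebek", "noun", "selfie"))
```

Entries stored this way can be queried with the same paths (for example `form/orth`) regardless of the language of the dictionary, which is what makes cross-language comparison feasible.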
All the above considered, the study of types of data and their use in 
the Growing Dictionary of the Slovenian Language has proven to be 
worthwhile with regard both to detecting, collecting, 
processing (checking in corpora and beyond) and describing 
neologisms and to examining their characteristics on various levels of 
linguistic description. Further research should focus on generalising 
the findings tied to the data of the Growing Dictionary of the Slovenian 
Language presented in this paper by comparing them with data from other 
(especially Slavic) languages in a wider scope and in greater detail.
References
Dictionaries
Krvina, Domen (ed.). Sprotni slovar slovenskega jezika 2014– [Growing 
dictionary of the Slovenian Language 2014–]. Available at: https://www.
fran.si/132/sprotni-sprotni-slovar-slovenskega-jezika.
Slovar novejšega besedja slovenskega jezika [Dictionary of New Slovenian 
Words]. 2013. Available at: https://www.fran.si/131/snb-slovar-
novejsega-besedja.
Slovar slovenskega knjižnega jezika [Dictionary of the Slovenian Standard 
Language]. Available at: https://www.fran.si/130/sskj-slovar-
slovenskega-knjiznega-jezika.
Slovar slovenskega knjižnega jezika, druga, dopolnjena in deloma prenovljena 
izdaja [Dictionary of the Slovenian Standard Language, 2nd Edition]. 
2014. Available at: https://www.fran.si/133/sskj2-slovar-slovenskega-
knjiznega-jezika-2.
eSSKJ: Slovar slovenskega knjižnega jezika 2016– [eSSKJ: Dictionary of the 
Slovenian Standard Language, 3rd Edition]. Available at: https://www.
fran.si/201/esskj-slovar-slovenskega-knjiznega-jezika.
ePravopis: Slovar slovenskega pravopisa 2014– [ePravopis – Slovenian 
Normative Guide]. Available at: https://www.fran.si/135/epravopis-
slovenski-pravopis.
Furlan, M. (ed.). Novi etimološki slovar slovenskega jezika 2017– [New 
Etymological Dictionary of Slovenian Language]. Available at: https://
www.fran.si/207/nessj-novi-etimoloski-slovar-slovenskega-jezika.
Kolokacije 1.0: Kolokacijski slovar sodobne slovenščine [Collocations Dictionary 
of Slovene]. Available at: https://viri.cjvt.si/kolokacije.
Sopomenke 1.0: Slovar sopomenk sodobne slovenščine [Thesaurus of Modern 
Slovene]. Available at: https://viri.cjvt.si/sopomenke.
Society for Danish Language and Literature. Available at: https://dsl.dk/.
Wielki słownik języka polskiego. Available at: https://www.wsjp.pl/.
Corpora
Gigafida 1.0. Available at: http://www.gigafida.net/.
Gigafida 2.0: Korpus pisne standardne slovenščine. Available at: viri.cjvt.si/
gigafida.
Janes. Available at: https://www.clarin.si/noske/run.cgi/corp_
info?corpname=janes.
slWaC. Available at: https://www.clarin.si/noske/run.cgi/corp_
info?corpname=slwac.
KAS. Available at: https://www.clarin.si/noske/run.cgi/corp_
info?corpname=kas.
Language Counselling at ZRC SAZU. Available at: https://svetovalnica.zrc-
sazu.si/.
Sporazumevanje v slovenskem jeziku. Available at: http://www. 
slovenscina.eu/.
Other literature
Ahačič, Kozma, Ledinek, Nina, Perdih, Andrej. 2015. Fran: The next 
generation Slovenian dictionary portal. In: K. Gajdošová, A. Žáková 
(ed.). Natural language processing, corpus linguistics, lexicography: 
proceedings. Eighth International Conference: Bratislava. 21–22.
Erjavec, Tomaž, Ljubešić, Nikola. 2014. The slWaC 2.0 corpus of the Slovene 
web. In: T. Erjavec, J. Žganec Gros (ed.). Jezikovne tehnologije: zbornik 17. 
mednarodne multikonference Informacijska družba. Ljubljana: Institut 
Jožef Stefan. 19–24.
Fišer, Darja (ed.). 2018. Viri, orodja in metode za analizo spletne slovenščine. 
Ljubljana: Znanstvena založba Filozofske fakultete. 
DOI: https://doi.org/10.4312/9789610600701 
Fišer, Darja, Ljubešić, Nikola. 2018. Tviti kot leksikografski vir za analizo 
pomenskih premikov v slovenščini. Viri, orodja in metode za analizo 
spletne slovenščine. Ljubljana: Znanstvena založba Filozofske fakultete. 
198–226.
Gložančev, Alenka, Jakopin, Primož, Michelizza, Mija, Uršič, Lučka, Žele, 
Andreja. 2009. Novejša slovenska leksika (v povezavi s spletnimi 
jezikovnimi viri). Ljubljana: Založba ZRC, ZRC SAZU. 
Gložančev, Alenka. 2012. Novejša slovenska leksika v luči obravnave 
samostalniških zloženk v Slovenskem pravopisu 2001. Pravopisna 
stikanja: razprave o pravopisnih vprašanjih. Ljubljana: Založba ZRC. 
125–139.
ten Hacken, Pius. 2020. Norms, New Words, and Empirical Reality. 
International Journal of Lexicography 33/2. 135–149. DOI: https://doi.
org/10.1093/ijl/ecaa005
ten Hacken, Pius, Koliopoulou, Maria. 2020. Dictionaries, Neologisms, and 
Linguistic Purism. International Journal of Lexicography 33/2. 127–134. 
DOI: https://doi.org/10.1093/ijl/ecaa011
Kerremans, D., Stegmayr, S., Schmid, H.-J. 2012. The NeoCrawler: 
Identifying and Retrieving Neologisms from the Internet and 
Monitoring Ongoing Change. In: K. Allan, J. Robinson (eds). Current 
methods in historical semantics. De Gruyter Mouton. 59–96.
Klosa-Kückelhaus, Annette, Wolfer, Sascha. 2020. Considerations on the 
Acceptance of German neologisms from the 1990s. International 
Journal of Lexicography 33/2. 150–167. 
DOI: https://doi.org/10.1093/ijl/ecz033
Krek, Simon, Kosem, Iztok, Gantar, Polona. 2013. Predlog za izdelavo 
Slovarja sodobnega slovenskega jezika. Accessed on 1–20 January 2022. 
Available at: http://www.sssj.si/.
Krvina, Domen. 2021. Sprotni slovar slovenskega jezika, covid-19 in z njim 
povezano (novejše) besedje. In: S. Ristić, I. Lazić Konjik, N. Ivanović 
(ed.). Lexicography and lexicology in the light of current issues. Beograd: 
Serbian language institute of SASA.
Marello, Carla. 2020. New Words and New Forms of Linguistic Purism in 
the 21st Century: The Italian Debate. International Journal of Lexicography 
33/2. 168–186. DOI: https://doi.org/10.1093/ijl/ecz034
Michelizza, Mija. 2015. Spletna besedila in jezik na spletu. Primer blogov in 
Wikipedije v slovenščini. Ljubljana: Založba ZRC, ZRC SAZU.
Michelizza, Mija, Žagar Karer, Mojca. 2018. Internetna leksika v slovenščini. 
Jezikoslovni zapiski 24/1. 79–92.
Panocová, Renáta. 2020. Attitudes towards Anglicisms in Contemporary 
Standard Slovak. International Journal of Lexicography 33/2. 187–202. 
DOI: https://doi.org/10.1093/ijl/ecaa006
Perdih, Andrej. 2018. Dictionary portal Fran: current state and future 
developments. In B. Niševa (ed.). Slovanská lexikografie počátkem 
21. století: sborník příspěvků z mezinárodní konference. Vyd. 1. Praha: 
Slovanský ústav AV ČR, 57–65.
Perdih, Andrej. 2020. Portal Fran: od začetkov do danes. Rasprave Instituta 
za hrvatski jezik i jezikoslovlje 46/2. 997–1018.
Perdih, Andrej, Ledinek, Nina. 2019. Multi-word Lexical Units in General 
Monolingual Explanatory Dictionaries of Slavic languages. Slovenski 
jezik / Slovene Linguistic Studies 12. 113–134. DOI: https://ojs.zrc-sazu.
si/sjsls/article/view/7629 
Peredrienko, Tat’jana, Istomina, Ekaterina. 2019. Lexical Parallels in the 
Academic Vocabulary of Russian and English. Slavistična revija 67/4. 
605–614. Available at: https://srl.si/ojs/srl/article/view/2019-4-1-5.
Sicherl, Eva. 2019. Določitev spola anglizmov v slovenščini. Slavistična revija 
67/2. 343–352. Available at: https://srl.si/ojs/srl/article/view/2019-2-1-22.
Sinclair, John McHardy. (ed.). 1987. Looking Up: An Account of the COBUILD 
Project in Lexical Computing and the Development of the Collins COBUILD 
English Language Dictionary. Collins ELT.
Slána, Jakub. 2017. K (polo)automatické excerpci neologismů. Jazykovědné 
aktuality 54/3-4. 34–46. Jazykovědné sdružení České republiky.
Stopar, Andrej, Ilc, Gašper. 2019. Stilistična (ne)zaznamovanost moških 
in ženskih poimenovalnih parov za poklice v angleščini in slovenščini. 
Slavistična revija 67/2. 333–342. Available at: 
https://srl.si/ojs/srl/article/view/2019-2-1-21.
Štumberger, Saška. 2015a. Besedotvorje novejše slovenske leksike: 
medponskoobrazilne zloženke. Zbornik prispevkov s simpozija Škrabčevi 
dnevi 8 (2013). Nova Gorica: Založba Univerze. 155–163.
Štumberger, Saška. 2015b. Leksikološka opredelitev novejše leksike in 
terminološka raba v slovenskem jezikoslovju. Slavistična revija 63/2. 
249–259. Available at: 
https://srl.si/ojs/srl/article/view/COBISS_ID-57985122.
Voršič, Ines. 2015. Tvorjenke s pomenom nosilnika lastnosti v novejšem 
slovenskem besedju. Slavia Centralis 8/1. 119–134.
Waszink, Vivien. 2019. Using Neoloog to detect and describe neologisms 
in online dictionaries. Abstracts_IDS. Instituut voor de Nederlandse Taal. 
Zatorska, Agnieszka. 2016. Czasowniki w nowszej leksyce słoweńskiej. 
Rozprawy komisji językowej 62. 229–239.
Zwitter Vitez, Ana, Fišer, Darja. 2018. Govorne prvine v nestandardni 
spletni slovenščini. Viri, orodja in metode za analizo spletne slovenščine. 
Ljubljana: Znanstvena založba Filozofske fakultete. 254–272.
Received January 2022, accepted March 2022. 
Prejeto januarja 2022, sprejeto marca 2022.
Acknowledgments
The publication of this article was made possible by the programme 
Slovenski jezik v sinhronem in diahronem razvoju (P6-0038 (A)), which 
is financially supported by the Slovenian Research Agency. 
The author would like to thank Mitja Trojar for language editing 
and advice on terminological issues.
Summary
The Growing Dictionary of the Slovenian Language (2014-) and Slovenian 
Neologisms: Study on Types of Data and Their Use
The article examines types of data and their use in the Growing Dictionary 
of the Slovenian Language, which is integrated into Fran, a well-established 
dictionary portal for dictionaries and other language resources by the 
ZRC SAZU Fran Ramovš Institute of the Slovenian Language. The data in 
question are mainly input data in the process of analysing and selecting data for 
dictionary entries; the dictionary is a so-called growing dictionary, which 
means that new entries are published every year. Most entries relate to 
neologisms; less commonly, there are new meanings of existing words. In the 
first year of compiling the dictionary (2014) and for the following few years, it 
was possible to rely on the Gigafida 1.0 corpus, built in 2013 (updated to 2.0 
in 2019), for entry candidates; subsequently, with the “ageing” of the corpus, 
the main role has been assumed by user suggestions. Users can submit 
suggestions directly on the Fran portal (“suggest a new word”), which, with its 
extensiveness, serves as an important point of reference: if users feel that 
something is new, and it cannot be found on the Fran portal and is not an 
archaism, it is most likely a neologism. User suggestions are reviewed in all 
available resources (the Gigafida, Janes, SlWaC, KAS corpora, the web); the 
minimal criterion for inclusion in the dictionary is an adequate occurrence in 
web texts that is diverse enough in terms of sources and temporally recent. 
“Other” suggestions for candidates originate in the lexicographic work on 
other growing dictionaries (especially eSSKJ), in a seminar that is part of 
lexicology and lexicography lectures at the University of Ljubljana, Faculty 
of Arts, and partly in the Institute’s Language Counselling service; these are 
far less numerous than user suggestions. About 50% of the total number of 
all suggestions are included in the dictionary every year (those not included 
are re-analysed the following year); in 2020–2021, this share rose to nearly 
70% through the appearance of COVID-related words. One of the main 
highlights of this analysis is that user engagement during the compilation 
of the dictionary is extremely important. Dictionaries consisting mostly of 
neologisms (new words), in particular, cannot rely only on corpus materials 
in detecting potential candidates for inclusion; if the corpus in question 
is a general (reference) one, it is outdated fairly quickly when it comes to 
neologisms. With Gigafida 1.0, which, compared against SSKJ2 and 
SNB, was a major starting source for the Growing Dictionary of the Slovenian 
Language, the share of yearly entries that do not appear (or hardly appear) in 
the corpus (f ≤ 8) rose from 0% in 2014 all the way to 66% in 2021 (exceeding 
50% since 2018). The update of the corpus (2.0) in 2019 has improved the 
situation to some extent (instead of total absence, there is f ≤ 8 presence), 
but not dramatically. A corpus that is being made within the SLED project is 
expected to be significantly more useful, and our analysis shows this work 
is justified. In terms of future research following up on the analysis in this 
article, it seems sensible to generalise the findings relating to the Growing 
Dictionary of the Slovenian Language in comparison with similar analyses of 
(dictionaries of) new words, especially in other Slavic languages.
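
The share of yearly entries with little or no corpus presence (f ≤ 8) reported above is a simple per-year proportion. The following is a minimal sketch of how such a share could be computed, not the procedure actually used for the dictionary. The function name, the tuple layout, and the sample entries are hypothetical; only the threshold f ≤ 8 comes from the summary.

```python
from collections import defaultdict

LOW_FREQUENCY_THRESHOLD = 8  # f <= 8, the threshold named in the summary


def low_frequency_share(entries, threshold=LOW_FREQUENCY_THRESHOLD):
    """Return, per year, the percentage of entries with corpus frequency <= threshold.

    `entries` is an iterable of (year, headword, corpus_frequency) tuples
    (a hypothetical layout chosen for this illustration).
    """
    totals = defaultdict(int)
    low = defaultdict(int)
    for year, _headword, frequency in entries:
        totals[year] += 1
        if frequency <= threshold:
            low[year] += 1
    return {year: 100.0 * low[year] / totals[year] for year in sorted(totals)}


# Invented toy data, NOT taken from the dictionary or from Gigafida.
sample_entries = [
    (2014, "word_a", 120),
    (2014, "word_b", 45),
    (2021, "word_c", 0),
    (2021, "word_d", 3),
    (2021, "word_e", 57),
]

shares = low_frequency_share(sample_entries)
for year, share in shares.items():
    print(f"{year}: {share:.1f}% of entries with f <= {LOW_FREQUENCY_THRESHOLD}")
```

Tracking this percentage year by year against a fixed corpus gives a rough indication of how quickly that corpus ages as a source of neologism candidates.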
Sprotni slovar slovenskega jezika (2014–) in slovensko novejše besedje: analiza 
tipov podatkov in njihove uporabe
Prispevek obravnava tipe podatkov in njihovo uporabo v Sprotnem slovarju 
slovenskega jezika, ki je integriran v uveljavljen slovarsko-jezikovni portal Fran 
Inštituta za slovenski jezik Frana Ramovša ZRC SAZU. Gre zlasti za vhodne 
podatke v procesu analize in izbora podatkov za slovarske iztočnice; slovar 
je t. i. rastoči slovar, kar pomeni, da se nove iztočnice objavljajo vsako leto. 
Med iztočnicami prevladujejo neologizmi, v manjši meri novi pomeni že 
obstoječih besed. V prvem letu nastajanja (2014) in še nekaj naslednjih se 
je bilo mogoče pri kandidatih za iztočnice nasloniti na leta 2013 končani 
korpus Gigafida 1.0 (posodobitev 2.0 2019), pozneje pa so s »staranjem« 
korpusa glavno vlogo prevzeli predlogi uporabnikov. Uporabniki predloge 
oddajajo neposredno na portalu Fran (»predlagaj novo besedo«), ki s svojo 
obsežnostjo služi kot pomembna primerjalna točka: kar uporabniki čutijo kot 
novo, pa tega ni na portalu Fran in ni arhaizem, je precej verjetno neologizem. 
Uporabniški predlogi so pregledani v vseh virih, ki so na voljo (korpusi 
Gigafida, Janes, SlWaC, KAS, splet), pri čemer minimum za uslovarjenje 
predstavlja zadostna, po virih dovolj pestra in časovno novejša pojavnost 
v spletnih besedilih. »Drugi« predlogi za kandidate prihajajo iz slovarskega 
dela za preostale rastoče slovarje (zlasti eSSKJ), seminarja v okviru predavanj 
iz leksikologije in leksikografije na FF UL, delno tudi iz inštitutske Jezikovne 
svetovalnice; po številu jih je precej manj kot predlogov uporabnikov, skupaj 
pa tvorijo vsoto vseh predlogov. Letno je v povprečju uslovarjenih okoli 50 % 
te vsote (neuslovarjeni predlogi so znova analizirani naslednje leto); v letih 
2020–2021 se je ta delež povzpel proti 70 %, k čemur je prispeval pojav 
koronabesedja. Kot enega glavnih poudarkov analize lahko izpostavimo, da je 
angažiranje uporabnikov v procesu nastajanja slovarja izjemno pomembno. 
Zlasti slovarji pretežno neologizmov (»novejšega besedja«) se, posebej 
pri zaznavi potencialnih kandidatov za uslovarjenje, ne morejo naslanjati 
zgolj na korpusno gradivo; če gre za splošni (referenčni) korpus, ta z vidika 
neologizmov precej hitro zastari. V primeru Gigafide 1.0, ki je pri Sprotnem 
slovarju slovenskega jezika ob sopostavitvi s SSKJ2 in SNB predstavljala 
pomemben izhodiščni vir, se je delež vsakoletnih iztočnic, ki v korpusu 
(skoraj) niso prisotne (f ≤ 8), od 0 % leta 2014 povečal vse do 66 % leta 2021 
(in presegal 50 % od leta 2018 dalje). Posodobitev korpusa (2.0) v letu 2019 
je stanje nekoliko izboljšala (namesto polne neprisotnosti prisotnost pod f 
≤ 8), vendar ne izrazito. Precej večjo uporabnost je pričakovati od korpusa v 
okviru projekta SLED, na utemeljenost katerega kaže tudi naša analiza. Kar 
se tiče prihodnjih raziskav kot nadgradnje analize v tem prispevku, se zdi 
smiselno ugotovitve, vezane na Sprotni slovar slovenskega jezika, posplošiti 
ob primerjavi s podobnimi analizami (slovarjev) novejšega besedja, zlasti v 
drugih slovanskih jezikih.