Chapter 34
Language Report Slovenian

Simon Krek

Abstract Around 2.5 million people around the world speak or understand Slovene, with the vast majority of them living in the Republic of Slovenia, where it is the official language. The constitution grants the right to use their mother tongue to Italian and Hungarian minorities in certain municipalities. In terms of Language Technology (LT), the Slovene CLARIN.SI consortium plays the key role in the community; all major Slovene institutions involved in the development of LT resources, tools and services are members of the consortium. In contrast, the number of private companies in Slovenia specialising in LT for Slovene remains low, and most of the LT products come either from the (Slovene) academic sphere via national or EU funding, or from the big international IT companies that cover a large number of languages.

1 The Slovenian Language

Slovene is a member of the South Slavic language family and is spoken mainly in Slovenia and the neighbouring areas in Italy, Austria, Hungary and Croatia. In the national census of 2002, the last one that recorded the number of native speakers of different languages, 87.8% of the population (a total of just under 2 million at the time) declared Slovene to be their mother tongue, with another 3.3% claiming that they use Slovene as the language of their everyday communication at home, which amounts to 91.1% of the population using Slovene as their first language. This number puts Slovenia in the group of EU states with the most homogeneous linguistic situation. Among other linguistic groups, native speakers of languages of the former Yugoslavia were the largest in 2002, with 3.3% of them using a combination of Slovene and their mother tongue for everyday communication, and another 1% using only their mother tongue: Bosnian, Croatian, Serbian or Montenegrin. Other smaller communities included speakers of Albanian, Macedonian and Romani.

Slovene is the official language in the Republic of Slovenia.
The constitution grants the right to use their mother tongue to the two minorities, declaring that “in those municipalities where Italian or Hungarian national communities reside,” Italian or Hungarian are also official languages. In 2002, it was recorded that Hungarian is the mother tongue of 0.4% of the population, and Italian of 0.2%.

According to legislation in Slovenia, all education and teaching provided as part of the current state curriculum, from preschool through to university level, must be in Slovene. In preschool, primary and secondary education, Italian is used in the schools of the Italian minority community, while Hungarian and Slovene are used in bilingual schools where the Hungarian minority is found. Special arrangements exist for children whose mother tongue is not Slovene, for the education of Roma children, children of foreign citizens and children of people without citizenship.

Simon Krek
Jožef Stefan Institute, Slovenia, simon.krek@ijs.si

© The Author(s) 2023
G. Rehm, A. Way (eds.), European Language Equality, Cognitive Technologies, https://doi.org/10.1007/978-3-031-28819-7_34

2 Technologies and Resources for Slovenian

Useful places to discover Slovene corpora are the CLARIN.SI NoSketch Engine¹ and KonText² concordancers.³ At the time of writing, there are 76 corpora of varying sizes containing Slovene data in the CLARIN.SI repository, and 59 corpora in the concordancers. Most of them are available for download under open licences. The more important families of corpora cover general written standard language (Gigafida), Slovene Web and social media (slWaC, Janes), academic discourse (KAS), parliamentary transcriptions (siParl, ParlaMint), Slovene Wikipedia (CLASSLAWiki-sl), historical texts (IMP), literature (MAKS, ELTeC-slv), specialised domains (KoRP, DSI, Konji, etc.), and school essays (Šolar, SBSJ). There are also various manually annotated training and evaluation corpora available (ssj500k, etc.).
1 https://clarin.si/noske/
2 https://clarin.si/kontext/corpora/corplist
3 https://clarin.si/info/about/

The GOS (GOvorjena Slovenščina, Spoken Slovene) family of corpora contains transcriptions of spoken Slovene. The original GOS includes about 120 hours of transcripts from various situations: radio and TV shows, school lessons and lectures, private conversations between friends or within the family, work meetings, consultations, conversations in buying and selling situations, etc.

In terms of parallel data, Slovene has benefited from its status as one of the official EU languages since 2004 and is included in the standard multilingual parallel data sets produced either by EU institutions (JRC-Acquis, DGT-Acquis, DCEP, DGT-TM, EAC-TM, ECDC-TM, JRC-Names) or by EU-funded or other projects (INTERA, WIT3, ParaCrawl, CommonCrawl, OpenSubtitles, etc.), which are available either from OPUS or from repositories such as ELG. Two translation memory (TM) corpora produced by the Secretariat-General of the Slovene government were made available in the context of the ELRC project and are uploaded in the ELRC-SHARE repository.

There are 82 lexical/conceptual resources with Slovene data in the CLARIN.SI repository available under open access licences. Those that deserve special mention due to their size or importance are: Sloleks, a morphological lexicon containing around 100,000 of the most frequent Slovene lemmas, their inflected or derivative word forms (2.7M) and the corresponding grammatical description; sloWNet, the Slovene WordNet developed in the expand approach, which contains the complete Princeton WordNet 3.0 and over 70,000 Slovene literals; the Dictionary of the Slovenian Normative Guide, a normative orthographic dictionary of the Slovene standard language, which contains 140,266 lemmas and sublemmas in 92,617 entries; and the Thesaurus of Modern Slovene, a thesaurus created automatically from Slovene data available in a comprehensive English–Slovene dictionary, a monolingual dictionary, and a corpus, which contains 105,473 entries and 368,117 synonym pairs.
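To make the idea of a morphological lexicon more concrete, the following minimal Python sketch shows how lemmas, inflected word forms and grammatical descriptions might be indexed for lookup. It is an illustration only: the five entries are toy data written for this example (not taken from Sloleks), the tags follow the MULTEXT-East style and are assumed here for illustration, and the names `Analysis`, `build_indexes` and `analyse` are hypothetical. The actual Sloleks file format and tagset should be checked against the CLARIN.SI release.

```python
from collections import defaultdict
from typing import NamedTuple


class Analysis(NamedTuple):
    """One reading of a word form: its lemma plus a grammatical tag."""
    lemma: str
    msd: str  # morphosyntactic description (illustrative MULTEXT-East-style tag)


# Toy entries (word form, lemma, tag) for the noun "hiša" (house).
# Assumption: written by hand for this sketch, not extracted from Sloleks.
TOY_ENTRIES = [
    ("hiša", "hiša", "Ncfsn"),  # nominative singular
    ("hiše", "hiša", "Ncfsg"),  # genitive singular
    ("hiši", "hiša", "Ncfsd"),  # dative singular
    ("hiši", "hiša", "Ncfsl"),  # locative singular (same form as dative)
    ("hišo", "hiša", "Ncfsa"),  # accusative singular
]


def build_indexes(entries):
    """Build a form -> analyses index and a lemma -> forms index."""
    form_index = defaultdict(list)
    lemma_index = defaultdict(set)
    for form, lemma, msd in entries:
        form_index[form.lower()].append(Analysis(lemma, msd))
        lemma_index[lemma].add(form)
    return form_index, lemma_index


form_index, lemma_index = build_indexes(TOY_ENTRIES)


def analyse(form):
    """Return all known analyses of a word form (empty list if unknown)."""
    return list(form_index.get(form.lower(), []))


if __name__ == "__main__":
    for reading in analyse("hiši"):
        print(reading.lemma, reading.msd)
```

Looking up `hiši` returns two readings, which illustrates why a lexicon of this kind stores a grammatical description with every form rather than a single lemma per form: many inflected forms are ambiguous, and later processing steps such as tagging have to choose between the readings in context.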
In terms of language models, the most recent one is the Slovene RoBERTa model. The corpora used for training the model contain 3.47 billion tokens in total. The subword vocabulary contains 32,000 tokens.⁴ Multilingual models are also available, e.g., a trilingual BERT model trained on Croatian, Slovene, and English data.⁵

The standard and most accurate text processing tool for Slovene is the CLASSLA fork of the Stanza pipeline.⁶ It supports processing of both standard and non-standard Slovene at the level of tokenisation and sentence segmentation, part-of-speech tagging, lemmatisation, dependency parsing and named entity recognition.

There are some Slovene LT companies that develop speech-to-text and text-to-speech tools.⁷ Slovene is also available in speech technology services offered by large enterprises such as Microsoft and Google, as well as by other companies specialising in speech technology.⁸ These solutions have also found their way into some specialised devices covering many languages.⁹ At the University of Ljubljana, a system has been developed for automatically translating lectures from Slovene to other languages in real time, in the context of the Online Notes project.¹⁰

Machine translation services for Slovene are available through more or less the same stakeholders: some Slovene LT companies,¹¹ large enterprises such as Microsoft and Google, and some other international companies specialising in machine translation technology or general translation services.¹² As an official EU language, Slovene is included in the eTranslation service offered by the European Commission.

The biggest investment in LT for Slovene is the Development of Slovene in Digital Environment project, financed by the Slovene Ministry of Culture between 2020 and 2023.¹³ The project will significantly upgrade existing LT resources, tools and services, or produce many of those that do not exist yet.
The results of the project are expected to be published on the CLARIN.SI and GitHub repositories in November 2022 and February 2023.

4 http://hdl.handle.net/11356/1397
5 http://hdl.handle.net/11356/1330
6 https://github.com/clarinsi/classla, https://pypi.org/project/classla/
7 Amebis, Alpineon: eBralec, https://ebralec.si; Vitasis: Truebar, https://vitasis.si
8 NEWTON Technologies, https://www.newtontech.net; Sonix: https://sonix.ai
9 Pocketalk: https://europe.pocketalk.com/languages-countries/
10 https://www.cjvt.si/en/infrastructure-support/tolmac/
11 Vitasis: Truebar, https://vitasis.si; Aikwit, https://aikwit.com; Taia, https://taia.io
12 DeepL Translate, https://www.deepl.com; Pangeanic, https://pangeanic.com/languages/slovenian-translation-services/, etc.
13 Razvoj slovenščine v digitalnem okolju (RSDO): https://www.slovenscina.eu

3 Recommendations and Next Steps

In general, one can conclude that 1. the support for Slovene is comparable with other languages with a similar status (Krek 2022, 2012), 2. there is a general awareness in governmental bodies that LT for Slovene should be supported in the future, 3. the LT community is growing, also through new educational initiatives such as the MA study of Digital Linguistics (Faculty of Arts, University of Ljubljana), and 4. there is infrastructural support, mainly through the CLARIN.SI infrastructure at the Jožef Stefan Institute, which also covers all other stakeholders through the CLARIN.SI consortium. However, more efforts are needed in the future to bring the existing support closer to that available for other (official EU) languages.

References

Krek, Simon (2012). Slovenski jezik v digitalni dobi – The Slovene Language in the Digital Age. META-NET White Paper Series: Europe’s Languages in the Digital Age. Heidelberg etc.: Springer. http://www.meta-net.eu/whitepapers/volumes/slovene.

Krek, Simon (2022). Deliverable D1.31 Report on the Slovenian Language. European Language Equality (ELE); EU project no. LC-01641480 – 101018166. https://european-language-equality.eu/reports/language-report-slovenian.pdf.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.