{"?xml":{"@version":"1.0"},"edm:RDF":{"@xmlns:dc":"http://purl.org/dc/elements/1.1/","@xmlns:edm":"http://www.europeana.eu/schemas/edm/","@xmlns:wgs84_pos":"http://www.w3.org/2003/01/geo/wgs84_pos","@xmlns:foaf":"http://xmlns.com/foaf/0.1/","@xmlns:rdaGr2":"http://rdvocab.info/ElementsGr2","@xmlns:oai":"http://www.openarchives.org/OAI/2.0/","@xmlns:owl":"http://www.w3.org/2002/07/owl#","@xmlns:rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#","@xmlns:ore":"http://www.openarchives.org/ore/terms/","@xmlns:skos":"http://www.w3.org/2004/02/skos/core#","@xmlns:dcterms":"http://purl.org/dc/terms/","edm:WebResource":[{"@rdf:about":"http://www.dlib.si/stream/URN:NBN:SI:doc-1YZLGG4C/e45db025-d74e-4ae8-8fe3-1843400515f8/HTML","dcterms:extent":"43 KB"},{"@rdf:about":"http://www.dlib.si/stream/URN:NBN:SI:doc-1YZLGG4C/54391296-9f93-4339-a179-18459ef8d4d1/PDF","dcterms:extent":"133 KB"},{"@rdf:about":"http://www.dlib.si/stream/URN:NBN:SI:doc-1YZLGG4C/c6cdac9d-ea93-4c61-8b36-e0f3ecac7402/TEXT","dcterms:extent":"40 KB"},{"@rdf:about":"http://www.dlib.si/stream/URN:NBN:SI:doc-1YZLGG4C/c8afc109-18c8-4ee5-b106-29c0887c36df/WEB","dcterms:extent":"0 KB"}],"edm:TimeSpan":{"@rdf:about":"1957-2025","edm:begin":{"@xml:lang":"en","#text":"1957"},"edm:end":{"@xml:lang":"en","#text":"2025"}},"edm:ProvidedCHO":{"@rdf:about":"URN:NBN:SI:doc-1YZLGG4C","dcterms:isPartOf":[{"@rdf:resource":"https://www.dlib.si/details/URN:NBN:SI:spr-L1LNKFLF"},{"@xml:lang":"sl","#text":"Knjižnica"}],"dcterms:issued":"2012","dc:creator":"Erjavec, Tomaž","dc:format":[{"@xml:lang":"sl","#text":"številka:3"},{"@xml:lang":"sl","#text":"letnik:56"},{"@xml:lang":"sl","#text":"str. 
205-221"}],"dc:identifier":["ISSN:0023-2424","COBISSID:264864256","URN:URN:NBN:SI:doc-1YZLGG4C"],"dc:language":"sl","dc:publisher":{"@xml:lang":"sl","#text":"Zveza bibliotekarskih društev Slovenije"},"dc:subject":[{"@xml:lang":"en","#text":"cultural heritage"},{"@xml:lang":"sl","#text":"digitalizacija"},{"@xml:lang":"en","#text":"digitization"},{"@xml:lang":"sl","#text":"dokumentiranje"},{"@xml:lang":"sl","#text":"kulturna dediščina"},{"@xml:lang":"sl","#text":"leksikologija"},{"@xml:lang":"en","#text":"lexicology"},{"@xml:lang":"sl","#text":"slovarji"},{"@xml:lang":"sl","#text":"starejša slovenščina"},{"@rdf:resource":"http://www.wikidata.org/entity/Q843958"}],"dcterms:temporal":{"@rdf:resource":"1957-2025"},"dc:title":{"@xml:lang":"sl","#text":"Jezikoslovni viri starejše slovenščine| Historical Slovenian language resources|"},"dc:description":[{"@xml:lang":"en","#text":"The paper presents three language resources that enable better full-text access to digitised printed historical Slovenian texts: a hand-annotated corpus, a hand-annotated lexicon of historical words and a collection of transcribed texts. The aim of the resources is twofold: on the one hand, they support empirical linguistic research (corpus, collection) and provide a reference tool for research on historical Slovenian (lexicon); on the other hand, they can serve as training data for developing Human Language Technologies that enable better full-text search in digital libraries of Slovenian written cultural heritage, the modernisation of historical texts, and better technological solutions for text recognition and scanning. The hand-annotated corpus of historical Slovenian contains text from 1,000 pages sampled mainly from the years 1750 to 1900; two texts date from the end of the 16th or the 17th century. 
The corpus contains a little over 250,000 word tokens, each annotated with hand-validated linguistic features: the modernised form, the lemma (base form) and the morphosyntactic description. For example, the word token \"ajfram\" is annotated with the modernised form \"ajfrom\", the lemma \"ajfer\" and the morphosyntactic description \"Som\", i.e. \"Samostalnik\" (noun), \"občni\" (common), \"moški\" (masculine); the closest contemporary synonym of this lemma is \"gorečnost\" (fervour). The corpus was first annotated automatically and then manually verified and corrected. The lexicon was derived automatically from the hand-annotated corpus and contains only attested word forms, together with examples of use. Word forms are grouped under their modern equivalents, and all the modern forms of a word make up one dictionary entry, identified by its lemma together with its morphosyntactic description and closest contemporary synonyms. For example, the entry \"ajfer/Som/gorečnost\" contains two modernised forms, \"ajfra\" and \"ajfrom\", their archaic forms \"ajfram\" and \"aifram\", and the attestation \"... shaz noi frihtei tu shebranje karbo sdei udrukono is velzhim aifram noi is flisam inu is andohtjo 3 vezhiere saporedama ...\" (Tapravi inu tazieli Colemone-Shegen, 1800, p. 183). At present, the lexicon contains over 25,000 entries (including modern words in archaic texts), 50,000 word forms and 70,000 archaic forms. The third resource is an extensive collection of digitised texts similar to the corpus, except that its words are annotated automatically by ToTrTaLe, a tool developed for processing historical Slovenian text. The tool implements a pipeline that first tokenises the text and then attempts to transcribe archaic words into their modern-day equivalents; finally, the text is tagged and lemmatised using models for modern Slovenian. 
The collection contains about 5 million words of hand-corrected transcriptions of the following digitised texts: Slovenian books and issues of the newspaper \"Kmetijske in rokodelske novice\", digitised by the National and University Library (NUK) within the EU project IMPACT (5,000 pages); the AHLib digital library, comprising Slovenian books translated from German (100 books); and a selection of Slovenian books. All three resources (corpus, lexicon, collection) are encoded according to the Text Encoding Initiative Guidelines (TEI P5), which enable the definition of XML schemas for encoding texts for scholarly purposes. The resources are accessible from the project home page at http://nl.ijs.si/imp/. The collection and the lexicon are available for online browsing, the corpus and the automatically annotated collection for linguistic searches via a concordancer, and all the resources can also be downloaded in their source XML form under the Creative Commons Attribution licence. We expect to extend the resources in the future; however, even in their present scope they are sufficient for corpus-based diachronic studies of historical Slovenian and for developing useful language technology tools for processing cultural heritage texts."},{"@xml:lang":"sl","#text":"V prispevku so predstavljeni trije jezikovni viri starejšega slovenskega jezika: zbirka besedil oz. digitalna knjižnica, referenčni jezikoslovno označeni korpus in slovar iz. besedišče. Zbirka besedil vsebuje 158 del, večinoma knjig z redigirano transkripcijo besedila in faksimili, skupaj nekaj več kot 13.000 strani. Korpus sestavlja 1000 strani, vzorčenih iz te zbirke, kjer je vsaki besedni pojavnici pripisana ročno pregledana sodobna ustreznica besedne oblike, njena lema in leksikalna oblikoskladenjska oznaka. Slovar je bil zajet iz razširjenega ročno pregledanega korpusa in ima 25.000 gesel, ki vsebujejo sodobne ustreznice in korpusno atestirane besedne oblike. 
Vsi trije viri so zapisani skladno s smernicami za zapis besedil TEI (Text Encoding Initiative Guidelines) in dostopni na spletu za pregledovanje in preiskovanje, kot tudi za prenos pod licenco Creative Commons - priznanje avtorstva. Namen virov je po eni strani omogočiti empirično podprte diahrone jezikoslovne raziskave in približati starejša besedila in leksiko sodobnemu bralcu, po drugi pa ti predstavljajo podatkovno infrastrukturo za razvoj jezikovnih tehnologij, ki lahko npr. omogočajo iskanje po polnem besedilu pisne kulturne dediščine. Zbirka besedil, korpusov in slovar so dostopni na http://nl.ijs.si/imp/"}],"edm:type":"TEXT","dc:type":[{"@xml:lang":"sl","#text":"znanstveno časopisje"},{"@xml:lang":"en","#text":"journals"},{"@rdf:resource":"http://www.wikidata.org/entity/Q361785"}]},"ore:Aggregation":{"@rdf:about":"http://www.dlib.si/?URN=URN:NBN:SI:doc-1YZLGG4C","edm:aggregatedCHO":{"@rdf:resource":"URN:NBN:SI:doc-1YZLGG4C"},"edm:isShownBy":{"@rdf:resource":"http://www.dlib.si/stream/URN:NBN:SI:doc-1YZLGG4C/54391296-9f93-4339-a179-18459ef8d4d1/PDF"},"edm:rights":{"@rdf:resource":"http://rightsstatements.org/vocab/InC/1.0/"},"edm:provider":"Slovenian National E-content Aggregator","edm:intermediateProvider":{"@xml:lang":"en","#text":"National and University Library of Slovenia"},"edm:dataProvider":{"@xml:lang":"sl","#text":"Zveza bibliotekarskih društev Slovenije"},"edm:object":{"@rdf:resource":"http://www.dlib.si/streamdb/URN:NBN:SI:doc-1YZLGG4C/maxi/edm"},"edm:isShownAt":{"@rdf:resource":"http://www.dlib.si/details/URN:NBN:SI:doc-1YZLGG4C"}}}}