<?xml version="1.0"?><rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:edm="http://www.europeana.eu/schemas/edm/" xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdaGr2="http://rdvocab.info/ElementsGr2" xmlns:oai="http://www.openarchives.org/OAI/2.0/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ore="http://www.openarchives.org/ore/terms/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dcterms="http://purl.org/dc/terms/"><edm:WebResource rdf:about="http://www.dlib.si/stream/URN:NBN:SI:doc-Y5LLQX89/ff2de3d2-8455-4223-b6b6-01685824214a/PDF"><dcterms:extent>439 KB</dcterms:extent></edm:WebResource><edm:WebResource rdf:about="http://www.dlib.si/stream/URN:NBN:SI:doc-Y5LLQX89/4948bd93-f2c8-45d9-96e6-b87a5734bb5d/TEXT"><dcterms:extent>57 KB</dcterms:extent></edm:WebResource><edm:TimeSpan rdf:about="2013-2025"><edm:begin xml:lang="en">2013</edm:begin><edm:end xml:lang="en">2025</edm:end></edm:TimeSpan><edm:ProvidedCHO rdf:about="URN:NBN:SI:doc-Y5LLQX89"><dcterms:isPartOf rdf:resource="https://www.dlib.si/details/URN:NBN:SI:spr-BR18JCH2" /><dcterms:issued>2016</dcterms:issued><dc:creator>Ljubešić, Nikola</dc:creator><dc:creator>Miličević, Maja</dc:creator><dc:format xml:lang="sl">številka:2</dc:format><dc:format xml:lang="sl">letnik:4</dc:format><dc:format xml:lang="sl">str. 
156-188</dc:format><dc:identifier>ISSN:2335-2736</dc:identifier><dc:identifier>COBISSID:62290530</dc:identifier><dc:identifier>URN:URN:NBN:SI:doc-Y5LLQX89</dc:identifier><dc:language>en</dc:language><dc:publisher xml:lang="sl">Trojina, zavod za uporabno slovenistiko</dc:publisher><dcterms:isPartOf xml:lang="sl">Slovenščina 2.0</dcterms:isPartOf><dc:subject xml:lang="en">Croatian</dc:subject><dc:subject xml:lang="sl">hrvaščina</dc:subject><dc:subject xml:lang="sl">korpusi (jezikoslovje)</dc:subject><dc:subject xml:lang="sl">računalniško posredovana komunikacija</dc:subject><dc:subject xml:lang="sl">srbščina</dc:subject><dc:subject xml:lang="sl">Twitter (družabno omrežje)</dc:subject><dc:subject rdf:resource="http://www.wikidata.org/entity/Q9299" /><dcterms:temporal rdf:resource="2013-2025" /><dc:title xml:lang="sl">Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets</dc:title><dc:description xml:lang="sl">In this paper we discuss the parallel manual normalisation of samples extracted from Croatian and Serbian Twitter corpora. We describe the datasets, outline the unified guidelines provided to annotators, and present a series of analyses of standard-to-non-standard transformations found in the Twitter data. The results show that closed part-of-speech classes are transformed more frequently than the open classes, that the most frequently transformed lemmas are auxiliary and modal verbs, interjections, particles and pronouns, that character deletions are more frequent than insertions and replacements, and that more transformations occur at the word end than in other positions. Croatian and Serbian are found to share many, but not all, transformation patterns; while some of the discrepancies can be ascribed to the structural differences between the two languages, others appear to be better explained by looking at extralinguistic factors. 
The produced datasets and their initial analyses can be used for studying the properties of non-standard language, as well as for developing language technologies for non-standard data.</dc:description><dc:description xml:lang="sl">V prispevku predstavimo vzporedno ročno normalizacijo vzorcev, izluščenih iz korpusov hrvaških in srbskih tvitov. Najprej opišemo nabor podatkov, podamo poenotene smernice za anotatorje in predstavimo analizo pretvorb iz nestandardnega v standardni jezik, ki smo jih zajeli v gradivu. Rezultati kažejo, da se zaprte besedne vrste (tiste, ki redkeje sprejemajo nove besede ali pa jih sploh ne sprejemajo, torej predvsem slovnične besedne vrste) pretvarjajo pogosteje kot odprte (tiste, ki pogosteje sprejemajo nove elemente), da so najpogosteje pretvorjene leme pomožni in modalni glagoli, medmeti, členki in zaimki, da so izbrisi pogostejši kot vstavljanja ali zamenjave in da do pretvorb pogosteje prihaja na koncu besed kot na drugih mestih. Ugotovili smo, da si hrvaščina in srbščina delita številne pretvorbne vzorce, ne pa vseh. Medtem ko lahko nekatere razlike pripišemo strukturnim razlikam med jezikoma, se za druge zdi, da bi jih lahko lažje razložili z zunajjezikovnimi dejavniki. 
Izdelani nabori podatkov in začetne analize se lahko uporabljajo za proučevanje nestandardnega jezika kot tudi za razvoj jezikovnih tehnologij za nestandardne jezikovne podatke</dc:description><edm:type>TEXT</edm:type><dc:type xml:lang="sl">znanstveno časopisje</dc:type><dc:type xml:lang="en">journals</dc:type><dc:type rdf:resource="http://www.wikidata.org/entity/Q361785" /></edm:ProvidedCHO><ore:Aggregation rdf:about="http://www.dlib.si/?URN=URN:NBN:SI:doc-Y5LLQX89"><edm:aggregatedCHO rdf:resource="URN:NBN:SI:doc-Y5LLQX89" /><edm:isShownBy rdf:resource="http://www.dlib.si/stream/URN:NBN:SI:doc-Y5LLQX89/ff2de3d2-8455-4223-b6b6-01685824214a/PDF" /><edm:rights rdf:resource="http://creativecommons.org/licenses/by-sa/4.0/" /><edm:provider>Slovenian National E-content Aggregator</edm:provider><edm:intermediateProvider xml:lang="en">National and University Library of Slovenia</edm:intermediateProvider><edm:dataProvider xml:lang="sl">Trojina, zavod za uporabno slovenistiko</edm:dataProvider><edm:object rdf:resource="http://www.dlib.si/streamdb/URN:NBN:SI:doc-Y5LLQX89/maxi/edm" /><edm:isShownAt rdf:resource="http://www.dlib.si/details/URN:NBN:SI:doc-Y5LLQX89" /></ore:Aggregation></rdf:RDF>