Catalogue search
Hits 1 – 4 of 4
1. Croatian Twitter training corpus ReLDI-NormTag-hr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017
BASE
2. Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017
BASE
3. Croatian Twitter training corpus ReLDI-NormTag-hr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. Jožef Stefan Institute, 2017
BASE
4. Serbian Twitter training corpus ReLDI-NormTag-sr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip; Erjavec, Tomaž; Miličević, Maja; Vuković, Teodora. Jožef Stefan Institute, 2017
Abstract:
ReLDI-NormTag-sr 1.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). As an update to version 1.0, 1.1 corrects some minor errors. The corpus construction is (partially) described in: MILIČEVIĆ, Maja, LJUBEŠIĆ, Nikola. Tviterasi, tviteraši or twitteraši? Producing and analysing a normalised dataset of Croatian and Serbian tweets. Slovenščina 2.0: empirical, applied and interdisciplinary research, 4/2, 2016. ISSN 2335-2736. http://dx.doi.org/10.4312/slo2.0.2016.2.156-188
Keyword: computer-mediated communication; lemmatisation; manual annotation; tagging; TEI; tokenisation; word normalisation
URL: http://hdl.handle.net/11356/1120
BASE
© 2013 - 2024 Lin|gu|is|tik