Hits 1 – 4 of 4
1
CMC training corpus Janes-Tag 2.1
Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. Jožef Stefan Institute, 2019
BASE
2
CMC training corpus Janes-Tag 2.0
Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. Jožef Stefan Institute, 2017
BASE
3
CMC training corpus Janes-Tag 1.2
Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka; Arhar Holdt, Špela; Ljubešić, Nikola. Jožef Stefan Institute, 2016
Abstract:
Janes-Tag is a manually annotated corpus of Slovene computer-mediated communication (CMC). It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Slovene. Because the corpus has been carefully annotated by hand, it is also suitable for detailed linguistic explorations that require highly accurate and reliable annotations. A slightly older version of this corpus is described in: ERJAVEC, Tomaž, ČIBEJ, Jaka, ARHAR HOLDT, Špela, LJUBEŠIĆ, Nikola, FIŠER, Darja. Gold-standard datasets for annotation of Slovene computer-mediated communication. In Proceedings of RASLAN 2016: Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016, pp. 29-40, https://nlp.fi.muni.cz/raslan/raslan16.pdf. Note that a related corpus, Janes-Norm, is also available; cf. http://hdl.handle.net/11356/1084.
Keyword: computer-mediated communication; lemmatisation; manual annotation; tagging; TEI; tokenisation; word normalisation
URL: http://hdl.handle.net/11356/1085
BASE
4
CMC training corpus Janes-Norm 1.2
Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. Jožef Stefan Institute, 2016
BASE
© 2013 - 2024 Lin|gu|is|tik