Catalogue search
Hits 1 – 4 of 4
1. Croatian Twitter training corpus ReLDI-NormTagNER-hr 2.1
Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk. Jožef Stefan Institute, 2019
BASE
2. Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1
Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk; Miličević, Maja; Samardžić, Tanja. Jožef Stefan Institute, 2019
Abstract:
ReLDI-NormTagNER-sr 2.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Serbian. Each tweet is also annotated for its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). As an update to version 2.0, version 2.1 corrects some annotation errors and adds morphosyntactic annotations in the Universal Dependencies formalism in addition to the MULTEXT-East morphosyntactic descriptions. The corpus is now also available in CoNLL-U format.
Keyword: computer-mediated communication; lemmatisation; manual annotation; named entities; part-of-speech tagging; TEI; tokenisation; word normalisation
URL: http://hdl.handle.net/11356/1240
BASE
3. Training corpus SETimes.SR 1.0
Batanović, Vuk; Ljubešić, Nikola; Samardžić, Tanja. Regional Linguistic Data Initiative Centre ReLDI, 2018
BASE
4. Training corpus hr500k 1.0
Ljubešić, Nikola; Agić, Željko; Klubička, Filip. Jožef Stefan Institute, 2018
BASE
© 2013 - 2024 Lin|gu|is|tik