Catalogue search
Hits 1 – 5 of 5
1
Bootstrapping Multilingual Metadata Extraction: A Showcase in Cyrillic
Shapiro, Igor; Färber, Michael; Saier, Tarek. Association for Computational Linguistics, 2022
BASE
2
Data for Cyrillic Reference Parsing ...
Shapiro, Igor; Saier, Tarek; Färber, Michael. Zenodo, 2021
BASE
3
Data for Training and Evaluating Metadata Extraction Models based on 15 Thousand Cyrillic Script Publications ...
Krause, Johan; Shapiro, Igor; Saier, Tarek. Zenodo, 2021
BASE
4
Data for Training and Evaluating Metadata Extraction Models based on 15 Thousand Cyrillic Script Publications ...
Krause, Johan; Shapiro, Igor; Saier, Tarek. Zenodo, 2021
BASE
5
Data for Cyrillic Reference Parsing ...
Shapiro, Igor; Saier, Tarek; Färber, Michael. Zenodo, 2021
Abstract:
We provide a synthetic reference data set covering over 100,000 labeled references (mostly Russian language) and a manually annotated set of 771 real references gathered from multidisciplinary Cyrillic script publications. Background: Extracting structured data from bibliographic references is a crucial task for the creation of scholarly databases. While approaches, tools, and evaluation data sets for the task exist, there is a distinct lack of support for languages other than English and scripts other than the Latin alphabet. A significant portion of the scientific literature that is thereby excluded consists of publications written in Cyrillic script languages. To address this problem, we introduce a new multilingual and multidisciplinary data set of over 100,000 labeled reference strings. The data set covers multiple Cyrillic languages and contains over 700 manually labeled references, while the remainder are generated synthetically. With random samples of varying size of this data, we train ...
Keyword:
citation data; citation field extraction; Cyrillic; digital libraries; NLP; references; scholarly data; SDU2022; sequence labeling
URL:
https://zenodo.org/record/5801914
https://dx.doi.org/10.5281/zenodo.5801914
BASE