Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
Data for Training and Evaluating Metadata Extraction Models based on 15 Thousand Cyrillic Script Publications ...
BASE
2
Data for Training and Evaluating Metadata Extraction Models based on 15 Thousand Cyrillic Script Publications ...
Abstract: Data for training and evaluating sequence labeling models for metadata extraction, based on 15,553 papers in Cyrillic-script languages, spanning 27 years and three languages. For each paper, ground-truth sequence labeling output is provided in TEI format and as annotated plain text. The code used for creating and evaluating the data set can be found on GitHub. To cite the data set, you can refer to our paper introducing it:
@inproceedings{kssf-2021-cyrillic,
  title = {{Bootstrapping Multilingual Metadata Extraction: A Showcase in Cyrillic}},
  author = {Krause, Johan and Shapiro, Igor and Saier, Tarek and F{\"a}rber, Michael},
  booktitle = {Proceedings of the Second Workshop on Scholarly Document Processing},
  year = {2021}
}
...
Keywords: bulgarian; cyrillic; metadata extraction; russian; scholarly data; sequence labeling; ukrainian
URL: https://dx.doi.org/10.5281/zenodo.4708696
https://zenodo.org/record/4708696
BASE
Open access documents: 2
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings