
Search in the Catalogues and Directories

Hits 1 – 8 of 8

1
GECCC Grammar Error Correction Corpus for Czech
Náplava, Jakub; Straka, Milan; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2022
BASE
2
RobeCzech Base
Straka, Milan; Náplava, Jakub; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021
BASE
3
RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model ...
BASE
4
Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset ...
BASE
5
AKCES-GEC Grammatical Error Correction Dataset for Czech
Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2019
BASE
6
Corpus for training and evaluating diacritics restoration systems
Náplava, Jakub; Straka, Milan; Hajič, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2018
BASE
7
Automatically generated spelling correction corpus for Czech (Czech-SEC-AG)
Hajič, Jan; Náplava, Jakub; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2017
Abstract: The automatically generated spelling correction corpus for Czech (CzeSL-SEC-AG) is a corpus containing text with automatically generated spelling errors. The errors are created with a character error model that holds the probabilities of character substitution, insertion, and deletion, and of swapping two adjacent characters. The probabilities of changing character casing are also considered. The original clean text on which the spelling errors were generated is PDT3.0 (http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3). The original train/dev/test sentence split of the PDT3.0 corpus is preserved in this dataset. Besides the data with artificial spelling errors, we also publish the texts from which the character error model was created: the original manual transcript of the audiobook Švejk and its corrected version, prepared by the authors of Korektor (http://ufal.mff.cuni.cz/korektor). Like the CzeSL Grammatical Error Correction Dataset (CzeSL-GEC: http://hdl.handle.net/11234/1-2143), these data are split into four sets according to the difficulty of the errors they contain.
Keyword: natural language correction; spelling correction
URL: http://hdl.handle.net/11234/1-2144
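The abstract above describes the error generation only in words. As a rough illustration of that idea, the following Python sketch applies a character error model to clean text. It is not the authors' implementation: the names `CharErrorModel` and `add_spelling_errors`, the single global probability per operation, the uniform choice of replacement characters, and the example probabilities are all assumptions made for this sketch. The published corpus derives its probabilities from the Švejk transcript data, which is not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class CharErrorModel:
    """Hypothetical error model: one global probability per operation.

    All defaults and the alphabet are illustrative assumptions, not values
    taken from the corpus.
    """
    p_delete: float = 0.0
    p_swap: float = 0.0
    p_substitute: float = 0.0
    p_insert: float = 0.0
    p_case: float = 0.0
    alphabet: str = "aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"


def add_spelling_errors(text: str, model: CharErrorModel, rng: random.Random) -> str:
    """Return a noisy copy of `text` produced by sampling from `model`."""
    # Cumulative thresholds so that at most one edit operation fires per character.
    t_delete = model.p_delete
    t_swap = t_delete + model.p_swap
    t_substitute = t_swap + model.p_substitute
    t_insert = t_substitute + model.p_insert

    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Casing changes are sampled independently of the edit operations.
        if rng.random() < model.p_case:
            ch = ch.swapcase()
        r = rng.random()
        if r < t_delete:
            pass  # deletion: drop the character
        elif r < t_swap:
            if i + 1 < len(text):
                # swap: emit the next character first, then skip it
                out.append(text[i + 1])
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end of the text
        elif r < t_substitute:
            out.append(rng.choice(model.alphabet))  # substitution
        elif r < t_insert:
            out.append(ch)
            out.append(rng.choice(model.alphabet))  # insertion after the character
        else:
            out.append(ch)  # no error
        i += 1
    return "".join(out)


# Example run with made-up probabilities and a fixed seed for reproducibility.
example_model = CharErrorModel(
    p_delete=0.02, p_swap=0.02, p_substitute=0.03, p_insert=0.02, p_case=0.01
)
print(add_spelling_errors("Dobrý voják Švejk", example_model, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the noising reproducible, which matters when a fixed train/dev/test split has to be regenerated.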
BASE
8
CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2017
BASE

© 2013 - 2024 Lin|gu|is|tik