Hits 1 – 8 of 8

1	GECCC Grammar Error Correction Corpus for Czech
	Náplava, Jakub; Straka, Milan; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2022
	BASE

2	RobeCzech Base
	Straka, Milan; Náplava, Jakub; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021
	BASE

3	RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model ...
	Straka, Milan; Náplava, Jakub; Straková, Jana. - : arXiv, 2021
	BASE

4	Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset ...
	Kocián, Matěj; Náplava, Jakub; Štancl, Daniel. - : arXiv, 2021
	BASE

5	AKCES-GEC Grammatical Error Correction Dataset for Czech
	Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2019
	BASE

6	Corpus for training and evaluating diacritics restoration systems
	Náplava, Jakub; Straka, Milan; Hajič, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2018
	BASE

7	Automatically generated spelling correction corpus for Czech (Czech-SEC-AG)
	Hajič, Jan; Náplava, Jakub; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2017
	Abstract: Automatically generated spelling correction corpus for Czech (Czesl-SEC-AG) is a corpus containing text with automatically generated spelling errors. The errors are created with a character error model that holds the probabilities of character substitution, insertion, and deletion, and of swapping two adjacent characters. The probabilities of changing character casing are also considered. The original clean text in which the spelling errors were generated is PDT3.0 (http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3). The original train/dev/test sentence split of the PDT3.0 corpus is preserved in this dataset. Besides the data with artificial spelling errors, we also publish the texts from which the character error model was created: the original manual transcript of the audiobook Švejk and its corrected version, prepared by the authors of Korektor (http://ufal.mff.cuni.cz/korektor). Like the CzeSL Grammatical Error Correction Dataset (CzeSL-GEC: http://hdl.handle.net/11234/1-2143), these data are processed into four sets based on the difficulty of the errors present.
	Keyword: natural language correction; spelling correction
	URL: http://hdl.handle.net/11234/1-2144
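	Note: the character error model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `CharErrorModel` and `add_spelling_errors`, the single global probability per operation, and the uniform choice of replacement characters are all assumptions made for illustration; the published model estimates its probabilities from the Švejk transcript and its corrected version.

```python
import random
from dataclasses import dataclass


@dataclass
class CharErrorModel:
    """Hypothetical error model: one global probability per operation.

    The five probabilities must sum to at most 1; the remainder is the
    probability of copying a character unchanged.
    """
    delete: float = 0.0
    swap: float = 0.0
    substitute: float = 0.0
    insert: float = 0.0
    change_case: float = 0.0
    # Assumed replacement alphabet (lowercase Czech letters).
    alphabet: str = "aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"


def add_spelling_errors(text: str, model: CharErrorModel, rng: random.Random) -> str:
    """Return a noisy copy of `text`, applying at most one operation per character."""
    # Cumulative thresholds so a single random draw selects the operation.
    t_delete = model.delete
    t_swap = t_delete + model.swap
    t_substitute = t_swap + model.substitute
    t_insert = t_substitute + model.insert
    t_case = t_insert + model.change_case

    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        r = rng.random()
        if r < t_delete:
            pass  # deletion: emit nothing for this character
        elif r < t_swap:
            if i + 1 < len(text):
                # swap with the next character and skip over it
                out.append(text[i + 1])
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # no neighbour to swap with
        elif r < t_substitute:
            out.append(rng.choice(model.alphabet))
        elif r < t_insert:
            out.append(ch)
            out.append(rng.choice(model.alphabet))
        elif r < t_case:
            out.append(ch.swapcase())
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

	For example, `add_spelling_errors("Dobrý voják Švejk", CharErrorModel(substitute=0.02, swap=0.01), random.Random(0))` returns the sentence with occasional substituted or swapped characters; the probability values here are placeholders, not figures from the corpus.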
	BASE

8	CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
	Šebesta, Karel; Bedřichová, Zuzanna; Šormová, Kateřina. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2017
	BASE
