
Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
Rule-based Synthetic Data for Japanese GEC ...
Kimn, Alex; Hu, Yiqun; Aikawa, Takako. - : Zenodo, 2020
Abstract: Title: Rule-based Synthetic Data for Japanese GEC
Dataset Contents: This dataset contains two parallel corpora intended for training and evaluating models for the NLP (natural language processing) subtask of Japanese GEC (grammatical error correction). These are as follows:
Synthetic Corpus - *synthesized_data.tsv*: This corpus file contains 2,179,130 parallel sentence pairs synthesized using the process described in [1]. Each line of the file consists of two sentences delimited by a tab. The first sentence is the erroneous sentence, and the second is the corresponding correction. These paired sentences are derived from data scraped from the keyword-lookup site . The data in this file is primarily intended to serve as, or to augment, a training set for a Japanese GEC model. Overall, the sentences cover a broad range of mostly simple Japanese grammatical errors.
Teacher Corpus - *teacher_data.tsv*: This corpus file contains 6,345 parallel sentence pairs created via what we call the ...
Associated Work:
- Kimn, A. (May, 2020). *A syntactic rule-based framework for parallel data synthesis in Japanese GEC* (Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, United States of America). Retrieved from [1]
- Aikawa, T. & T. Takahashi. (2019), 「AIチュータの実現に向け:誤用例文コーパスデータの構築と誤用文修正知識の習得」,『ICT×日本語教育:ICTが作る新しい日本語教育への挑戦』當作靖彦(監修),李在鎬(編集),ひつじ書房, pp.84-98. (Toward the Development of AI Tutor: the Development of Error Corpus Data and the Acquisition Process of Grammar Error Correction, Information and Communication Technology (ICT) x Japanese Language Education: ICT's Challenges for Japanese Language Education, Eds., Yasuhiko Tohsaku, Jae-Ho LEE, Hitsuji Shobo, Tokyo, Japan, pp. 84-98.) [2] ...
Keyword: common grammatical errors; corpus; grammatical error; Japanese-English; parallel sentence pairs
URL: https://zenodo.org/record/4276130
https://dx.doi.org/10.5281/zenodo.4276130
BASE
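The abstract of hit 1 describes the layout of *synthesized_data.tsv*: one sentence pair per line, tab-delimited, with the erroneous sentence first and its correction second. A minimal Python sketch of reading that layout follows. The function names `read_pairs` and `load_pairs` are hypothetical, and the UTF-8 encoding and the policy of skipping malformed lines are assumptions, not something the record states.

```python
from typing import Iterable, Iterator, List, Tuple


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (erroneous, corrected) pairs from tab-delimited lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # assumption: silently skip lines without exactly two fields
        yield parts[0], parts[1]


def load_pairs(path: str) -> List[Tuple[str, str]]:
    """Read a whole corpus file; UTF-8 is assumed, not confirmed by the record."""
    with open(path, encoding="utf-8") as f:
        return list(read_pairs(f))
```

Calling `load_pairs("synthesized_data.tsv")` would return a list of two-element tuples. Splitting on the tab directly, rather than going through a CSV parser, avoids treating quotation marks inside a sentence as field quoting.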
2
Rule-based Synthetic Data for Japanese GEC ...
Kimn, Alex; Hu, Yiqun; Aikawa, Takako. - : Zenodo, 2020
BASE
