3 |
DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues
In: Castilho, Sheila orcid:0000-0002-8416-6555, Cavalheiro Camargo, João Lucas orcid:0000-0003-3746-1225, Menezes, Miguel and Way, Andy orcid:0000-0001-5736-5930 (2021) DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues. In: Sixth Conference on Machine Translation (WMT21), 10-11 Nov 2021, Punta Cana, Dominican Republic (Online). ISBN 978-1-954085-94-7
BASE
4 |
Large-scale study of speech acts' development using automatic labelling
In: Proceedings of the 43rd Annual Meeting of the Cognitive Science Society ; https://hal.archives-ouvertes.fr/hal-03234620 ; Proceedings of the 43rd Annual Meeting of the Cognitive Science Society, Jul 2021, Vienna, Austria (2021)
BASE
5 |
Multitask Transformer Model-based Fintech Customer Service Chatbot NLU System with DECO-LGG SSP-based Data ; DECO-LGG 반자동 증강 학습데이터 활용 멀티태스크 트랜스포머 모델 기반 핀테크 CS 챗봇 NLU 시스템
In: Annual Conference on Human and Language Technology ; https://hal.archives-ouvertes.fr/hal-03603903 ; Annual Conference on Human and Language Technology, Oct 2021, Seoul, South Korea. pp.461-466 ; http://www.koreascience.or.kr/journal/OOGHAK.page (2021)
BASE
6 |
Automatic Language Identification in Code-Switched Hindi-English Social Media Text
In: Journal of Open Humanities Data; Vol 7 (2021); 7 ; 2059-481X (2021)
BASE
7 |
Semi-automatic Annotation Proposal for Increasing a Fake News Dataset in Spanish
BASE
10 |
POS-Tagging für Transkripte gesprochener Sprache : Entwicklung einer automatisierten Wortarten-Annotation am Beispiel des Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK)
BLLDB
UB Frankfurt Linguistik
11 |
SMAD: A tool for automatically annotating the smile intensity along a video record
In: HRC2020, 10th Humour Research Conference ; https://hal.archives-ouvertes.fr/hal-02529371 ; HRC2020, 10th Humour Research Conference, Mar 2020, Commerce, Texas, United States (2020)
BASE
17 |
Automatic annotation of error types for grammatical error correction ...
BASE
18 |
PE2rr corpus: manual error annotation of automatically pre-annotated MT post-edits
BASE
19 |
Automatic annotation of error types for grammatical error correction
Abstract:
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting grammatical errors in text. Although previous work has focused on developing systems that target specific error types, the current state of the art uses machine translation to correct all error types simultaneously. A significant disadvantage of this approach is that machine translation does not produce annotated output and so error type information is lost. This means we can only evaluate a system in terms of overall performance and cannot carry out a more detailed analysis of different aspects of system performance. In this thesis, I develop a system to automatically annotate parallel original and corrected sentence pairs with explicit edits and error types. In particular, I first extend the Damerau-Levenshtein alignment algorithm to make use of linguistic information when aligning parallel sentences, and supplement this alignment with a set of merging rules to handle multi-token edits. The output from this algorithm surpasses other edit extraction approaches in terms of approximating human edit annotations and is the current state of the art. Having extracted the edits, I next classify them according to a new rule-based error type framework that depends only on automatically obtained linguistic properties of the data, such as part-of-speech tags. This framework was inspired by existing frameworks, and human judges rated the appropriateness of the predicted error types as ‘Good’ (85%) or ‘Acceptable’ (10%) in a random sample of 200 edits. The whole system is called the ERRor ANnotation Toolkit (ERRANT) and is the first toolkit capable of automatically annotating parallel sentences with error types. I demonstrate the value of ERRANT by applying it to the system output produced by the participants of the CoNLL-2014 shared task, and carry out a detailed error type analysis of system performance for the first time.
I also develop a simple language-model-based approach to GEC that does not require annotated training data, and show how it can be improved using ERRANT error types.
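Illustration (not part of the catalogue record): the alignment step described in the abstract can be pictured with a minimal sketch of a token-level Damerau-Levenshtein alignment whose substitution cost is lowered when two tokens share a part-of-speech tag. The cost values (0.5 and 1.5), the hand-supplied tags, and the function names `sub_cost`, `align` and `extract_edits` are assumptions made for this example. They are not ERRANT's actual cost function or API, and the merging rules and error-type classifier mentioned in the abstract are left out.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); tags are supplied by hand here


def sub_cost(a: Token, b: Token) -> float:
    """Hypothetical substitution cost: cheaper when the POS tags agree."""
    if a[0] == b[0]:
        return 0.0
    return 0.5 if a[1] == b[1] else 1.5


def align(src: List[Token], tgt: List[Token]):
    """Return a list of (op, src_start, src_end, tgt_start, tgt_end).

    op is M (match), S (substitute), D (delete), I (insert)
    or T (adjacent transposition, the Damerau extension).
    """
    n, m = len(src), len(tgt)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = float(i), "D"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = float(j), "I"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sub_cost(src[i - 1], tgt[j - 1])
            options = [
                (cost[i - 1][j - 1] + s, "M" if s == 0 else "S"),
                (cost[i - 1][j] + 1.0, "D"),
                (cost[i][j - 1] + 1.0, "I"),
            ]
            # Two adjacent tokens swapped between source and target.
            if (i > 1 and j > 1
                    and src[i - 1][0] == tgt[j - 2][0]
                    and src[i - 2][0] == tgt[j - 1][0]
                    and src[i - 1][0] != src[i - 2][0]):
                options.append((cost[i - 2][j - 2] + 1.0, "T"))
            cost[i][j], back[i][j] = min(options, key=lambda o: o[0])
    # Walk back from the bottom-right cell to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        op = back[i][j]
        if op in ("M", "S"):
            ops.append((op, i - 1, i, j - 1, j))
            i, j = i - 1, j - 1
        elif op == "D":
            ops.append((op, i - 1, i, j, j))
            i -= 1
        elif op == "I":
            ops.append((op, i, i, j - 1, j))
            j -= 1
        else:  # "T"
            ops.append((op, i - 2, i, j - 2, j))
            i, j = i - 2, j - 2
    ops.reverse()
    return ops


def extract_edits(src: List[Token], tgt: List[Token]):
    """Keep only the non-matching operations as (source text, target text)."""
    edits = []
    for op, s0, s1, t0, t1 in align(src, tgt):
        if op != "M":
            edits.append((" ".join(w for w, _ in src[s0:s1]),
                          " ".join(w for w, _ in tgt[t0:t1])))
    return edits


original = [("She", "PRON"), ("go", "VERB"), ("school", "NOUN")]
corrected = [("She", "PRON"), ("went", "VERB"), ("to", "ADP"), ("school", "NOUN")]
print(extract_edits(original, corrected))  # [('go', 'went'), ('', 'to')]
```

Because a same-tag substitution costs less than a cross-tag one, "go" is paired with "went" rather than with "to", which a plain unit-cost alignment would not be guaranteed to prefer. A merging step of the kind the abstract mentions could later decide whether to join the two edits into a single multi-token edit.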
Keyword:
Automatic Annotation; Grammatical Error Correction; Natural Language Processing
URL: https://www.repository.cam.ac.uk/handle/1810/293719 ; https://doi.org/10.17863/CAM.40832
BASE
20 |
Annotation des proéminences pour la segmentation de corpus oraux : l’expérience du projet SegCor
In: CMLF 2018 - 6e Congrès Mondial de Linguistique Française ; https://halshs.archives-ouvertes.fr/halshs-01839314 ; CMLF 2018 - 6e Congrès Mondial de Linguistique Française, Franck Neveu; Bernard Harmegnies; Linda Hriba; Sophie Prévost, Jul 2018, Mons, Belgium (2018)
BASE