3 |
DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues
|
|
|
|
In: Castilho, Sheila orcid:0000-0002-8416-6555, Cavalheiro Camargo, João Lucas orcid:0000-0003-3746-1225, Menezes, Miguel and Way, Andy orcid:0000-0001-5736-5930 (2021) DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues. In: Sixth Conference on Machine Translation (WMT21), 10-11 Nov 2021, Punta Cana, Dominican Republic (Online). ISBN 978-1-954085-94-7 (2021)
|
|
BASE
|
|
|
|
4 |
Large-scale study of speech acts' development using automatic labelling
|
|
|
|
In: Proceedings of the 43rd Annual Meeting of the Cognitive Science Society ; https://hal.archives-ouvertes.fr/hal-03234620 ; Proceedings of the 43rd Annual Meeting of the Cognitive Science Society, Jul 2021, Vienna, Austria (2021)
|
|
BASE
|
|
|
|
5 |
Multitask Transformer Model-based Fintech Customer Service Chatbot NLU System with DECO-LGG SSP-based Data [Korean title, translated: Fintech CS Chatbot NLU System Based on a Multitask Transformer Model Using DECO-LGG Semi-automatically Augmented Training Data]
|
|
|
|
In: Annual Conference on Human and Language Technology ; https://hal.archives-ouvertes.fr/hal-03603903 ; Annual Conference on Human and Language Technology, Oct 2021, Seoul, South Korea. pp. 461-466 ; http://www.koreascience.or.kr/journal/OOGHAK.page (2021)
|
|
Abstract:
This study builds on the Semi-automatic Symbolic Propagation (SSP) method, which uses the DECO (Dictionnaire Electronique du COréen) Korean electronic dictionary and local grammar graphs (LGG), to create an annotated training dataset for a natural language understanding (NLU) chatbot for customer service (CS) in financial technology (fintech). Using this dataset, we implemented a fintech CS chatbot NLU system with the Dual Intent and Entity Transformer (DIET) architecture provided by the RASA open source framework. Based on 32 topic types and 38 key events in the fintech field identified through real data, together with 10 discourse-element configurations, the DECO-LGG data generation module produces high-quality annotated training data for query and complaint speech acts. This data was used to train DIET, an end-to-end multitask transformer model that jointly handles intent classification and named entity recognition for slot filling. Training with DIET alone yielded an F1-score of 0.931 (Intent)/0.865 (Slot/Entity); with DIET+KoBERT, the F1-score reached 0.951 (Intent)/0.901 (Slot/Entity). These results indicate that the SSP-generated DECO-LGG data is effective as training data and that the KoBERT-based DIET model outperforms the DIET-only model.
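The size of the improvement reported in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the four F1-scores quoted above and nothing else, and the helper name `f1_gains` is hypothetical, not something taken from the paper or from RASA.

```python
# F1-scores exactly as reported in the abstract; no other figures are assumed.
reported = {
    "DIET-only": {"intent": 0.931, "slot_entity": 0.865},
    "DIET+KoBERT": {"intent": 0.951, "slot_entity": 0.901},
}


def f1_gains(baseline, improved):
    """Hypothetical helper: absolute and relative F1 gain for each task."""
    gains = {}
    for task, base in baseline.items():
        absolute = improved[task] - base
        gains[task] = {
            "absolute": round(absolute, 3),
            "relative_pct": round(100 * absolute / base, 2),
        }
    return gains


gains = f1_gains(reported["DIET-only"], reported["DIET+KoBERT"])
for task, g in gains.items():
    print(f"{task}: +{g['absolute']:.3f} F1 ({g['relative_pct']:.2f}% relative)")
```

Under these figures, adding KoBERT raises the slot/entity score by more than the intent score (0.036 versus 0.020 absolute), which is consistent with the abstract's conclusion that the KoBERT-based model performs better on both tasks.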
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; Chatbot; E-dictionary; Language Resource; Local grammar graph; Multitask transformer; NLP dictionary; Semi-automatic annotation; Unitex; Multitask transformer model; Semi-automatically augmented training data; Electronic dictionary; Chatbot NLU system
|
|
URL: https://hal.archives-ouvertes.fr/hal-03603903 https://hal.archives-ouvertes.fr/hal-03603903/file/yoo-et-al-2021.pdf https://hal.archives-ouvertes.fr/hal-03603903/document
|
|
BASE
|
|
|
|
6 |
Automatic Language Identification in Code-Switched Hindi-English Social Media Text
|
|
|
|
In: Journal of Open Humanities Data; Vol 7 (2021); 7; ISSN 2059-481X (2021)
|
|
BASE
|
|
|
|
7 |
Semi-automatic Annotation Proposal for Increasing a Fake News Dataset in Spanish
|
|
|
|
BASE
|
|
|
|
10 |
POS-Tagging für Transkripte gesprochener Sprache : Entwicklung einer automatisierten Wortarten-Annotation am Beispiel des Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK)
|
|
|
|
BLLDB
|
|
UB Frankfurt Linguistik
|
|
|
|
11 |
SMAD: A tool for automatically annotating the smile intensity along a video record
|
|
|
|
In: HRC2020, 10th Humour Research Conference ; https://hal.archives-ouvertes.fr/hal-02529371 ; HRC2020, 10th Humour Research Conference, Mar 2020, Commerce, Texas, United States (2020)
|
|
BASE
|
|
|
|
17 |
Automatic annotation of error types for grammatical error correction ...
|
|
|
|
BASE
|
|
|
|
18 |
PE2 rr corpus: manual error annotation of automatically pre-annotated MT post-edits
|
|
|
|
BASE
|
|
|
|
19 |
Automatic annotation of error types for grammatical error correction
|
|
|
|
BASE
|
|
|
|
20 |
Annotation des proéminences pour la segmentation de corpus oraux : l’expérience du projet SegCor
|
|
|
|
In: CMLF 2018 - 6e Congrès Mondial de Linguistique Française ; https://halshs.archives-ouvertes.fr/halshs-01839314 ; CMLF 2018 - 6e Congrès Mondial de Linguistique Française, Franck Neveu; Bernard Harmegnies; Linda Hriba; Sophie Prévost, Jul 2018, Mons, Belgium (2018)
|
|
BASE
|
|