2. Investigating alignment interpretability for low-resource NMT
In: Machine Translation, Springer Verlag, 2021. ISSN: 0922-6567; EISSN: 1573-0573. DOI: 10.1007/s10590-020-09254-w. https://hal.archives-ouvertes.fr/hal-03139744
BASE
3. Is there a bilingual disadvantage for word segmentation? A computational modeling approach
In: Journal of Child Language, Cambridge University Press (CUP), 2021, pp. 1–28. ISSN: 0305-0009; EISSN: 1469-7602. DOI: 10.1017/S0305000921000568. https://hal.archives-ouvertes.fr/hal-03498905
BASE
4. SM to: Is there a bilingual disadvantage for word segmentation? A computational modeling approach ...
BASE
5. Early Tashelhiyt Berber word segmentation: the role of the Possible Word Constraint ...
BASE
6. Discovering structure in speech recordings: Unsupervised learning of word and phoneme like units for automatic speech recognition
In: Fraunhofer IAIS (2021)
BASE
7. Handling cross- and out-of-domain samples in Thai word segmentation
In: pp. 1003–1016 (2021)
Abstract:
© 2021 The Authors. Published by ACL. This is an open access article available under a Creative Commons licence. The published version is available on the publisher’s website: https://aclanthology.org/2021.findings-acl.86

While word segmentation is a solved problem in many languages, it remains a challenge in continuous-script or low-resource languages. Like other NLP tasks, word segmentation is domain-dependent. This is problematic for low-resource languages such as Thai and Urdu, where some domains have insufficient data. This work proposes a method for adapting an existing domain-generic model to a target domain. It also proposes a data augmentation technique to address the low-resource problem. Beyond domain adaptation, we propose a framework for handling out-of-domain inputs with an ensemble of domain-specific models, called MultiDomain Ensemble (MDE). To assess the proposed solutions, we conducted extensive experiments on domain adaptation and out-of-domain scenarios. We also propose a multi-task dataset for Thai text processing, including word segmentation. For domain adaptation, we compared our solution with the state-of-the-art Thai word segmentation (TWS) method. It improved results from 93.47% to 98.48% at the character level and from 84.03% to 96.75% at the word level. In out-of-domain scenarios, MDE significantly outperformed the state-of-the-art TWS and multi-criteria methods. Finally, to demonstrate the method’s generalizability, we applied the MDE framework to Chinese, Japanese, and Urdu, and obtained improvements similar to those for Thai.
Keyword:
low-resource NLP; Thai; word segmentation
URL: http://hdl.handle.net/2436/624145 https://doi.org/10.18653/v1/2021.findings-acl.86
BASE
8. Measuring (online) word segmentation in adults and children
In: Dutch Journal of Applied Linguistics, Vol. 10 (2021)
BASE
9. Investigating Language Impact in Bilingual Approaches for Computational Language Documentation
In: Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020), LREC 2020, May 2020, Marseille, France. https://hal.archives-ouvertes.fr/hal-02895907
BASE
10. F0 Slope and Mean: Cues to Speech Segmentation in French
In: Interspeech 2020, Oct 2020, Shanghai, China, pp. 1610–1614. DOI: 10.21437/Interspeech.2020-2509. https://hal.archives-ouvertes.fr/hal-03042331
BASE
11. The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...
BASE
12. Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...
BASE
13. The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...
BASE
14. Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech
BASE
15. Infants Segment Words from Songs—An EEG Study
In: Brain Sciences, Vol. 10, Issue 1 (2020)
BASE
16. Not all words are equally acquired: transitional probabilities and instructions affect the electrophysiological correlates of statistical learning
BASE
17. Controlling Utterance Length in NMT-based Word Segmentation with Attention
In: International Workshop on Spoken Language Translation (IWSLT), Nov 2019, Hong Kong, China. https://hal.archives-ouvertes.fr/hal-02343206
BASE
18. Segmentability Differences Between Child-Directed and Adult-Directed Speech: A Systematic Test With an Ecologically Valid Corpus
In: Open Mind, MIT Press, 2019, Vol. 3, pp. 13–22. EISSN: 2470-2986. DOI: 10.1162/opmi_a_00022. https://hal.archives-ouvertes.fr/hal-02274050
BASE
19. Unsupervised word discovery for computational language documentation ; Découverte non-supervisée de mots pour outiller la linguistique de terrain
In: Doctoral thesis, Artificial Intelligence [cs.AI], Université Paris Saclay (COmUE), 2019. English. NNT: 2019SACLS062. https://tel.archives-ouvertes.fr/tel-02286425
BASE
20. MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language
In: Information, Vol. 10, Issue 10 (2019)
BASE