5. Akuzipik/Yupik (St. Lawrence Island, Alaska, USA; Chukotka, Russia) - Language Snapshot ...
BASE

6. A digital corpus of St. Lawrence Island Yupik for the Yupik Community

7. Morphology Matters: A Multilingual Language Modeling Analysis ...
Abstract: Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically-motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language's morphology on language modeling. ... : To appear in TACL, a pre-MIT Press publication version; 15 pages, 3 figures; for the datasets, see https://github.com/hayleypark/MorphologyMatters ...
Keyword: Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2012.06262 https://arxiv.org/abs/2012.06262

8. Multidirectional leveraging for computational morphology and language documentation and revitalization

9. Multidirectional leveraging for computational morphology and language documentation and revitalization

10. Community-Focused Language Documentation in Support of Language Education and Revitalization for St. Lawrence Island Yupik

12. Liinnaqumalghiit: A web-based tool for addressing orthographic transparency in St. Lawrence Island/Central Siberian Yupik

13. Liinnaqumalghiit: A web-based tool for addressing orthographic transparency in St. Lawrence Island/Central Siberian Yupik

14. Compiling contextualized lists of frequent vocabulary from user-supplied corpora using natural language processing techniques

16. An incremental syntactic language model for statistical phrase-based translation.

17. Incremental Syntactic Language Models for Phrase-Based Translation
In: DTIC (2011)