1 |
Investigating language impact in bilingual approaches for computational language documentation
|
|
|
|
BASE
|
|
Show details
|
|
2 |
Empirical evaluation of sequence-to-sequence models for word discovery in low-resource settings
|
|
|
|
BASE
|
|
Show details
|
|
4 |
Unwritten languages demand attention too! Word discovery with encoder-decoder models
|
|
|
|
Abstract:
Word discovery is the task of extracting words from un-segmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten language scenario, where only small corpora and limited annotations are available. We investigate two scenarios: one with no supervision and another with limited supervision with access to the most frequent words. Obtained results show that it is possible to retrieve at least 27% of the gold standard vocabulary by training an encoder-decoder neural machine translation system with only 5,157 sentences. This result is close to those obtained with a task-specific Bayesian nonparametric model. Moreover, our approach has the advantage of generating translation alignments, which could be used to create a bilingual lexicon. As a future perspective, this approach is also well suited to work directly from speech.
|
|
URL: http://eprints.whiterose.ac.uk/153555/
|
|
BASE
|
|
Hide details
|
|
|
|