
Search in the Catalogues and Directories

Hits 1 – 4 of 4

1
Investigating language impact in bilingual approaches for computational language documentation
Boito, M.Z.; Villavicencio, A.; Besacier, L. - Special Interest Group: Under-resourced Languages (SIGUL), 2020
BASE
2
Empirical evaluation of sequence-to-sequence models for word discovery in low-resource settings
Boito, M.Z.; Villavicencio, A.; Besacier, L. - International Speech Communication Association (ISCA), 2019
BASE
3
Unsupervised word segmentation from speech with attention
Godard, P.; Boito, M.Z.; Ondel, L. - ISCA, 2018
BASE
4
Unwritten languages demand attention too! Word discovery with encoder-decoder models
Abstract: Word discovery is the task of extracting words from un-segmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten language scenario, where only small corpora and limited annotations are available. We investigate two scenarios: one with no supervision and another with limited supervision with access to the most frequent words. Obtained results show that it is possible to retrieve at least 27% of the gold standard vocabulary by training an encoder-decoder neural machine translation system with only 5,157 sentences. This result is close to those obtained with a task-specific Bayesian nonparametric model. Moreover, our approach has the advantage of generating translation alignments, which could be used to create a bilingual lexicon. As a future perspective, this approach is also well suited to work directly from speech.
URL: http://eprints.whiterose.ac.uk/153555/
BASE

© 2013 - 2024 Lin|gu|is|tik