1. Investigating language impact in bilingual approaches for computational language documentation
2. Empirical evaluation of sequence-to-sequence models for word discovery in low-resource settings
3. Unsupervised word segmentation from speech with attention
Abstract:
We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing between recordings in the UL with translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones that is segmented using neural soft-alignments produced by a neural machine translation model. Evaluation uses an actual Bantu UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
URL: http://eprints.whiterose.ac.uk/150390/ http://eprints.whiterose.ac.uk/150390/7/1308.pdf
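
The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes the soft-alignment is available as a matrix with one row per translation word and one column per pseudo-phone, and it places a boundary wherever the most strongly attended translation word changes. The function name `segment_by_attention` and the toy inputs are hypothetical and are not taken from the paper; the authors' actual procedure may differ.

```python
def segment_by_attention(units, attention):
    """Group pseudo-phones into word-like segments using a soft-alignment matrix.

    units:     list of pseudo-phone labels, length T (assumed output of AUD).
    attention: list of rows, one per translation word, each of length T
               (assumed soft-alignment weights from a translation model).
    """
    if not units:
        return []
    if not attention or any(len(row) != len(units) for row in attention):
        raise ValueError("attention must have one weight per pseudo-phone in every row")

    # For each pseudo-phone, pick the translation word with the highest weight.
    aligned = [
        max(range(len(attention)), key=lambda w: attention[w][t])
        for t in range(len(units))
    ]

    # Start a new segment whenever the aligned translation word changes.
    segments = [[units[0]]]
    for t in range(1, len(units)):
        if aligned[t] != aligned[t - 1]:
            segments.append([])
        segments[-1].append(units[t])
    return segments


# Toy example: five pseudo-phones aligned to two translation words.
toy_units = ["a", "b", "c", "d", "e"]
toy_attention = [
    [0.9, 0.8, 0.1, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.8, 0.9],
]
toy_segments = segment_by_attention(toy_units, toy_attention)
```

In this toy case the first two pseudo-phones attend mostly to the first translation word and the remaining three to the second, so a single boundary falls between them.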
4. Unwritten languages demand attention too! Word discovery with encoder-decoder models