1. Unsupervised word segmentation from speech with attention
Abstract:
We present a first attempt at attentional word segmentation performed directly on the speech signal, with the ultimate goal of automatically identifying lexical units in a low-resource, unwritten language (UL). Our methodology assumes that recordings in the UL are paired with translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones, which is then segmented using the soft alignments produced by a neural machine translation model. Evaluation uses an actual Bantu UL, Mboshi; comparisons with monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
URL: http://eprints.whiterose.ac.uk/150390/ http://eprints.whiterose.ac.uk/150390/7/1308.pdf
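The abstract does not spell out how the soft alignments are turned into word boundaries. A minimal sketch of one plausible reading follows: each pseudo-phone is assigned to the translation word that receives its highest attention weight, and a boundary is placed wherever that word changes. The function name `segment_by_attention`, the argmax rule, and the toy attention matrix are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def segment_by_attention(
    units: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[List[str]]:
    """Group pseudo-phones into word candidates (illustrative assumption).

    units:     pseudo-phone symbols, e.g. the output of an AUD system.
    attention: one row per pseudo-phone; each row holds that unit's
               soft-alignment weights over the translation words.
    """
    if len(units) != len(attention):
        raise ValueError("need exactly one attention row per pseudo-phone")

    segments: List[List[str]] = []
    current: List[str] = []
    previous_target = None

    for unit, row in zip(units, attention):
        # Index of the translation word with the highest weight for this unit.
        target = max(range(len(row)), key=row.__getitem__)
        # A change of aligned word closes the current segment.
        if current and target != previous_target:
            segments.append(current)
            current = []
        current.append(unit)
        previous_target = target

    if current:
        segments.append(current)
    return segments


# Toy input: four pseudo-phones aligned against a two-word translation.
toy_units = ["a", "b", "c", "d"]
toy_attention = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.9],
]
toy_segments = segment_by_attention(toy_units, toy_attention)
```

On the toy input the first two units align mostly to the first translation word and the last two to the second, so the sketch proposes a single boundary between `b` and `c`.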
BASE
2. A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments ...
BASE