3 |
Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages ...
|
|
San, Nay; Bartelds, Martijn; Browne, Mitchell; Clifford, Lily; Gibson, Fiona; Mansfield, John; Nash, David; Simpson, Jane; Turpin, Myfany; Vollmer, Maria; Wilmoth, Sasha; Jurafsky, Dan. - : arXiv, 2021
|
|
Abstract:
Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic speech recognition (ASR). Yet many endangered languages lack sufficient data for pre-training such models, or are predominantly oral vernaculars without a standardised writing system, precluding fine-tuning. Query-by-example spoken term detection (QbE-STD) offers an alternative for iteratively indexing untranscribed speech corpora by locating spoken query terms. Using data from 7 Australian Aboriginal languages and a regional variety of Dutch, all of which are endangered or vulnerable, we show that QbE-STD can be improved by leveraging representations developed for ASR (wav2vec 2.0: the English monolingual model and XLSR53 multilingual model). Surprisingly, the English model outperformed the multilingual model on 4 Australian language datasets, raising questions around how to optimally leverage self-supervised speech representations for QbE-STD. Nevertheless, we find that wav2vec 2.0 representations (either English or ... : Accepted at ASRU 2021 ...
|
|
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
|
|
URL: https://arxiv.org/abs/2103.14583 https://dx.doi.org/10.48550/arxiv.2103.14583
|
|
BASE
|
|
6 |
Clause chaining and the utterance phrase: Syntax–prosody mapping in Matukar Panau
|
|
|
|
In: Open Linguistics, Vol 7, Iss 1, Pp 423-447 (2021)
|
|
BASE
|
|
10 |
Category clustering: A probabilistic bias in the morphology of verbal agreement marking ...
|
|
|
|
BASE
|
|
11 |
UniMorph 3.0: Universal Morphology
|
|
|
|
In: Proceedings of the 12th Language Resources and Evaluation Conference (2020)
|
|
BASE
|
|
16 |
Epistemic authority and sociolinguistic stance in an Australian Aboriginal language
|
|
|
|
In: Open Linguistics, Vol 5, Iss 1, Pp 25-48 (2019)
|
|
BASE
|
|
18 |
Documenting sociolinguistic variation in lesser-studied indigenous communities: Challenges and practical solutions
|
|
|
|
BASE
|
|
19 |
Documenting sociolinguistic variation in lesser-studied indigenous communities: Challenges and practical solutions
|
|
|
|
BASE
|
|
20 |
Exploring Murrinhpatha dialectal variation in a diachronic corpus ...
|
|
|
|
BASE
|
|