21 |
MuST-Cinema: a Speech-to-Subtitles corpus ...
Abstract:
The growing need to localise audiovisual content in multiple languages through subtitles calls for the development of automatic solutions for human subtitling. Neural Machine Translation (NMT) can contribute to the automatisation of subtitling, facilitating the work of human subtitlers and reducing turn-around times and related costs. NMT requires high-quality, large, task-specific training data. The existing subtitling corpora, however, lack both alignments to the source-language audio and important information about subtitle breaks. This is a significant limitation for developing efficient automatic approaches to subtitling, since the length and form of a subtitle depend directly on the duration of the utterance. In this work, we present MuST-Cinema, a multilingual speech translation corpus built from TED subtitles. The corpus comprises (audio, transcription, translation) triplets. Subtitle breaks are preserved by inserting special symbols. We show that the corpus can be used to build ... : Accepted at LREC 2020 ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2002.10829 https://arxiv.org/abs/2002.10829
BASE
22 |
Low Resource Neural Machine Translation: A Benchmark for Five African Languages ...
24 |
Breeding Gender-aware Direct Speech Translation Systems ...
25 |
Machine-oriented NMT Adaptation for Zero-shot NLP tasks: Comparing the Usefulness of Close and Distant Languages ...
26 |
APE Shared Task WMT18: Human Post-edits and References Test Data EN-DE PBSMT
27 |
Identification of bilingual terms from monolingual documents for statistical machine translation
28 |
Knowledge portability with semantic expansion of ontology labels
29 |
Enhancing statistical machine translation with bilingual terminology in a CAT environment
30 |
Multilingual Neural Machine Translation for Zero-Resource Languages ...
32 |
Adapting Multilingual Neural Machine Translation to Unseen Languages ...
33 |
Adapting Multilingual Neural Machine Translation to Unseen Languages ...
34 |
Adapting Multilingual Neural Machine Translation to Unseen Languages ...
35 |
MuST-C A Multilingual Speech Translation Corpus (Conference slides) ...
39 |
Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary ...
40 |
Improving Zero-Shot Translation of Low-Resource Languages ...