1 |
AixOx, a multi-layered learners corpus: automatic annotation
|
|
|
|
In: Specialisation and variation in language corpora ; https://hal.archives-ouvertes.fr/hal-01363434 ; Díaz Pérez J.; Díaz Negrillo A. Specialisation and variation in language corpora, Linguistic insights (179), Peter Lang, pp.41-76, 2014, 978-3035107135 ; https://www.peterlang.com/view/9783035199192/Chapter02.html (2014)
|
|
Abstract:
This paper presents a multilingual learners corpus, AixOx, collected in the framework of an Alliance project (a partnership between the British Council and the French Ministry of Foreign Affairs). The corpus consists of recordings of 40 one-minute passages in English and French from the Eurom 1 corpus (Chan et al., 1995), read by native speakers and L2 learners. French native speakers reading the French and English passages were recorded in Aix-en-Provence, and English native speakers reading the English and French passages were recorded in Oxford. The AixOx corpus contains about 40 hours of read speech and can be downloaded from the “Speech and Language Data Repository” (http://sldr.org).
This paper also presents the tools used for automatic annotation on several layers:
• SPPAS, SPeech Phonetization Alignment and Syllabification (Bigi, 2012), for a segmentation into utterances, words, syllables and phonemes;
• MoMel, Modelling Melody, and INTSINT, INternational Transcription System for INTonation (Hirst, 2007), for the modelling and coding of intonation.
Finally, an example of a pedagogical application of the corpus is given: a pilot study on the intonation of questions. We show how the AixOx corpus can be used to compare the productions of natives with those of learners and how it is possible, thanks to the annotation, to understand the prosodic realisations (whether they be positive or negative) and explain them. We conclude that AixOx, with its multi-layered annotation, is a very rich oral database for all kinds of studies on L1 productions, L2 productions and language contact, at both the segmental and supra-segmental levels, since it offers a phonemic segmentation and alignment and a prosodic labelling.
|
|
Keywords:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; automatic annotation; intonation; INTSINT; L1; L2; language contact; MOMEL; oral corpus; questions; SPPAS
|
|
URL: https://hal.archives-ouvertes.fr/hal-01363434 https://hal.archives-ouvertes.fr/hal-01363434/file/Herment_et_al_PeterLang_FINAL_version%20HAL.pdf https://hal.archives-ouvertes.fr/hal-01363434/document
|
|
BASE
|
|
2 |
AixOx
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-01363437 ; 2012 (2012)
|
|
BASE
|
|
3 |
AixOx, a multi-layered learners corpus: automatic annotation
|
|
|
|
In: 4th International Conference on Corpus Linguistics ; https://hal.archives-ouvertes.fr/hal-01363458 ; 4th International Conference on Corpus Linguistics, Mar 2012, Jaén, Spain (2012)
|
|
BASE
|
|
4 |
Rhythm measures and dimensions of durational variation in speech
|
|
|
|
BASE
|
|
6 |
Towards the acoustic analysis of lateral consonants in Modern Greek dialects: a preliminary study ...
|
|
|
|
BASE
|
|
8 |
Long range prosody prediction and rhythm: doing rhythm with fewer assumptions
|
|
|
|
BASE
|
|
9 |
Do rhythm measures separate languages or speakers?: Poster presented at BAAP 2010 on 30 March 2010
|
|
|
|
BASE
|
|