Search in the Catalogues and Directories

Hits 14.821 – 14.837 of 14.837

14821
B. Helong
BASE
14822
Tutong 1772-2040 & texts Part 5a
BASE
14823
after fight near kitchen
BASE
14824
Ndao
BASE
14825
SP6
BASE
14826
Rai Jua
BASE
14827
Extracts of B3, extracts of B2
BASE
14828
[Title to be supplied]
BASE
14829
copy B3 Extracts
BASE
14830
Dimu
BASE
14831
[Title to be supplied]
BASE
14832
RainJua
BASE
14833
Hj Matamit bin Iradat Kg Sengkarai Sony HDMD & E CHECK Edit for Content needed kah last 15 mins
BASE
14834
Tutong_elicitation Wordlist Jurgen Burkhardt
BASE
14835
Sounds for Film
BASE
14836
Henry
BASE
14837
Building Multilingual Comparable Corpora
Abstract: Building on existing corpora and new audio/video documentary fieldwork on 12+ languages from across West Africa, we are creating a multilingual comparative corpus with input from 20+ collaborating researchers (Nikitina et al. 2020). We present a toolkit of technologies and three parallel workflows that can be used to mobilize language materials from diverse sources for a variety of purposes, particularly for the discovery of discourse patterns in legacy materials that could then be used in revitalization efforts. Our toolkit includes the following technologies: ELAN-CorpA (Chanard 2015; 2019), FieldWorks Language Explorer (FLEx, SIL International), Toolbox (SIL International), ELAN Tools (Chanard et al. 2020, under development), SpeechReporting Template (Nikitina et al. 2019) and Tsakorpus (Arkhangelskiy 2019). The three workflows differ in the initial file format and in the software platform used for parsing and glossing texts. All three workflows lead to a collection of annotated files that can be queried with ELAN-CorpA (Hantgan 2019). In the first workflow, (1) ELAN-CorpA is used for time-aligned transcription and translation of a recorded text; (2) FLEx is used to parse and gloss the text; and (3) ELAN Tools converts the .flextext export into the project template for use in ELAN-CorpA. (4) Once in the project template, the text is annotated for project categories, and complex queries can be run across all texts in any language using the search features of ELAN-CorpA. In the second workflow, (1) transcription, translation, parsing and glossing are done in Toolbox; (2) ELAN Tools converts the Toolbox file to the project template; and (3) ELAN-CorpA is used to time-align the text and annotate it for the project. In the third workflow, transcription, translation, parsing and glossing are all done in ELAN-CorpA, using the project template from the initial stages.
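In the first workflow, ELAN Tools reads a FLEx `.flextext` export. As a rough illustration of what such a conversion has to extract (this is not ELAN Tools itself), here is a minimal Python sketch that pulls words, morphemes, glosses and free translations out of a `.flextext` file using only the standard library. It assumes the usual `phrase > words > word > morphemes > morph` nesting with `<item type="txt">` and `<item type="gls">` children; the class and function names and the toy English sample are invented for illustration, and only the first matching item per element is kept even if several gloss languages were exported.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Morph:
    form: str
    gloss: str


@dataclass
class Word:
    form: str
    morphs: List[Morph]


@dataclass
class Phrase:
    words: List[Word]
    free_translation: str


def _item(elem: ET.Element, item_type: str) -> str:
    """Text of the first direct <item type="..."> child of elem, or ''."""
    for item in elem.findall("item"):
        if item.get("type") == item_type:
            return item.text or ""
    return ""


def read_flextext(xml_text: str) -> List[Phrase]:
    """Collect phrases, words, morphemes and glosses from a .flextext export."""
    root = ET.fromstring(xml_text)
    phrases = []
    for phrase in root.iter("phrase"):
        words = []
        for word in phrase.iterfind("words/word"):
            morphs = [
                Morph(_item(m, "txt"), _item(m, "gls"))
                for m in word.iterfind("morphemes/morph")
            ]
            words.append(Word(_item(word, "txt"), morphs))
        # Assumed: the phrase-level "gls" item holds the free translation.
        phrases.append(Phrase(words, _item(phrase, "gls")))
    return phrases


# Invented toy sample (English placeholder data, not project material).
SAMPLE = """<document version="2">
<interlinear-text><paragraphs><paragraph><phrases>
  <phrase>
    <item type="segnum" lang="en">1</item>
    <words>
      <word>
        <item type="txt" lang="xx">walked</item>
        <morphemes>
          <morph><item type="txt" lang="xx">walk</item><item type="gls" lang="en">walk</item></morph>
          <morph><item type="txt" lang="xx">-ed</item><item type="gls" lang="en">PST</item></morph>
        </morphemes>
      </word>
    </words>
    <item type="gls" lang="en">She walked.</item>
  </phrase>
</phrases></paragraph></paragraphs></interlinear-text>
</document>"""

if __name__ == "__main__":
    for p in read_flextext(SAMPLE):
        for w in p.words:
            print(w.form, " ".join(f"{m.form}={m.gloss}" for m in w.morphs))
        print(p.free_translation)
```

Punctuation tokens in a real export carry `type="punct"` items rather than `txt`, so in this sketch they simply come out with an empty form; a fuller converter would also carry over time alignment and the project's annotation tiers.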
Using these three workflows, data from various source file types is processed into a shared format that will be displayed on an online platform via Tsakorpus. This methodology may be of interest to community members looking for ways to prepare already-collected language materials for display on the internet, to those interested in specific questions about discourse phenomena, to typologists, and to linguists in general.
Building on existing corpora and new documentary fieldwork in West Africa, we are creating a multilingual comparative corpus. We present a technology toolkit and three parallel workflows that can be used to mobilize language materials for a variety of purposes, particularly for the discovery of discourse patterns in legacy materials.
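To make the idea of a shared format more concrete, the following Python sketch reads Toolbox-style records, maps each one onto a single common structure, and runs a simple gloss query across the result. It is an illustration under stated assumptions, not the project template, ELAN Tools, or the Tsakorpus input format. The markers `\ref`, `\tx`, `\mb`, `\ge` and `\ft` are common Toolbox interlinear defaults but vary between projects. The output keys and the functions `parse_toolbox`, `to_shared` and `find_gloss` are hypothetical. Morphemes are paired with glosses by whitespace tokens rather than by Toolbox's column alignment.

```python
import re
from typing import Dict, List


def parse_toolbox(text: str) -> List[Dict[str, List[str]]]:
    """Split Toolbox standard-format text into records, one per \\ref marker."""
    records: List[Dict[str, List[str]]] = []
    current = None
    for line in text.splitlines():
        m = re.match(r"\\(\w+)\s?(.*)", line)
        if m is None:
            continue  # blank or continuation lines are ignored in this sketch
        marker, value = m.group(1), m.group(2).strip()
        if marker == "ref":
            current = {}
            records.append(current)
        if current is not None:  # skips file header lines such as \_sh
            current.setdefault(marker, []).append(value)
    return records


def to_shared(record: Dict[str, List[str]]) -> dict:
    """Map one record onto a hypothetical shared structure."""
    # Repeated \mb / \ge lines (wrapped interlinear bundles) are joined first.
    morphs = " ".join(record.get("mb", [])).split()
    glosses = " ".join(record.get("ge", [])).split()
    if len(morphs) != len(glosses):
        raise ValueError(f"morphemes and glosses misaligned in {record['ref'][0]}")
    return {
        "id": record["ref"][0],
        "text": " ".join(record.get("tx", [])),
        "morphs": list(zip(morphs, glosses)),
        "translation": " ".join(record.get("ft", [])),
    }


def find_gloss(records: List[dict], gloss: str) -> List[str]:
    """IDs of shared records that contain a morpheme with the given gloss."""
    return [r["id"] for r in records if any(g == gloss for _, g in r["morphs"])]


# Invented toy sample (English placeholder data, not project material).
SAMPLE = r"""\_sh v3.0  400  Text
\ref demo.001
\tx walked home
\mb walk -ed home
\ge walk -PST home
\ft She walked home.

\ref demo.002
\tx walks
\mb walk -s
\ge walk -3SG
\ft He walks.
"""

if __name__ == "__main__":
    shared = [to_shared(r) for r in parse_toolbox(SAMPLE)]
    print(find_gloss(shared, "-PST"))
```

The point of the common structure is that a query such as `find_gloss` is written once and works the same way regardless of whether a record started life in Toolbox, FLEx or ELAN.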
Keyword: Corpora; ELAN; language documentation; Methodology; text and corpus linguistics; West Africa
URL: https://hughandbecky.us/Becky-CV/talk/2021-building-multilingual-comparable-corpora/
BASE

Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 14.837
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings