3 |
The Lannang Corpus (LanCorp): A POS-tagged, sociolinguistic corpus containing recordings and transcriptions of Lannang speech collected from the metropolitan Manila Lannangs between 2016 and 2020 ...
|
|
|
|
Abstract:
The Lannang Corpus (LanCorp) is a POS-tagged sociolinguistic speech-and-text corpus of 375,000 words covering Lannang languages, based on audio recordings collected in metropolitan Manila between 2016 and 2020. It aims to provide scholars interested in Sino-Philippine (socio)linguistics with a contemporary, multilingual corpus (i.e., Hokkien, Tagalog, English, Lánnang-uè, Mandarin) compiled from recorded oral data primarily collected from a Sino-Philippine community in metropolitan Manila, by that same community: the Manila Lannangs. The publicly available corpus contains manual transcriptions (time-aligned to the audio), source-language and part-of-speech tags derived using a mix of manual and computational methods, and a wide range of social metadata; it is also organized and stored systematically for easy data retrieval and (socio)linguistic analysis. Existing sociolinguistic corpora are small in scale and were not released publicly for lack of informant consent; LanCorp fills this gap. ...
|
|
Keywords:
Lannang, Chinese Filipino, Filipino-Chinese, Hokkien, diaspora, mixed language, recordings, oral variety, multilingual, corpus, data, dataset, databank, LanCorp, Lannang Corpus, ELAN, sociolinguistics
|
|
URL: https://dx.doi.org/10.7302/66g9-e028 ; http://deepblue.lib.umich.edu/data/concern/data_sets/g445cd563
|
|
BASE
|
|
4 |
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques [Mitigating digitization errors in named entity recognition for historical documents]
|
|
|
|
In: Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtual), France. pp. 1-7 ; https://hal.archives-ouvertes.fr/hal-03320332 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)
|
|
BASE
|
|
5 |
Contextualization of Web contents through semantic enrichment from linked open data ; Contextualisation des contenus Web par l'enrichissement sémantique à partir de données
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-03561788 ; Databases [cs.DB]. Normandie Université, 2021. English. ⟨NNT : 2021NORMC243⟩ (2021)
|
|
BASE
|
|
6 |
Lost in translation: Qualitative data collecting and translating challenges in multilingual settings in information systems research
|
|
|
|
BASE
|
|
7 |
Would auto-translation of metadata enhance discovery and impact of research data? ...
|
|
|
|
BASE
|
|
8 |
Would auto-translation of metadata enhance discovery and impact of research data? ...
|
|
|
|
BASE
|
|
9 |
Report on the Published Updates to CESSDA Vocabulary Service Content ...
|
|
|
|
BASE
|
|
10 |
Report on the Published Updates to CESSDA Vocabulary Service Content ...
|
|
|
|
BASE
|
|
11 |
APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text ...
|
|
Ionov, Maxim. - : Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021
|
|
BASE
|
|
13 |
Improving Multilingual Models for the Swedish Language : Exploring CrossLingual Transferability and Stereotypical Biases
|
|
|
|
BASE
|
|
14 |
Extending a Text Classifier to Multiple Languages ; Utöka en textklassificeringsmodell till flera språk
|
|
Byström, Albin. - : KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021
|
|
BASE
|
|
15 |
LLOD-driven Bilingual Word Embeddings Rivaling Cross-lingual Transformers in Quality of Life Concept Detection from French Online Health Communities ...
|
|
|
|
BASE
|
|
16 |
LLOD-driven Bilingual Word Embeddings Rivaling Cross-lingual Transformers in Quality of Life Concept Detection from French Online Health Communities ...
|
|
|
|
BASE
|
|
17 |
Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
|
|
|
|
BASE
|
|
19 |
Flipgrid and Second Language Acquisition: Using Flipgrid to Promote Speaking Skills for English Language Learners
|
|
|
|
In: Master’s Theses and Projects (2020)
|
|
BASE