
Search in the Catalogues and Directories

Hits 1 – 20 of 2,945

1
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
In: https://hal.inria.fr/hal-03550289 ; 2022 (2022)
Abstract: 8 pages plus appendix and references ; In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect to the rights of data subjects represented in data collections, particularly when considering the difficulty in interrogating these collections due to insufficient documentation and tools for analysis. Mindful of these pitfalls, we present our methodology for a documentation-first, human-centered data collection project as part of the BigScience initiative. We identified a geographically diverse set of target language groups (Arabic, Basque, Chinese, Catalan, English, French, Indic languages, Indonesian, Niger-Congo languages, Portuguese, Spanish, and Vietnamese, as well as programming languages) for which to collect metadata on potential data sources. To structure this effort, we developed our online catalogue as a supporting tool for gathering metadata through organized public hackathons. We present our development process; analyses of the resulting resource metadata, including distributions over languages, regions, and resource types; and our lessons learned in this endeavor.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Applications; Collaborative Resource Construction & Crowdsourcing; LR Infrastructures and Architectures; Systems; Tools
URL: https://hal.inria.fr/hal-03550289
BASE
2
END-TO-END SPEECH RECOGNITION FROM FEDERATED ACOUSTIC MODELS
In: The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), May 2022, Singapore ; https://hal.archives-ouvertes.fr/hal-03601224 (2022)
BASE
3
Space omics research in Europe: contributions, geographical distribution and ESA member state funding schemes
BASE
4
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French
In: Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France ; https://hal.inria.fr/hal-03596653 (2022)
BASE
5
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
In: https://hal.inria.fr/hal-03536361 ; 2022 (2022)
BASE
6
Bilingualism and translation ...
Togato, Giulia; Macizo Soria, Pedro. - : Zenodo, 2022
BASE
7
Bilingualism and translation ...
Togato, Giulia; Macizo Soria, Pedro. - : Zenodo, 2022
BASE
8
Bilingüismo y traducción ...
Togato, Giulia; Macizo Soria, Pedro. - : Zenodo, 2022
BASE
9
Bilingüismo y traducción ...
Togato, Giulia; Macizo Soria, Pedro. - : Zenodo, 2022
BASE
10
Arguing About “COVID” ; Metalinguistic Arguments on What Counts as a “COVID-19 Death”
Lewiński, Marcin; Abreu, Pedro. - : Springer, 2022
BASE
11
Fifty Definitions of English Learner: A Proposed Solution to Inconsistent State-by-State Systems in the United States for Classifying Students Who Speak English as a Second Language
In: Educational Considerations (2022)
BASE
12
Science and Heritage Language Integrated Learning (SHLIL): Evidence for the Effectiveness of an Innovative Science Outreach Program for Migrant Students ...
Schiefer, Julia; Philipp, Jana; Moscoso, Joana. - : Marine Data Archive, 2022
BASE
13
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus ...
BASE
14
An NLP Solution to Foster the Use of Information in Electronic Health Records for Efficiency in Decision-Making in Hospital Care ...
BASE
15
72 - A Corpus of Neutral Voice Speech in Brazilian Portuguese ...
PROPOR 2022; Antelo, Álvaro; Biscainho, Luiz. - : Underline Science Inc., 2022
BASE
16
Larinia tumulus Framenau & Castanheira 2022, n. sp. ...
BASE
17
Larinia tumulus Framenau & Castanheira 2022, n. sp. ...
BASE
18
MAESTRO: Matched Speech Text Representations through Modality Matching ...
BASE
19
Rare Disorders: Diagnosis and Therapeutic Planning for Patients Seeking Orthodontic Treatment
In: Journal of Clinical Medicine; Volume 11; Issue 6; Pages: 1527 (2022)
BASE
20
The Natural, Artificial, and Social Domains of Intelligence: A Triune Approach
In: Proceedings; Volume 81; Issue 1; Pages: 2 (2022)
BASE

