Hits 1 – 20 of 264

1	Language identification, a tool for Corsican and for the evaluation of linguistic resources
	Kevers, Laurent
	In: Traitement Automatique des Langues, 62 (3), issue "Diversité Linguistique" (Linguistic Diversity), pp. 13-37 (2022) ; https://hal.archives-ouvertes.fr/hal-03633290 ; https://www.atala.org/content/diversité-linguistique-linguistic-diversity-natural-language-processing
	Abstract: Building corpora is one of the first priorities for less-resourced languages. The emergence of Internet-based resources of increasing size, covering more and more languages, may suggest that this issue has been resolved, but this is not the case. Following Caswell et al. (2021), who evaluated several large resources, including one with Corsican content, we analysed two corpora that include this language: An Crúbadán and W2C. Alongside a manual evaluation, we assessed whether one or more language identification modules could be used to filter the content of these resources; this turns out to be possible, but at the cost of low recall. For this task, we tested and retrained various systems to adapt them to Corsican. This work yields a model that identifies Corsican as well as 17 other European languages.
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; corpora; Corsican; language identification; less-resourced languages; quality
	URL: https://hal.archives-ouvertes.fr/hal-03633290/file/TAL_62_3_1_Kevers_HAL.pdf https://hal.archives-ouvertes.fr/hal-03633290/document https://hal.archives-ouvertes.fr/hal-03633290
	BASE

2	Machine Translation and Gender biases in video game localisation: a corpus-based analysis
	Rivas Ginel, María; Theroine, Sarah
	In: https://hal.archives-ouvertes.fr/hal-03540605 (2022)
	BASE

3	Lothian Diaries Dataset 1 (May-September 2020) ...
	Hall-Lew, Lauren. - : Edinburgh DataVault, 2022
	BASE

4	Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora
	Levshina, Natalia
	In: Entropy; Volume 24; Issue 2; Pages: 280 (2022)
	BASE

5	Text+: Language- and text-based Research Data Infrastructure ...
	Hinrichs, Erhard; Leinen, Peter; Geyken, Alexander. - : Zenodo, 2022
	BASE

8	ANLIzing the Adversarial Natural Language Inference Dataset
	Williams, Adina; Thrush, Tristan; Kiela, Douwe
	In: Proceedings of the Society for Computation in Linguistics (2022)
	BASE

9	Control in free adjuncts: the 'dangling modifier' in English ...
	Donaldson, James. - : The University of Edinburgh, 2021
	BASE

10	Loose and tight languages: A typology based on associations between constructions and lexemes ...
	Levshina, Natalia; Hawkins, John A.. - : Zenodo, 2021
	BASE

12	Community Involvement in Research Infrastructures: The User Story Call for Text+ ...
	Rißler-Pipka, Nanette; Barthauer, Raisa; Buddenbohm, Stefan. - : Zenodo, 2021
	BASE

14	You’re a bitch, the stallion said: estudio contrastivo inglés-español sobre el uso sexista del lenguaje [A contrastive English-Spanish study of the sexist use of language]
	Alonso González, María; Baliña Ben, Lucía. - 2021
	BASE

15	Control in free adjuncts: the 'dangling modifier' in English
	Donaldson, James. - : The University of Edinburgh, 2021
	BASE

16	Corpora in the Classroom - the Case of the Serbian Language for Italian Speakers
	Perisic, Olja. - : URSS, Moscow, 2021
	BASE

17	Clausal Complementation in Nepal Bhasa
	Zhang, Borui. - 2021
	BASE

18	Overview of AMALGUM – Large Silver Quality Annotations across English Genres
	Gessler, Luke D; Peng, Siyao; Liu, Yang...
	In: Proceedings of the Society for Computation in Linguistics (2021)
	BASE

19	Boosting English Vocabulary Knowledge through Corpus-Aided Word Formation Practice
	González Martínez, Ana; Gandón-Chapela, Evelyn
	In: RAEL: revista electrónica de lingüística aplicada, ISSN 1885-9089, Vol. 20, No. 1, pp. 49-70 (2021)
	BASE

20	Semantic prosody and collocation: A corpus study of the near-synonyms persist and persevere
	Phoocharoensil, Supakorn
	In: Eurasian Journal of Applied Linguistics, Vol 7, Iss 1, pp. 240-258 (2021)
	BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings