1. An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
BASE

2. Didactique de l’oral et interactions : rétrospective méthodologie et expérimentation en contexte algérien
In: ISSN: 0077-2712 ; EISSN: 1952-4250 ; Mélanges CRAPEL ; https://hal.archives-ouvertes.fr/hal-03630135 ; Mélanges CRAPEL, Centre de recherches et d'applications pédagogiques en langues, 2022, Enseignement du français parlé aujourd’hui : Recherches et expériences de terrain, Mélanges CRAPEL (43/1) ; https://www.atilf.fr/wp-content/uploads/publications/MelangesCrapel/Melanges_43_1_6_Cortier_et_al.pdf (2022)
BASE

3. Language identification, a tool for Corsican and for the evaluation of linguistic resources ; L'identification de langue, un outil au service du corse et de l'évaluation des ressources linguistiques
In: Traitement Automatique des Langues ; https://hal.archives-ouvertes.fr/hal-03633290 ; Traitement Automatique des Langues, 2022, Diversité Linguistique, 62 (3), pp.13-37 ; https://www.atala.org/content/diversité-linguistique-linguistic-diversity-natural-language-processing (2022)
BASE

4. Machine Translation and Gender biases in video game localisation: a corpus-based analysis
In: https://hal.archives-ouvertes.fr/hal-03540605 ; 2022 (2022)
BASE

8. Gendered body language in children’s literature over time ...
BASE

9. Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index
In: Information; Volume 13; Issue 2; Pages: 43 (2022)
BASE

10. Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora
In: Entropy; Volume 24; Issue 2; Pages: 280 (2022)
Abstract: Zipf’s law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) is more strongly correlated with word length, although this tendency is not observed consistently, depending on several methodological choices. The present study examines a more diverse sample of languages than the previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish). I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters and in phonemes (for some of the languages), as well as word frequency, informativity given previous word and informativity given next word, applying different methods of bigrams processing. The results show different correlations between word length and the corpus-based measure for different languages. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly, by the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.
Keywords: corpora; frequency; informativity; linguistic typology; n-grams; Zipf’s law of abbreviation
URL: https://doi.org/10.3390/e24020280
BASE

11. Text+: Language- and text-based Research Data Infrastructure ...
BASE

12. Text+: Language- and text-based Research Data Infrastructure ...
BASE

13. Text+: Language- and text-based Research Data Infrastructure ...
BASE

14. DCT vs. corpus orales: reflexiones metodológicas sobre el estudio de los actos de habla ; DCT vs. spoken corpora: methodologic reflections on the study of speech acts
In: Pragmalingüística, (29), 377-395 (2022)
BASE

15. Source language difficulties in learner translation: Evidence from an error-annotated corpus
BASE

16. ANLIzing the Adversarial Natural Language Inference Dataset
In: Proceedings of the Society for Computation in Linguistics (2022)
BASE

17. Chinese Idioms: Stepping Into L2 Student’s Shoes
In: Acta Linguistica Asiatica, Vol 12, Iss 1 (2022) (2022)
BASE

19. A Navigation Tool for Exploring Semantic Web Corpora
In: Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks ; https://hal.univ-lorraine.fr/hal-03485155 ; Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks, Oct 2021, Virtual conference, France (2021)
BASE

20. A Semantic Web Navigation Tool for Exploring the Henri Poincaré Correspondence Corpus
In: Proceedings of the International Joint Workshop on Semantic Web and Ontology Design for Cultural Heritage ; https://hal.univ-lorraine.fr/hal-03406713 ; Proceedings of the International Joint Workshop on Semantic Web and Ontology Design for Cultural Heritage, Antonis Bikakis, Roberta Ferrario, Stéphane Jean, Béatrice Markhoff, Alessandro Mosca, Marianna Nicolosi Asmundo, Sep 2021, Bolzano, Italy (2021)
BASE