1 |
Tone realization in Mandarin speech: a large corpus based study of disyllabic words
|
|
|
|
In: The 12th International Symposium on Chinese Spoken Language Processing (ISCSLP 2021) ; https://hal.archives-ouvertes.fr/hal-03153413 ; The 12th International Symposium on Chinese Spoken Language Processing (ISCSLP 2021), Jan 2021, Hong Kong, China (2021)
|
|
BASE
|
|
|
|
2 |
Synchronic Fortition in Five Romance Languages? A Large Corpus-Based Study of Word-Initial Devoicing
|
|
|
|
In: Proceedings of Interspeech ; Interspeech 2021 ; https://hal.sorbonne-universite.fr/hal-03339852 ; Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.996-1000, ⟨10.21437/Interspeech.2021-939⟩ (2021)
|
|
BASE
|
|
|
|
3 |
A corpus-based study of the distribution of word-final schwa in Standard French and what it teaches us about its phonological status
|
|
|
|
BASE
|
|
|
|
4 |
Frequency-Dependent Regularization in Syntactic Constructions
|
|
|
|
In: Proceedings of the Society for Computation in Linguistics (2021)
|
|
BASE
|
|
|
|
5 |
Distribution and deletion of /ʁ/ in fluent speech
|
|
|
|
In: Studii de Lingvistica, Vol 11, Pp 39-53 (2021)
|
|
BASE
|
|
|
|
6 |
Mandarin Lexical Tones: A Corpus-Based Study of Word Length, Syllable Position and Prosodic Position on Duration
|
|
|
|
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-03153402 ; Interspeech 2020, Oct 2020, Shanghai, China. pp.1908-1912, ⟨10.21437/Interspeech.2020-1614⟩ (2020)
|
|
BASE
|
|
|
|
7 |
Is word-final schwa in Standard French a “phonetic lubricant”? ; Le schwa final en français standard est-il un «lubrifiant phonétique»?
|
|
|
|
In: Actes du 7e Congrès Mondial de Linguistique Française ; 7e Congrès Mondial de Linguistique Française - CMLF 2020 ; https://hal.archives-ouvertes.fr/hal-02931786 ; 7e Congrès Mondial de Linguistique Française - CMLF 2020, Jul 2020, Montpellier, France. pp.id. 09004, ⟨10.1051/shsconf/20207809004⟩ ; https://www.linguistiquefrancaise.org/ (2020)
|
|
BASE
|
|
|
|
8 |
Lénition et fortition des occlusives en coda finale dans deux langues romanes : le français et le roumain
|
|
|
|
In: Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole ; https://hal.archives-ouvertes.fr/hal-02798551 ; 2020, Nancy, France. pp.289-298 (2020)
|
|
BASE
|
|
|
|
9 |
Ongoing phonologization of word-final voicing alternations in two Romance languages: Romanian and French
|
|
|
|
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-02977812 ; Interspeech 2020, Oct 2020, Shanghai, China. ⟨10.21437/Interspeech.2020-1460⟩ (2020)
|
|
BASE
|
|
|
|
11 |
POST-CONSONANTAL WORD-FINAL /ʁ/ REALIZATION IN FRENCH: CONTRIBUTIONS OF LARGE CORPORA
|
|
|
|
In: Proceedings of the International Congress of Phonetic Sciences ICPhS 2019 ; International Congress of Phonetic Sciences ICPhS 2019 ; https://hal.archives-ouvertes.fr/hal-03171147 ; International Congress of Phonetic Sciences ICPhS 2019, Aug 2019, Melbourne, Australia (2019)
|
|
BASE
|
|
|
|
12 |
"Gra[f]e!" Word-final devoicing of obstruents in Standard French: An acoustic study based on large corpora
|
|
|
|
In: Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-02336119 ; Annual Conference of the International Speech Communication Association, ISCA, Sep 2019, Graz, Austria. DOI:10.21437/Interspeech.2019-2329 (2019)
|
|
BASE
|
|
|
|
13 |
Speech Style Effects on Local and Non-local Coarticulation in French
|
|
|
|
In: Studies on Speech Production (11th International Seminar, ISSP 2017, Tianjin, China, October 16-19, 2017, Revised Selected Papers) ; https://hal.archives-ouvertes.fr/hal-02427702 ; Studies on Speech Production (11th International Seminar, ISSP 2017, Tianjin, China, October 16-19, 2017, Revised Selected Papers), pp.121-133, 2018, ⟨10.1007/978-3-030-00126-1_12⟩ (2018)
|
|
BASE
|
|
|
|
14 |
Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-linguistic Factors in Large Corpora
|
|
|
|
In: Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-01837179 ; Annual Conference of the International Speech Communication Association, ISCA, Aug 2017, Stockholm, Sweden (2017)
|
|
BASE
|
|
|
|
15 |
Learning from Noisy Data in Statistical Machine Translation
|
|
|
|
BASE
|
|
|
|
16 |
Rôle des contextes lexical et post-lexical dans la réalisation du schwa : apports du traitement automatique de grands corpus
|
|
|
|
In: 31èmes Journées d'Etudes sur la Parole ; https://halshs.archives-ouvertes.fr/halshs-01401348 ; 31èmes Journées d'Etudes sur la Parole, Jul 2016, Paris, France. pp.633-641 (2016)
|
|
BASE
|
|
|
|
17 |
On Very Large Corpora of French
|
|
|
|
In: History of Quantitative Linguistics in France ; https://hal.univ-cotedazur.fr/hal-01362713 ; Jacqueline Léon; Sylvain Loiseau. History of Quantitative Linguistics in France, RAM Verlag, pp.137-156, 2016, Studies in Quantitative Linguistics, 978-3-942303-48-4 (2016)
|
|
Abstract:
For French, it would be natural to turn to the French National Library (BNF), which holds 14 million documents, including 11 million books at the Tolbiac site. This would be comparable to what Google Books offers, if access were similarly electronic. Unfortunately, the number of documents accessible on the Internet, mainly in the Gallica database, is far from reaching that figure. In reality, the most reliable texts in Gallica, aside from newer ones transmitted by publishers in digital form, are those inherited from Frantext. These owe nothing to scanning, whose invention by Ray Kurzweil in 1974 postdates the initial capture, carried out by keyboard operators on perforated tape. This manual input, duly revised and corrected over fifty years, has survived every change of system and storage medium. To this textual reliability, even for older editions, Frantext adds many other virtues: a balance between eras, allowing comparisons and providing a solid basis for analysing the evolution of the language; a wide chronological span covering five centuries of publication; a deliberate homogeneity of texts, whose selection is governed by specific criteria of genre and language level; consistency in the services offered to the scientific community, the same software having been kept unchanged for twenty years on the Internet; and a moderate, controlled growth of the data that ensures compatibility with earlier processing. The Frantext catalogue is now expanding with more recent works: it currently has 4000 references and 270 million words. The BNF is ten times larger; Google Books is a thousand times larger, and it is growing much faster. But other institutional corpora (we study Encarta, Wikipedia and some others) are like huge tanks that dispense their content word by word, as a dictionary would. They can only be consulted one item at a time. They allow neither a statistical overview nor an overall analysis, as can be seen from three gigantic corpora of the French language built respectively in Germany (Wortschatz), the UK (Sketchengine) and the USA (Google Books). Wortschatz was built at the University of Leipzig (with collaborators from the University of Neuchâtel). It is a corpus of French comprising 700 million words and 36 million sentences drawn from newspapers (19 million), the web (11 million) and Wikipedia (6 million). Sketchengine is an English website that offers a corpus of the French language alongside corpora of other languages. Like many web-based corpora, Sketchengine harvests the web to build a large representative corpus of a language rather than corpora targeted at analysing lexical innovations. Culturomics (or Google Books) is the biggest corpus of the French language, 100 times the size of Sketchengine (89 billion words in 2012). The huge size of these corpora may inspire enthusiasm, but doubts remain about the validity of the statistical results. The doubts grow all the more because the composition of the corpora remains a “black box”. If the choices underlying the construction of the corpus under scrutiny are unknown, no amount of data prevents the results from being very difficult to interpret.
|
|
Keyword:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; [SHS.LITT]Humanities and Social Sciences/Literature; [SHS]Humanities and Social Sciences; [STAT]Statistics [stat]; Culturomics; Encarta; Frantext; French language; French National Library; Google Books; large corpora; Sketchengine; statistics; textual databases; Wikipedia; Wortschatz
|
|
URL: https://hal.univ-cotedazur.fr/hal-01362713/document https://hal.univ-cotedazur.fr/hal-01362713/file/Brunet%20V3%20esw_JL%20SL.pdf https://hal.univ-cotedazur.fr/hal-01362713
|
|
BASE
|
|
|
|
18 |
Phoneme deletion and fusion in conversational speech
|
|
|
|
In: Experimental Approaches to Perception and Production of Language Variation 2013 ; https://hal.archives-ouvertes.fr/hal-01510214 ; Experimental Approaches to Perception and Production of Language Variation 2013, Mar 2013, Copenhagen, Denmark (2013)
|
|
BASE
|
|
|
|
19 |
Dynamics, causation, duration in the predicate-argument structure of verbs : a computational approach based on parallel corpora ...
|
|
|
|
BASE
|
|
|
|
20 |
Dynamics, causation, duration in the predicate-argument structure of verbs : a computational approach based on parallel corpora
|
|
|
|
BASE
|
|
|
|
|
|