81. A shared substrate between Greek and Italic
In: ISSN: 0019-7262 ; EISSN: 1613-0405 ; Indogermanische Forschungen ; https://hal.inria.fr/hal-01621467 ; Indogermanische Forschungen, De Gruyter, 2017, 122 (1), pp.29-60. ⟨10.1515/if-2017-0002⟩ (2017)
82. Improving neural tagging with lexical information
In: 15th International Conference on Parsing Technologies ; https://hal.inria.fr/hal-01592055 ; 15th International Conference on Parsing Technologies, Sep 2017, Pisa, Italy. pp.25-31 ; http://compling.ucdavis.edu/iwpt2017/ (2017)
83. Universal Dependencies 2.1
In: https://hal.inria.fr/hal-01682188 ; 2017 (2017)
84. Paris and Stanford at EPE 2017: Downstream Evaluation of Graph-based Dependency Representations
In: EPE 2017 - The First Shared Task on Extrinsic Parser Evaluation ; https://hal.inria.fr/hal-01592051 ; EPE 2017 - The First Shared Task on Extrinsic Parser Evaluation, Sep 2017, Pisa, Italy. pp.47-59 ; http://epe.nlpl.eu (2017)
85. Computational methods for descriptive and theoretical morphology: a brief introduction
In: ISSN: 1871-5621 ; EISSN: 1871-5656 ; Morphology ; https://hal.inria.fr/hal-01628253 ; Morphology, Springer Verlag, 2017, Computational methods for descriptive and theoretical morphology, 27 (4), pp.1-7. ⟨10.1017/CBO9781139248860⟩ (2017)
86. Annotating omission in statement pairs
In: 11th Linguistic Annotation Workshop ; https://hal.inria.fr/hal-01584035 ; 11th Linguistic Annotation Workshop, Apr 2017, Valencia, Spain. pp.41-45 (2017)
87. Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin
In: Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature ; https://hal.inria.fr/hal-01570614 ; Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Aug 2017, Vancouver, Canada. pp.89 - 94, ⟨10.18653/v1/W17-2212⟩ ; https://sighum.wordpress.com/events/latech-clfl-2017/ (2017)
88. Milk and the Indo-Europeans
In: Language Dispersal Beyond Farming ; https://hal.inria.fr/hal-01667476 ; Martine Robbeets; Alexander Savelyev Language Dispersal Beyond Farming, John Benjamins Publishing Company, pp.291-311, 2017, 978 90 272 1255 9. ⟨10.1075/z.215.13gar⟩ (2017)
92. From Noisy Questions to Minecraft Texts: Annotation Challenges in Extreme Syntax Scenarios
In: 2nd Workshop on Noisy User-generated Text (W-NUT) at CoLing 2016 ; https://hal.inria.fr/hal-01584054 ; 2nd Workshop on Noisy User-generated Text (W-NUT) at CoLing 2016, Dec 2016, Osaka, Japan (2016)
93. External Lexical Information for Multilingual Part-of-Speech Tagging ...
94. Constructing a poor man’s wordnet in a resource-rich world
In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.inria.fr/hal-01174492 ; Language Resources and Evaluation, Springer Verlag, 2015, 49 (3), pp.601-635. ⟨10.1007/s10579-015-9295-6⟩ (2015)
95. Could Greek and Italic share a same Indo-European substratum?
In: 22nd International Conference on Historical Linguistics ; https://hal.inria.fr/hal-01256310 ; 22nd International Conference on Historical Linguistics, Jul 2015, Naples, Italy ; http://www.ichl22.unina.it (2015)
96. Developing a French FrameNet: Methodology and First results
In: LREC - The 9th edition of the Language Resources and Evaluation Conference ; https://hal.inria.fr/hal-01022385 ; LREC - The 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland (2014)
97. A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages
In: Language Resources and Evaluation Conference ; https://hal.inria.fr/hal-01022298 ; Language Resources and Evaluation Conference, European Language Resources Association, May 2014, Reykjavik, Iceland (2014)
98. Data-driven Synset Induction and Disambiguation for Wordnet Development
In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.inria.fr/hal-01088000 ; Language Resources and Evaluation, Springer Verlag, 2014, 48 (4), pp.655-677. ⟨10.1007/s10579-014-9291-2⟩ (2014)
99. Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use
In: Human Language Technology Challenges for Computer Science and Linguistics ; https://hal.inria.fr/hal-01053047 ; Vetulani, Zygmunt and Mariani, Joseph. Human Language Technology Challenges for Computer Science and Linguistics, 8387, Springer International Publishing, pp.303-314, 2014, Lecture Notes in Computer Science, 978-3-319-08957-7. ⟨10.1007/978-3-319-08958-4_25⟩ (2014)
100. The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres
In: ISSN: 0175-1336 ; Journal for language technology and computational linguistics ; https://halshs.archives-ouvertes.fr/halshs-00953507 ; Journal for language technology and computational linguistics, GSCL (Gesellschaft für Sprachtechnologie und Computerlinguistik) 2014, 29 (2), pp.1-30 ; http://www.jlcl.org/2014_Heft2/Heft2-2014.pdf (2014)
Abstract:
Final version for the special issue of JLCL (Journal for Language Technology and Computational Linguistics, http://jlcl.org/), "Building and Annotating Corpora of Computer-Mediated Discourse: Issues and Challenges at the Interface of Corpus and Computational Linguistics" (ed. by Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel) ; International audience ; The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions from networks such as the Internet or telecommunications, covering mono- and multimodal as well as synchronous and asynchronous communication. Corpora are assembled in a standard form, using the TEI (Text Encoding Initiative) format. This implies extending the TEI model of text, through a European endeavor, to encompass the richest and most complex CMC genres. This paper presents the Interaction Space model. We explain how this model has been encoded within the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora where interactions occurred in single-modality environments (text chat or SMS systems) and a fourth corpus where text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: Discourse Analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. As NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora. Our wish to guarantee generic annotations meant we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora.
We then turn to the decisions made about which annotations to apply to which units and describe the processing pipeline for adding them. All CoMeRe corpora are verified through a staged quality control process, designed to allow corpora to move from one project phase to the next. Public release of the CoMeRe corpora is a short-term goal: corpora will be integrated into the forthcoming French National Reference Corpus, and disseminated through the national linguistic infrastructure ORTOLANG. We therefore highlight issues and decisions made concerning the OpenData perspective.
Keyword:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; CMC; CoMeRe; Computer Mediated Communication; corpus
URL: https://halshs.archives-ouvertes.fr/halshs-00953507v2/file/cmr-article-jlcl-v140912-hal.pdf https://halshs.archives-ouvertes.fr/halshs-00953507 https://halshs.archives-ouvertes.fr/halshs-00953507v2/document