1 |
Delexicalized Word Embeddings for Cross-lingual Dependency Parsing
In: EACL ; https://hal.inria.fr/hal-01590639 ; EACL, Apr 2017, Valencia, Spain. pp.241 - 250, ⟨10.18653/v1/E17-1023⟩ ; http://eacl2017.org/ (2017)
BASE
2 |
IRISA at DeFT2017 : classification systems of increasing complexity ; Participation de l'IRISA à DeFT2017 : systèmes de classification de complexité croissante
In: DeFT 2017 - Défi Fouille de texte ; https://hal.archives-ouvertes.fr/hal-01643993 ; DeFT 2017 - Défi Fouille de texte, Jun 2017, Orléans, France. pp.1-10 (2017)
BASE
3 |
Invariance: a Theoretical Approach for Coding Sets of Words Modulo Literal (Anti)Morphisms
In: Springer, LNCS. ; https://hal-normandie-univ.archives-ouvertes.fr/hal-02117030 ; Springer, LNCS., 2017, pp.214-227 (2017)
BASE
4 |
Things and Strings and More: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence, Topic Modeling, and Word Embedding
Ju, Yiting. - : eScholarship, University of California, 2017
In: Ju, Yiting. (2017). Things and Strings and More: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence, Topic Modeling, and Word Embedding. 0035: Geography. Retrieved from: http://www.escholarship.org/uc/item/4w60s702 (2017)
BASE
5 |
An empirical study of the Algerian dialect of Social network
In: ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing ; https://hal.inria.fr/hal-01659997 ; ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco ; http://icnlssp.isga.ma (2017)
Abstract:
In this paper, we analyze the use of the Algerian dialect on YouTube. To do so, we harvested a corpus of 17M words. This corpus was then used to extract a comparable Algerian corpus, named CALYOU, by aligning pairs of sentences written in Latin and Arabic script. CALYOU was built using a multilingual word embeddings approach. Several experiments, discussed in this article, were conducted to set the parameters of the Continuous Bag of Words approach. The proposed method achieved a recall of 41%. We then present several figures on the collected data that led to several unexpected results: 51% of the vocabulary words are written in Latin script, and 82% of all comments exhibit code-switching.
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Algerian dialect; Code-switching; comparable corpora; Word embedding
URL: https://hal.inria.fr/hal-01659997/file/ICNLSSP2017_paper_16.pdf https://hal.inria.fr/hal-01659997 https://hal.inria.fr/hal-01659997/document
BASE
7 |
Linguistic Knowledge Transfer for Enriching Vector Representations
In: http://rave.ohiolink.edu/etdc/view?acc_num=osu1500571436042414 (2017)
BASE
11 |
Induction de lexiques bilingues à partir de corpus comparables et parallèles [Bilingual lexicon induction from comparable and parallel corpora]
BASE