1 |
Compositional Translation of Single-Word Complex Terms Using Multilingual Splitting
|
|
|
|
In: ISSN: 0929-9971 ; Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication ; https://hal.archives-ouvertes.fr/hal-01171113 ; Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication , John Benjamins Publishing, 2015, Terminology across languages and domains, 21 (2), 30 p ; https://benjamins.com/#catalog/journals/term (2015)
|
|
BASE
|
|
2 |
TTC Web Platform: from Corpus Compilation to Bilingual Terminologies for MT and CAT Tools
|
|
|
|
In: Actes du colloque Tralogy : Anticiper les technologies pour la traduction ; Tralogy II. Trouver le sens : où sont nos manques et nos besoins respectifs ? ; https://hal.archives-ouvertes.fr/hal-00820331 ; Tralogy II. Trouver le sens : où sont nos manques et nos besoins respectifs ?, Jan 2013, Paris, France. 14 p (2013)
|
|
BASE
|
|
3 |
Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking
|
|
|
|
In: COLING 2012 ; https://hal.archives-ouvertes.fr/hal-00743807 ; COLING 2012, Dec 2012, Mumbai, India. pp.745-762 (2012)
|
|
BASE
|
|
4 |
Compiling terminological data using comparable corpora: from term extraction to dictionaries
|
|
|
|
In: 34th Annual Conference of the German Linguistic Society (DGfS) ; https://hal.archives-ouvertes.fr/hal-00819590 ; 34th Annual Conference of the German Linguistic Society (DGfS), Mar 2012, Frankfurt, Germany (2012)
|
|
BASE
|
|
5 |
Terminology Extraction, Translation Tools and Comparable Corpora: TTC concept, midterm progress and achieved results
|
|
|
|
In: LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS) ; https://hal.archives-ouvertes.fr/hal-00819909 ; LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS), May 2012, Istanbul, Turkey. 4 p (2012)
|
|
BASE
|
|
6 |
Reference Lists for the Evaluation of Term Extraction Tools
|
|
|
|
In: Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE'12) ; Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE 12) ; Terminology and Knowledge Engineering Conference (TKE) ; https://hal.archives-ouvertes.fr/hal-00816566 ; Terminology and Knowledge Engineering Conference (TKE), Jun 2012, Madrid, Spain. http://www.oeg-upm.net/tke2012/proceedings (2012)
|
|
BASE
|
|
7 |
METRICC: Harnessing Comparable Corpora for Multilingual Lexicon Development
|
|
|
|
In: 15th EURALEX International Congress ; https://halshs.archives-ouvertes.fr/halshs-00725224 ; 15th EURALEX International Congress, Aug 2012, Oslo, Norway. pp.389-403 (2012)
|
|
BASE
|
|
8 |
Building Bilingual Terminologies from Comparable Corpora: The TTC TermSuite
|
|
|
|
In: 5th Workshop on Building and Using Comparable Corpora with special topic "Language Resources for Machine Translation in Less-Resourced Languages and Domains", co-located with LREC 2012 ; https://hal.archives-ouvertes.fr/hal-00819594 ; 5th Workshop on Building and Using Comparable Corpora with special topic "Language Resources for Machine Translation in Less-Resourced Languages and Domains", co-located with LREC 2012, May 2012, Istanbul, Turkey (2012)
|
|
BASE
|
|
9 |
Neoclassical Compound Alignments from Comparable Corpora
|
|
|
|
In: Computational Linguistics and Intelligent Text Processing - 13th International Conference, CICLing 2012 ; https://hal.archives-ouvertes.fr/hal-00822519 ; Computational Linguistics and Intelligent Text Processing - 13th International Conference, CICLing 2012, Mar 2012, New Delhi, India. pp.72-82 (2012)
|
|
BASE
|
|
10 |
Identifying and Grouping Variants of Technical Terms on the Basis of Text Corpora
|
|
|
|
In: 33rd Annual Conference of the German Linguistic Society (DGfS) ; https://hal.archives-ouvertes.fr/hal-00818647 ; 33rd Annual Conference of the German Linguistic Society (DGfS), Feb 2011, Göttingen, Germany (2011)
|
|
BASE
|
|
11 |
User-centred Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT and CAT Tools.
|
|
|
|
In: Actes du colloque Tralogy : Anticiper les technologies pour la traduction ; Tralogy I. Métiers et technologies de la traduction : quelles convergences pour l'avenir ? ; https://hal.archives-ouvertes.fr/hal-00818657 ; Tralogy I. Métiers et technologies de la traduction : quelles convergences pour l'avenir ?, Mar 2011, Paris, France. 10 p (2011)
|
|
BASE
|
|
12 |
Comparability Measurement for Terminology Extraction
|
|
|
|
In: Workshop on Creation, Harmonization and Application of Terminology resources (CHAT 2011) in conjunction with the 18th Nordic Conference on Computational Linguistics (NODALIDA 2011). ; https://hal.archives-ouvertes.fr/hal-00819338 ; Workshop on Creation, Harmonization and Application of Terminology resources (CHAT 2011) in conjunction with the 18th Nordic Conference on Computational Linguistics (NODALIDA 2011)., May 2011, Riga, Latvia. pp.3-10 (2011)
|
|
BASE
|
|
13 |
Simple methods for dealing with term variation and term alignment
|
|
|
|
In: 9th International Conference on Terminology and Artificial Intelligence (TIA 2011) ; https://hal.archives-ouvertes.fr/hal-00819376 ; 9th International Conference on Terminology and Artificial Intelligence (TIA 2011), Nov 2011, Paris, France. pp.87-93 (2011)
|
|
BASE
|
|
14 |
TTC TermSuite: A UIMA Application for Multilingual Terminology Extraction from Comparable Corpora
|
|
|
|
In: 5th International Joint Conference on Natural Language Processing (IJCNLP) ; https://hal.archives-ouvertes.fr/hal-00819025 ; 5th International Joint Conference on Natural Language Processing (IJCNLP), Nov 2011, Chiang Mai, Thailand. pp.9-12 (2011)
|
|
BASE
|
|
15 |
Evaluation of terminologies acquired from comparable corpora : an application perspective
|
|
|
|
In: Proceedings of the 18th International Nordic Conference of Computational Linguistics ; NODALIDA 2011 ; https://hal.archives-ouvertes.fr/hal-00585187 ; NODALIDA 2011, May 2011, Riga, Latvia. pp.66-73 (2011)
|
|
BASE
|
|
16 |
Dealing with lexicon acquired from comparable corpora : validation and exchange
|
|
|
|
In: TKE2010 ; https://hal.archives-ouvertes.fr/hal-00544403 ; TKE2010, Aug 2010, Dublin, Ireland. pp.211-224 (2010)
|
|
BASE
|
|
17 |
TTC: Terminology Extraction, Translation Tools and Comparable Corpora
|
|
|
|
In: 14th EURALEX International Congress ; https://hal.archives-ouvertes.fr/hal-00819365 ; 14th EURALEX International Congress, Jul 2010, Leeuwarden/Ljouwert, Netherlands. pp.263-268 (2010)
|
|
BASE
|
|
18 |
Characterization and Compilation of Specialized Comparable Corpora ; Découverte et caractérisation des corpus comparables spécialisés
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-00474405 ; Human-Computer Interaction [cs.HC]. Université de Nantes, 2009. French (2009)
|
|
Abstract:
Comparable corpora are sets of texts in different languages that are not translations of each other but share common characteristics. Their main advantage is that they are fully representative of the linguistic and cultural specificities of each language. The Web could theoretically be considered a source of comparable corpora. However, the quality of the corpora and of the resources extracted from them depends on the preliminary definition of the corpus objectives and on the care taken in compilation (i.e. the definition of the features common to the texts). In this thesis, we focus on the compilation of specialized comparable corpora in French and Japanese whose documents are extracted from the Web. We propose a definition of these corpora and a set of common features: a specialized domain, a topic and a type of discourse (scientific or popular science). Our goal is to create a tool that assists comparable corpora compilation. First, we present the automatic recognition of the common features. Topics can be easily identified with the keywords used in Web searches. In contrast, detecting the type of discourse requires a broad stylistic analysis, for which we use machine learning methods. This analysis is performed over a learning corpus and leads to the creation of a bilingual typology based on three levels of analysis: structural, modal and lexical. Second, we use this typology to learn classification models with SVMlight and C4.5. These models are tested over an evaluation corpus. Our test results indicate that more than 70% of the documents are correctly classified in both languages. Finally, the classifier is integrated into a comparable corpora compilation assistant tool implemented on the UIMA platform.
|
|
Keyword:
[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; comparable corpora; machine learning; multilingual typology; specialized languages; stylistic analysis; type of discourse
|
|
URL: https://tel.archives-ouvertes.fr/tel-00474405/file/these-lorraine-goeuriot.pdf https://tel.archives-ouvertes.fr/tel-00474405/document https://tel.archives-ouvertes.fr/tel-00474405
|
|
BASE
|
|
19 |
Multilingual alignment from specialised comparable corpora ; Alignement multilingue en corpus comparables spécialisés
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-00462248 ; Human-Computer Interaction [cs.HC]. Université de Nantes, 2009. French (2009)
|
|
BASE
|
|
20 |
Effective Compositional Model for Lexical Alignment
|
|
|
|
In: Proceedings, IJCNLP 2008: Third International Joint Conference on Natural Language Processing ; IJCNLP 2008: Third International Joint Conference on Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-00403643 ; IJCNLP 2008: Third International Joint Conference on Natural Language Processing, Jan 2008, Hyderabad, India. pp.95-102 (2008)
|
|
BASE