2. Compiling terminological data using comparable corpora: from term extraction to dictionaries
In: 34th Annual Conference of the German Linguistic Society (DGfS), Mar 2012, Frankfurt, Germany (2012) ; https://hal.archives-ouvertes.fr/hal-00819590
3. Terminology Extraction, Translation Tools and Comparable Corpora: TTC concept, midterm progress and achieved results
In: LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS), May 2012, Istanbul, Turkey. 4 p (2012) ; https://hal.archives-ouvertes.fr/hal-00819909
4. Reference Lists for the Evaluation of Term Extraction Tools
In: Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE'12), Jun 2012, Madrid, Spain (2012) ; https://hal.archives-ouvertes.fr/hal-00816566 ; http://www.oeg-upm.net/tke2012/proceedings
5. Identifying and Grouping Variants of Technical Terms on the Basis of Text Corpora
In: 33rd Annual Conference of the German Linguistic Society (DGfS), Feb 2011, Göttingen, Germany (2011) ; https://hal.archives-ouvertes.fr/hal-00818647
Abstract:
Poster and demonstration ; National audience ; In the TTC project, monolingual term candidate extraction tools are being developed as a basis for later term alignment using comparable corpora. TTC covers English, German, French, Spanish, Latvian, Russian and Chinese. In the poster, we address the extraction of morphological and syntactic term variants for the first four languages. The term extraction pipeline targets texts from technical domains (e.g. wind energy), which are collected by a thematic web crawler and then POS-tagged and lemmatized. We extract multiword term candidates using POS patterns (e.g. adjective+noun) and statistically filter for domain-relevant terms. In a second step, morpho-syntactic term variants are grouped into sets, e.g. (1) metallhaltiger Abfall ↔ Abfall aus Metall. We use the typology of term variation proposed by Daille (2005). Certain types of variation are captured by expanding the basic POS patterns; for modifications (adjectives, embedding under genitive or of-phrases), additional distributional data are used, e.g. the frequency of modifiers in particular constructions. For morphological variation, we experiment with a simple stemming-based approach and with a morphological analyzer: stemming rules allow us to relate terms such as production d'électricité and électricité produite. To relate e.g. Emissionssteigerung and gesteigerte Emission, we need to split the compound noun Emission|Steigerung and to model the relation between steigern and gesteigert, which can be done using SMOR (Schmid et al., 2004). In addition to monolingual term variants, we are interested in patterns applicable to several languages, e.g. (2) noun adjective = noun de noun: énergie générable ↔ génération d'énergie (FR); energía generable ↔ generación de energía (ES).
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; comparable corpora; term extraction; term variation
URL: https://hal.archives-ouvertes.fr/hal-00818647
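Illustration for entry 5: the abstract describes extracting multiword term candidates by POS patterns and then grouping morpho-syntactic variants through stemming. The following is a minimal sketch of that idea, not the TTC implementation. The pattern list, the tag names, the fixed-prefix "stemmer" and all function names are assumptions made for illustration; the statistical domain-relevance filter and the SMOR-based compound analysis mentioned in the abstract are left out.

```python
from collections import defaultdict

# Hypothetical POS patterns. The abstract names adjective+noun as one example;
# the other two are assumed here to cover its French/Spanish example (2).
PATTERNS = [("ADJ", "NOUN"), ("NOUN", "ADJ"), ("NOUN", "ADP", "NOUN")]
CONTENT_TAGS = {"NOUN", "ADJ"}


def extract_candidates(tagged):
    """Return every token span whose POS sequence equals one of PATTERNS."""
    candidates = []
    for pattern in PATTERNS:
        n = len(pattern)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(pos for _, pos in window) == pattern:
                candidates.append(tuple(window))
    return candidates


def crude_stem(word, length=5):
    """Toy stand-in for real stemming rules: lowercase and keep a fixed prefix."""
    return word.lower()[:length]


def variant_key(candidate):
    """Order-independent key built from the stems of the content words."""
    return frozenset(crude_stem(w) for w, pos in candidate if pos in CONTENT_TAGS)


def group_variants(candidates):
    """Group candidate surface forms that share the same set of stems."""
    groups = defaultdict(set)
    for cand in candidates:
        groups[variant_key(cand)].add(" ".join(w for w, _ in cand))
    return dict(groups)


# Already POS-tagged input, as a tagger might produce it (tags are assumed).
tagged = [
    ("energía", "NOUN"), ("generable", "ADJ"), ("y", "CCONJ"),
    ("generación", "NOUN"), ("de", "ADP"), ("energía", "NOUN"),
]
groups = group_variants(extract_candidates(tagged))
for variants in groups.values():
    print(sorted(variants))
```

On this input the two Spanish variants from example (2) fall into one group because both reduce to the stems "energ" and "gener". A fixed-length prefix over-merges and under-merges on real data, so an actual system would substitute language-specific stemming rules or a morphological analyzer at `crude_stem`.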
6. Simple methods for dealing with term variation and term alignment
In: 9th International Conference on Terminology and Artificial Intelligence (TIA 2011), Nov 2011, Paris, France. pp.87-93 (2011) ; https://hal.archives-ouvertes.fr/hal-00819376