181 |
Large-Scale Acquisition of Feature-Based Conceptual Representations from Textual Corpora
|
|
|
|
In: Proceedings of the Annual Meeting of the Cognitive Science Society ; The Annual Meeting of the Cognitive Science Society ; https://hal.archives-ouvertes.fr/hal-00507103 ; The Annual Meeting of the Cognitive Science Society, 2010, United States. 6 p (2010)
|
|
BASE
|
|
182 |
Towards unrestricted, large-scale acquisition of feature-based conceptual representations from corpus data
|
|
|
|
In: ISSN: 1570-7075 ; EISSN: 1572-8706 ; Research on Language and Computation ; https://hal.archives-ouvertes.fr/hal-00605539 ; Research on Language and Computation, Springer Verlag, 2010, 7 (2-4), pp.137-170 (2010)
|
|
Abstract:
In recent years a number of methods have been proposed for the automatic acquisition of feature-based conceptual representations from text corpora. Such methods could offer valuable support for theoretical research on conceptual representation. However, existing methods do not target the full range of concept-relation-feature triples occurring in human-generated norms (e.g. flute produce sound) but rather focus on concept-feature pairs (e.g. flute - sound) or triples involving specific relations only (e.g. is-a or part-of relations). In this article we investigate the challenges that need to be met in both methodology and evaluation when moving towards the acquisition of more comprehensive conceptual representations from corpora. In particular, we investigate the usefulness of three types of knowledge in guiding the extraction process: encyclopedic, syntactic and semantic. We first present a semantic analysis of existing, human-generated feature production norms, which reveals information about co-occurring concept and feature classes. We then introduce a novel method for large-scale feature extraction which uses the class-based information to guide the acquisition process. The method involves extracting candidate triples consisting of concepts, relations and features (e.g. deer have antlers, flute produce sound) from corpus data parsed for grammatical dependencies, and re-weighting the triples on the basis of conditional probabilities calculated from our semantic analysis. We apply this method to an automatically parsed Wikipedia corpus which includes encyclopedic information and evaluate its accuracy using a number of different methods: direct evaluation against the McRae norms in terms of feature types and frequencies, human evaluation, and novel evaluation in terms of conceptual structure variables. Our investigation highlights a number of issues which require addressing in both methodology and evaluation when aiming to further improve the accuracy of unconstrained feature extraction.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SCCO.COMP]Cognitive science/Computer science; [SCCO.LING]Cognitive science/Linguistics; [SHS.LANGUE]Humanities and Social Sciences/Linguistics
|
|
URL: https://hal.archives-ouvertes.fr/hal-00605539
|
|
BASE
|
|
183 |
Investigating the cross-linguistic potential of VerbNet-style classification
|
|
|
|
In: Proceedings of CoLing ; CoLing 2010 ; https://hal.archives-ouvertes.fr/hal-00539036 ; CoLing 2010, 2010, Beijing, China. pp.94 (2010)
|
|
BASE
|
|
185 |
LexSchem: A Large Subcategorization Lexicon for French Verbs
|
|
|
|
In: Proceedings of the Language Resources and Evaluation Conference (LREC) ; LREC 2008 ; https://hal.archives-ouvertes.fr/hal-00539025 ; LREC 2008, 2008, Marrakech, Morocco. pp.142 (2008)
|
|
BASE
|
|