2 |
Logical Aspects of Computational Linguistics: 8th International Conference, LACL 2014, Toulouse, France, June 18-24, 2014. Proceedings
|
|
|
|
In: ISSN: 0302-9743 ; Lecture Notes in Computer Science ; 8th International Conference on Logical Aspects of Computational Linguistics (LACL 2014) ; https://hal.archives-ouvertes.fr/hal-03246622 ; Asher, Nicholas; Soloviev, Sergei. 8th International Conference on Logical Aspects of Computational Linguistics (LACL 2014), Toulouse, France. Lecture Notes in Computer Science, 8535, Springer, 2014, Theoretical Computer Science and General Issues, 978-3-662-43741-4. ⟨10.1007/978-3-662-43742-1⟩ ; https://www.springer.com/gp/book/9783662437414 (2014)
|
|
BASE
|
|
3 |
Discours. A journal of linguistics, psycholinguistics and computational linguistics. ; Discours. Revue de linguistique, psycholinguistique et informatique.
|
|
|
|
In: https://halshs.archives-ouvertes.fr/halshs-01432020 ; France. Presses universitaires de Caen, 2014, Discours. Revue de linguistique, psycholinguistique et informatique., electronic ISSN 1963-1723 (2014)
|
|
BASE
|
|
4 |
Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity
|
|
|
|
In: Jurgens, David Alan. (2014). Making the Most of It: Word Sense Annotation and Disambiguation in the Face of Data Sparsity and Ambiguity. UCLA: Computer Science 0201. Retrieved from: http://www.escholarship.org/uc/item/2wn4h7ph (2014)
|
|
Abstract:
Natural language is highly ambiguous, with the same word having different meanings depending on the context. While human readers often have no trouble interpreting the correct meaning, semantic ambiguity poses a significant problem for many natural language systems, such as those that translate text or perform machine reading. The task of identifying which meaning of a word is present in a given context is known as Word Sense Disambiguation (WSD), where a word's meanings are discretized into units referred to as senses. Because languages contain hundreds of thousands of unique words and each of those words can have multiple meanings, comprehensive sense-annotated corpora are often sparse, with only tens to low hundreds of annotated examples of each word. As a result, creating high-performance WSD systems requires overcoming this data sparsity. This thesis provides a three-fold approach to improving WSD performance in the face of data sparsity. First, we introduce two new algorithms that take the role of a lexicographer and automatically learn the senses of a word from example uses in a fully unsupervised way. We then demonstrate that these unsupervised systems can be combined with a limited amount of annotated data to create a semi-supervised WSD system that significantly outperforms a state-of-the-art supervised WSD system trained on the same data. Second, we propose a novel method for gathering high-quality sense annotations from large numbers of untrained online workers, commonly referred to as crowdsourcing. Our method lowers the time and cost of building sense-annotated corpora while maintaining a high level of agreement between annotators, comparable to that of trained experts. Third, we analyze cases of ambiguity in sense annotations, when two annotators differ about which sense best describes the meaning of a particular usage of a word.
To perform this analysis, we built the largest sense-annotated corpus where cases of semantic ambiguity are explicitly marked. Our analysis of this corpus revealed multiple causes for this ambiguity as well as how the ambiguity may be interpreted and resolved by natural language applications using ambiguous data. To complement this work on ambiguity, we have also introduced a new methodology for evaluating WSD systems that explicitly report ambiguous instances.
|
|
Keyword:
annotation; computational linguistics; Computer science; crowdsourcing; Linguistics; semantics; word sense disambiguation; word sense induction
|
|
URL: http://n2t.net/ark:/13030/m5hb1d0t http://www.escholarship.org/uc/item/2wn4h7ph
|
|
BASE
|
|
5 |
Ranking canonical English poems
|
|
|
|
In: Literary and Linguistic Computing ; http://llc.oxfordjournals.org/ (2014)
|
|
BASE
|
|
6 |
The Montagovian Generative Lexicon Lambda Ty_n: a Type Theoretical Framework for Natural Language Semantics
|
|
: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. : LIPIcs - Leibniz International Proceedings in Informatics. 19th International Conference on Types for Proofs and Programs (TYPES 2013), 2014
|
|
BASE
|
|
7 |
The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics
|
|
|
|
BASE
|
|
12 |
Supplementary Material For "Sequence Comparison In Historical Linguistics" ...
|
|
|
|
BASE
|
|
13 |
Finding expertise using online community dialogue and the Duality of Expertise
|
|
|
|
BASE
|
|
14 |
Improving Phonetic Alignment By Handling Secondary Sequence Structures ...
|
|
|
|
BASE
|
|
15 |
Anonymise sound files ; Anonymisation de fichiers sonores
|
|
HIRST, Daniel. - : Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR), 2014. : http://lpl-aix.fr, 2014
|
|
BASE
|
|
19 |
Script praat pour recueil infos acoustiques ; Praat script for collecting acoustic information
|
|
CORNAZ, Sandra. - : Grenoble Images Parole Signal Automatique - UMR 5216 (Gipsa, Grenoble FR), 2014. : http://www.gipsa-lab.inpg.fr, 2014
|
|
BASE
|
|
20 |
Following the Trail of Source Languages in Literary Translations ; Research and Development in Intelligent Systems XXXI
|
|
|
|
BASE
|
|
|
|