2 |
Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets
|
|
|
|
In: Future Internet ; Volume 13 ; Issue 11 (2021)
|
|
BASE
|
|
3 |
Using Semantics for Granularities of Tokenization
|
|
|
|
In: Computational Linguistics, Vol 44, Iss 3, Pp 483-524 (2018)
|
|
Abstract:
Depending on downstream applications, it is advisable to extend the notion of tokenization from low-level character-based token boundary detection to identification of meaningful and useful language units. This entails both identifying units composed of several single words that form a multiword expression (MWE), as well as splitting single-word compounds into their meaningful parts. In this article, we introduce unsupervised and knowledge-free methods for these two tasks. The main novelty of our research is that the methods are primarily based on distributional similarity, of which we use two flavors: a sparse count-based and a dense neural-based distributional semantic model. First, we introduce DRUID, which is a method for detecting MWEs. The evaluation on MWE-annotated data sets in two languages and newly extracted evaluation data sets for 32 languages shows that DRUID compares favorably with previous methods not utilizing distributional information. Second, we present SECOS, an algorithm for decompounding close compounds. In an evaluation of four dedicated decompounding data sets across four languages and on data sets extracted from Wiktionary for 14 languages, we demonstrate the superiority of our approach over unsupervised baselines, sometimes even matching the performance of previous language-specific and supervised methods. In a final experiment, we show how both decompounding and MWE information can be used in information retrieval. Here, we obtain the best results when combining word information with MWEs and the compound parts in a bag-of-words retrieval set-up. Overall, our methodology paves the way to automatic detection of lexical units beyond standard tokenization techniques without language-specific preprocessing steps such as POS tagging.
|
|
Keyword:
Computational linguistics. Natural language processing; P98-98.5
|
|
URL: https://doaj.org/article/f739e90bb4a24f6794543bcd4b417072 https://doi.org/10.1162/coli_a_00325
|
|
BASE
|
|
4 |
Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction
|
|
|
|
BASE
|
|
6 |
Webbasierte linguistische Forschung: Möglichkeiten und Begrenzungen beim Umgang mit Massendaten
|
|
|
|
In: Linguistik Online, Vol 61, Iss 4 (2014)
|
|
BASE
|
|
7 |
SemEval-2013 task 5: Evaluating phrasal semantics
|
|
|
|
In: http://www.aclweb.org/anthology/S/S13/S13-2007.pdf (2013)
|
|
BASE
|
|
8 |
Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation
|
|
|
|
In: http://aclweb.org/anthology/C/C12/C12-1109.pdf (2012)
|
|
BASE
|
|
9 |
UKP: Computing semantic textual similarity by combining multiple content similarity measures
|
|
|
|
In: http://aclweb.org/anthology//S/S12/S12-1059.pdf (2012)
|
|
BASE
|
|
10 |
Quantifying semantics using complex network analysis
|
|
|
|
In: http://aclweb.org/anthology/C/C12/C12-1017.pdf (2012)
|
|
BASE
|
|
11 |
ASV Toolbox – A Modular Collection of Language Exploration Tools
|
|
|
|
In: http://asv.informatik.uni-leipzig.de/publication/file/94/biemann-etal-08-toolbox.pdf (2008)
|
|
BASE
|
|
12 |
ASV Toolbox – A Modular Collection of Language Exploration Tools
|
|
|
|
In: http://www.lrec-conf.org/proceedings/lrec2008/pdf/447_paper.pdf (2008)
|
|
BASE
|
|
13 |
workshop on Graph-based Algorithms for Natural Language Processing Workshop chairs:
|
|
|
|
In: http://www.aclweb.org/anthology-new/W/W08/W08-20.pdf (2008)
|
|
BASE
|
|
14 |
Unsupervised part-of-speech tagging employing efficient graph clustering
|
|
|
|
In: http://wortschatz.uni-leipzig.de/~cbiemann/pub/2006/unsupos_graph_coling06SRW.pdf (2006)
|
|
BASE
|
|
15 |
Automatic extension of feature-based semantic lexicons via contextual attributes
|
|
|
|
In: http://pi7.fernuni-hagen.de/osswald/papers/gfkl05.pdf (2006)
|
|
BASE
|
|
16 |
Unsupervised part-of-speech tagging employing efficient graph clustering
|
|
|
|
In: http://acl.ldc.upenn.edu/P/P06/P06-3002.pdf (2006)
|
|
BASE
|
|
17 |
Unsupervised part-of-speech tagging employing efficient graph clustering
|
|
|
|
In: http://machinelearningtext.pbworks.com/w/file/fetch/48158637/UnsupPOSp7-biemann.pdf (2006)
|
|
BASE
|
|
18 |
Rigorous dimensionality reduction through linguistically motivated feature selection for text categorisation
|
|
|
|
In: http://wortschatz.uni-leipzig.de/~fwitschel/papers/nodalida.pdf (2005)
|
|
BASE
|
|
19 |
Disentangling from Babylonian confusion – unsupervised language identification
|
|
|
|
In: http://wortschatz.uni-leipzig.de/~cbiemann/pub/2005/cicling05.pdf (2005)
|
|
BASE
|
|
20 |
Automatic acquisition of paradigmatic relations using iterated co-occurrences
|
|
|
|
In: http://wortschatz.uni-leipzig.de/~sbordag/papers/BiemannBordagQuasthoffAutomatic04.pdf (2004)
|
|
BASE
|
|