Search in the Catalogues and Directories

Hits 1 – 11 of 11

1
Text mining at multiple granularity: leveraging subwords, words, phrases, and sentences
BASE
2
Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications ...
BASE
3
As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation ...
BASE
4
Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning ...
BASE
5
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment ...
BASE
6
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data ...
BASE
7
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment ...
BASE
8
Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance ...
BASE
9
Beyond English-Centric Multilingual Machine Translation ...
BASE
10
An exploratory study on multilingual quality estimation
In: 366 ; 377 (2020)
BASE
11
Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks
Abstract: One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge that is not designed for any specific domain. The key challenges are then how to adapt the world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network. We then propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base that is automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.
Keyword: Article
URL: https://doi.org/10.1145/2783258.2783374
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688021/
BASE
