
Search in the Catalogues and Directories

Page: 1 2 3
Hits 1 – 20 of 45

1
Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations ...
Meng, Yu; Zhang, Yunyi; Huang, Jiaxin. - : arXiv, 2022
BASE
2
Semantic pattern discovery in open information extraction
Chauhan, Aabhas. - 2022
3
Text mining at multiple granularity: leveraging subwords, words, phrases, and sentences
Abstract: With the rapid digitization of information, large quantities of text-heavy data are constantly generated in many languages and across domains such as web documents, research papers, business reviews, news, and social posts. Efficiently and effectively searching, organizing, and extracting meaningful information from these massive unstructured corpora lays the foundation for many downstream text mining and natural language processing (NLP) tasks. Traditionally, NLP and text mining techniques are applied to raw text while treating individual words as the base semantic unit. However, the assumption that individual word tokens are the correct semantic granularity does not hold for many tasks and can lead to poor task performance. To address this, this work introduces techniques for identifying and utilizing text at different semantic granularities to solve a variety of text mining and NLP tasks. The general idea is to take a text object, such as a document, and decompose it into many levels of semantic granularity: sentences, phrases, words, or subword structures. Once the text is represented at different levels of semantic granularity, we demonstrate techniques that leverage the properly encoded text to solve a variety of NLP tasks. Specifically, this study focuses on three levels of semantic granularity: (1) subword segmentation, with an application to enriching word embeddings to address word sparsity; (2) phrase mining, with an application to phrase-based topic modeling; and (3) sentence-level granularity, for finding parallel cross-lingual data.
The first granularity we study is the subword level. We introduce a subword mining problem that aims to segment individual word tokens into smaller subword structures. The motivation is that individual words are often too coarse a granularity and need to be supplemented by a finer one. Operating on these fine-grained subwords addresses important problems in NLP, namely the long-tail data-sparsity problem, whereby most words in a corpus are infrequent, and the more severe out-of-vocabulary problem. To mine these subword structures effectively and efficiently, we propose an unsupervised segmentation algorithm based on a novel objective: transition entropy. We use ground-truth segmentations to assess the quality of the segmented words and further demonstrate the benefit of jointly leveraging words and subwords for distributed word representations.
The second granularity we study is the phrase level, with the phrase mining task of transforming raw unstructured text from a fine-grained sequence of words into a coarser-granularity sequence of single- and multi-word phrases. The motivation is that human language often contains idiomatic multi-word expressions, and fine-grained words fail to capture the right semantic granularity; proper phrasal segmentation can recover this appropriate granularity. To address this problem, we propose an unsupervised phrase mining algorithm based on frequent, significant, contiguous text patterns. We use human evaluation to assess the quality of the mined phrases and demonstrate the benefit of pre-mining phrases on a downstream topic-modeling task.
The third granularity we study is the sentence level. We motivate the need for sentence-level granularity to capture more complex, semantically complete spans of text. We introduce several downstream tasks that leverage sentence representations in conjunction with finer-grained units in a cross-lingual text mining task. We experimentally show how sentence-level data for cross-lingual embeddings can be used to identify cross-lingual document pairs and parallel sentences – data necessary for training machine translation models.
Access: U of Illinois only; the author requested U of Illinois access (open access after 2 years) in the Vireo ETD system.
Keyword: cross-lingual; data mining; embedding; nlp; phrases; sentences; subwords
URL: http://hdl.handle.net/2142/108161
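The subword chapter above proposes an unsupervised segmenter driven by a "transition entropy" objective, which the abstract does not define. As a hedged illustration only, the sketch below uses character-level branching entropy, a closely related standard idea, as a stand-in: cut a word wherever the distribution over next characters, given the prefix seen so far, becomes highly uncertain. The function name and threshold are made up for this example.

```python
from collections import Counter, defaultdict
from math import log2

def branching_entropy_segmenter(vocab, threshold=1.0):
    """Build a word segmenter from a vocabulary using character-level
    branching entropy (an assumed proxy for the thesis's transition-entropy
    objective, whose exact definition is not given in the abstract)."""
    # Count which characters follow every prefix seen in the vocabulary.
    successors = defaultdict(Counter)
    for word in vocab:
        for i in range(len(word)):
            successors[word[:i]][word[i]] += 1

    def entropy(counter):
        # Shannon entropy of the successor-character distribution.
        total = sum(counter.values())
        return -sum((c / total) * log2(c / total) for c in counter.values())

    def segment(word):
        # Cut before each position where the prefix's successor
        # distribution is highly uncertain (entropy above threshold).
        cuts = [0]
        for i in range(1, len(word)):
            if entropy(successors[word[:i]]) > threshold:
                cuts.append(i)
        cuts.append(len(word))
        return [word[a:b] for a, b in zip(cuts, cuts[1:])]

    return segment
```

With a toy vocabulary sharing the prefixes "un-" and "re-", the entropy spikes right after those prefixes, so `segment("unfair")` yields `["un", "fair"]`.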
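The phrase mining chapter describes mining "frequent, significant, contiguous text patterns." As a hedged sketch only, the code below scores contiguous bigrams by pointwise mutual information (one common significance measure, not necessarily the one used in the thesis) against a frequency cutoff, then greedily merges adjacent tokens that were mined as phrases; all names and thresholds are illustrative.

```python
from collections import Counter
from math import log

def mine_phrases(sentences, min_count=2, threshold=0.0):
    """Return contiguous bigrams that are both frequent (min_count) and
    significant (PMI above threshold) - a simple stand-in for the
    thesis's frequent-significant-pattern criterion."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for toks in sentences:
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
        total += len(toks)
    phrases = {}
    for (a, b), n in bigrams.items():
        if n < min_count:
            continue
        # PMI: how much more often the pair co-occurs than chance predicts.
        pmi = log(n * total / (unigrams[a] * unigrams[b]))
        if pmi > threshold:
            phrases[(a, b)] = pmi
    return phrases

def chunk(tokens, phrases):
    """Greedy left-to-right pass that merges adjacent tokens whose
    bigram was mined as a phrase."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

On a toy corpus where "new york" recurs, that bigram passes both thresholds and is merged into a single coarser-granularity token during chunking.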
4
Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training ...
5
ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision ...
6
Generation-Augmented Retrieval for Open-Domain Question Answering ...
7
Few-Shot Named Entity Recognition: An Empirical Baseline Study ...
8
Reader-Guided Passage Reranking for Open-Domain Question Answering ...
9
The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction ...
10
Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation ...
Mao, Yuning; Ma, Wenchang; Lei, Deren. - : arXiv, 2021
11
Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation ...
12
Combating abuse on social media platforms using natural language processing
Seyler, Dominic. - 2021
13
Text Classification Using Label Names Only: A Language Model Self-Training Approach ...
Meng, Yu; Zhang, Yunyi; Huang, Jiaxin. - : arXiv, 2020
14
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation ...
Wang, Qingyun; Li, Manling; Wang, Xuan. - : arXiv, 2020
15
Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation ...
Mao, Yuning; Ren, Xiang; Ji, Heng. - : arXiv, 2020
16
Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion ...
Huang, Jiaxin; Xie, Yiqing; Meng, Yu. - : arXiv, 2020
17
Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding ...
Shen, Jiaming; Ji, Heng; Han, Jiawei. - : arXiv, 2020
18
Cold-start universal information extraction
Huang, Lifu. - 2020
19
Cross-lingual entity extraction and linking for 300 languages
Pan, Xiaoman. - 2020
20
Text cube: construction, summarization and mining
Tao, Fangbo. - 2020


Results by source: Catalogues 2 · Bibliographies 0 · Linked Open Data catalogues 0 · Online resources 0 · Open access documents 43
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy