1	RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble ...
	Bendersky, Michael; Zhuang, Honglei; Ma, Ji. - arXiv, 2020
	BASE

2	Text mining with word embedding for outlier and sentiment analysis
	Zhuang, Honglei. - 2019
	Abstract: Today's technology makes it unprecedentedly easy to collect and store massive amounts of text data in domains such as online social networks, medical records and news reports. In contrast to this gigantic volume of text, the human capacity to read and process it is limited. Hence, there is a growing demand for automatic text mining tools to analyze massive text data. Word embedding is an emerging text analysis technique that leverages fine-grained statistics of context information to map each word to a vector in an embedding space, where proximity reflects semantic similarity between words. Embedding techniques not only enrich the statistical signals available to downstream text mining applications, but also make it possible to characterize and represent higher-level objects in the embedding space, such as sentences, documents or topics. This study integrates word embedding techniques into a series of text mining approaches and models. The general idea is to treat a text object such as a document or a sentence as a bag of embedding vectors and to characterize the distribution of those vectors in the embedding space. Specifically, this study focuses on two tasks: outlier analysis and weakly-supervised sentiment analysis. Outlier analysis aims to identify documents that deviate topically from the majority of a given corpus. We develop an unsupervised generative model that identifies frequent and representative semantic regions in the word embedding space to represent the given corpus. We then propose a novel outlierness measure to identify outlier documents. We also study the cost-sensitive scenario of outlier analysis. Sentiment analysis typically identifies the subjective opinion (e.g., positive vs. negative) expressed in a piece of text. Although sentiment analysis has been studied extensively as a supervised learning task, we tackle it in a weakly-supervised fashion, where users provide only a small set of seed words as guidance. We study how to identify aspects and their corresponding sentiments at both the document and the sentence level.
	Keyword: outlier analysis; sentiment analysis; text mining; word embedding
	URL: http://hdl.handle.net/2142/105058
	BASE