1 |
A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track)
In: DTIC (2008)
BASE
3 |
Barriers, Bridges, and Progress in Cognitive Modeling for Military Applications
In: DTIC (2008)
BASE
4 |
Integrating a Natural Language Message Pre-Processor with UIMA
In: DTIC (2008)
BASE
5 |
Laboratory for Computational Cultural Dynamics
In: DTIC (2008)
BASE
7 |
IIT Kharagpur at TREC 2008 Blog Track
In: DTIC (2008)
Abstract:
Blogs are often informally written, poorly structured, and filled with spelling errors, grammatical errors, and nontraditional content. Linguistic analysis of blogs is further complicated by two problems: (1) spam blogs and spam comments, and (2) extraneous non-content material, including blog-rolls, link-rolls, advertisements, and sidebars. The document retrieval system was built on the Apache Lucene search engine, which indexed the whole Blog06 dataset and retrieved documents very quickly. Reducing the size of the index required removing much of the noise in the HTML. Many documents contained malformed HTML, which was corrected with the HTML Tidy utility. The qrels of the TREC 2006 and 2007 Blog Tracks were used to train the sentence-level subjectivity and polarity classifiers. This paper describes the authors' opinion retrieval system for the TREC 2008 Blog Track. The system contains five modules. The first extracts the blog content from junk HTML, thereby decreasing the noise in the indexed content. The second removes various kinds of spam content from real blogs. The third retrieves relevant documents. The fourth filters out opinionated documents, and the fifth calculates the polarity of the sentiments in the documents. The final ranked retrieval runs were based on various combinations of settings in each module, so as to study the effect of each. Subjectivity and polarity were predicted with a complementary naive Bayes classifier. ; Presented at the 17th Text REtrieval Conference (TREC 2008), held in Gaithersburg, MD, on 18-21 Nov 2008. Sponsored in part by the Defense Advanced Research Projects Agency (DARPA) and the Advanced Research and Development Activity (ARDA). The original document contains color images.
Keywords:
*ATTITUDES(PSYCHOLOGY); *BLOGS; *COMPUTATIONAL LINGUISTICS; *ELECTRONIC PUBLISHING; *EXPERT SYSTEMS; *EXTRACTION; *INFORMATION RETRIEVAL; *INTERNET; *OPINION EXTRACTION; AUTOMATION; BAYES THEOREM; CLASSIFICATION; Cybernetics; DATA BASES; DATA EXTRACTION; DATA PREPROCESSING; FOREIGN REPORTS; HTML(HYPER TECH MARKUP LANGUAGE); INDIA; INFORMATION FILTERS; Information Science; Linguistics; MAP(MEAN AVERAGE PRECISION); MOVIE REVIEWS; NOISE REDUCTION; OPINION FILTERING; OPINION SEARCHES; POLARITY; PRECISION; PREPROCESSING; PUBLIC OPINION; RETRIEVAL PERFORMANCE; SCORING; SPAM FILTERING; SPAM(COMPUTER SCIENCE); SPLOG DETECTION; SYMPOSIA; TREC 2008 BLOG TRACK; WEB LOGS
URL: http://oai.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA512742 http://www.dtic.mil/docs/citations/ADA512742
BASE
8 |
Iterated Class-Specific Subspaces for Speaker-Dependent Phoneme Classification
In: DTIC (2008)
BASE