1 |
Towards a Simple and Efficient Web Search Framework
In: DTIC (2014)
Abstract:
The Web Track of the 2014 Text REtrieval Conference (TREC) addresses the most fundamental problem of Information Retrieval. We did not intend to craft a system that beats state-of-the-art search engines, but to design a lightweight and cost-effective system with comparable performance. We introduce a two-pass retrieval framework. The first pass is a simple and efficient retrieval model that focuses on recall. The second pass runs a wave of feature extraction algorithms on the set of top-ranked documents, followed by Learning to Rank (LETOR) algorithms that provide different precision-oriented rankings, whose outputs are combined using data fusion. We have focused on using statistical Language Models with novel and well-known smoothing techniques, different LETOR methods and various data fusion techniques. In addition, we tried topic modelling with Hierarchical Dirichlet Allocation for query expansion in the hope of improving the diversity of our results. However, the topic modelling approach turned out to be unsuccessful, and we have not been able to identify the problem or benefit from the approach in this work. We also present further analyses demonstrating that our approach is robust against overfitting, and some general studies on overfitting in the context of LETOR. Presented at the Twenty-Third Text REtrieval Conference (TREC 2014) held in Gaithersburg, Maryland, November 19-21, 2014. The conference was co-sponsored by the National Institute of Standards and Technology (NIST) and the Defense Advanced Research Projects Agency (DARPA).
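The abstract says that the rankings produced by different LETOR methods are combined using data fusion, and the record's keywords list Borda count among the techniques. The record does not give the authors' exact formulation, so the following is only a minimal sketch of a generic Borda count over ranked lists of document IDs. The function name `borda_fuse`, the sample document IDs, and the rule that unranked documents share the leftover points equally are all assumptions made for illustration.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Fuse ranked lists of document IDs with a Borda count (illustrative sketch).

    Assumes each ranking lists a document at most once.
    """
    # Every document that appears in any input list is a candidate.
    candidates = {doc for ranking in rankings for doc in ranking}
    n = len(candidates)
    scores = defaultdict(float)

    for ranking in rankings:
        # The document at 0-based rank r earns n - r points.
        for r, doc in enumerate(ranking):
            scores[doc] += n - r
        # Assumption: documents this list did not rank split the
        # remaining points (n - k, ..., 1) equally among themselves.
        missing = candidates - set(ranking)
        if missing:
            leftover = sum(range(1, n - len(ranking) + 1))
            for doc in missing:
                scores[doc] += leftover / len(missing)

    # Highest score first; ties are broken by document ID for determinism.
    return sorted(candidates, key=lambda d: (-scores[d], d))


# Hypothetical outputs of three rankers over the same query.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = borda_fuse(rankings)
```

Here `d2` is placed first by two of the three hypothetical rankers and therefore leads the fused list; `d4`, ranked by only one list, ends up last.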
Keyword:
*DATA FUSION; *INFORMATION RETRIEVAL; *INFORMATION SYSTEMS; *LEARNING MACHINES; ALGORITHMS; BORDA COUNT; CLASSIFICATION; COMPUTATIONAL LINGUISTICS; CONDORCET METHOD; Cybernetics; DATA MINING; FEATURE EXTRACTION; Information Science; INTERNET; KNOWLEDGE MANAGEMENT; LANGUAGE MODELLING; LETOR(LEARNING TO RANK); NDCG(DISCOUNTED CUMULATIVE GAIN); PATTERN RECOGNITION; RANKING; RECIPROCAL RANK; SEARCH ENGINES; SEMANTICS; SMOOTHING(MATHEMATICS); STATISTICAL ANALYSIS; TEXT PROCESSING
URL: http://www.dtic.mil/docs/citations/ADA618578 http://oai.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA618578
BASE
2 |
Maritime Domain Awareness via Agent Learning and Collaboration
In: DTIC (2010)
BASE
3 |
A Multi-Disciplinary University Research Initiative in Hard and Soft Information Fusion: Overview, Research Strategies and Initial Results
In: DTIC (2010)
BASE
4 |
Adaptive Multi-Modal Data Mining and Fusion for Autonomous Intelligence Discovery
In: DTIC (2009)
BASE
6 |
Phenomenology-Based Inverse Scattering for Sensor Information Fusion
In: DTIC (2006)
BASE
7 |
Information Fusion for Command Support (Fusion d'informations pour le soutien au commandement) (CD-ROM)
In: DTIC (2006)
BASE
9 |
From Unstructured to Structured Information in Military Intelligence - Some Steps to Improve Information Fusion
In: DTIC (2004)
BASE
10 |
The Case for Using Semantic Nets as a Convergence Format for Symbolic Information Fusion
In: DTIC AND NTIS (2004)
BASE
11 |
Ontology for Level-One Sensor Fusion and Knowledge Discovery
In: DTIC (2004)
BASE
12 |
Interface for Fusing Human and Robotic Intelligence Using Scale-Free Small World Structures (CD-ROM)
In: DTIC (2003)
BASE
13 |
Intentional Systems, Intentional Stance, and Explanations of Intentional Behavior
In: DTIC AND NTIS (2000)
BASE
14 |
Automated Interpretation of Topographic Maps
In: DTIC AND NTIS (1992)
BASE