Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
A Polya Urn Document Language Model for Improved Information Retrieval ...
Abstract: The multinomial language model has been one of the most effective models of retrieval for over a decade. However, the multinomial distribution does not model one important linguistic phenomenon relating to term dependency, namely the tendency of a term to repeat itself within a document (i.e., word burstiness). In this article, we model document generation as a random process with reinforcement (a multivariate Polya process) and develop a Dirichlet compound multinomial language model that captures word burstiness directly. We show that the new reinforced language model can be computed as efficiently as current retrieval models, and with experiments on an extensive set of TREC collections, we show that it significantly outperforms the state-of-the-art language model for a number of standard effectiveness metrics. Experiments also show that the tuning parameter in the proposed model is more robust than in the multinomial language model. Furthermore, we develop a constraint for the verbosity hypothesis and show ... : 37-page journal submission (accepted for publication in TOIS) ...
Keyword: FOS Computer and information sciences; Information Retrieval cs.IR
URL: https://arxiv.org/abs/1502.00804
https://dx.doi.org/10.48550/arxiv.1502.00804
BASE
2
DERI at TREC 2008 Enterprise Search Track
In: DTIC (2008)
BASE

© 2013 - 2024 Lin|gu|is|tik