
Search in the Catalogues and Directories

Hits 1 – 11 of 11

1
An improved Bayesian TRIE based model for SMS text normalization ...
Abstract: Normalization of SMS text, commonly known as texting language, has been pursued for more than a decade. A probabilistic approach based on the Trie data structure was proposed in the literature and was found to perform better than the HMM-based approaches proposed earlier at predicting the correct alternative for an out-of-lexicon word. However, the success of the Trie-based approach depends largely on how correctly the underlying probabilities of word occurrences are estimated. In this work we propose a structural modification to the existing Trie-based model, along with a novel training algorithm and probability generation scheme. We prove two theorems on statistical properties of the proposed Trie and use them to claim that it is an unbiased and consistent estimator of the occurrence probabilities of the words. We further fuse our model into the paradigm of noisy-channel-based error correction and provide a heuristic to go beyond a Damerau-Levenshtein distance of one. We also run simulations to support our claims ... : 7 pages, 8 figures, under review at Pattern Recognition Letters ...
Keyword: Computation and Language cs.CL; Data Structures and Algorithms cs.DS; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2008.01297
https://arxiv.org/abs/2008.01297
BASE
2
Sliding window property testing for regular languages ...
BASE
3
Matroids Hitting Sets and Unsupervised Dependency Grammar Induction ...
BASE
4
A polynomial time algorithm for the Lambek calculus with brackets of bounded order ...
BASE
5
Tuned and GPU-accelerated parallel data mining from comparable corpora ...
BASE
6
Implementation of an Automatic Syllabic Division Algorithm from Speech Files in Portuguese Language ...
BASE
7
Good parts first - a new algorithm for approximate search in lexica and string databases ...
BASE
8
An Object-Oriented and Fast Lexicon for Semantic Generation ...
BASE
9
Incremental Construction of Compact Acyclic NFAs ...
BASE
10
A Straightforward Approach to Morphological Analysis and Synthesis ...
BASE
11
A Formal Framework for Linguistic Annotation (revised version) ...
Bird, Steven; Liberman, Mark. - : arXiv, 2000
BASE
