1 |
Constructing Strings Avoiding Forbidden Substrings
|
|
In: CPM 2021 - 32nd Annual Symposium on Combinatorial Pattern Matching ; https://hal.inria.fr/hal-03395386 ; CPM 2021 - 32nd Annual Symposium on Combinatorial Pattern Matching, Jul 2021, Wroclaw, Poland. pp.1-18 (2021)
|
|
BASE
6 |
Bidirectional String Anchors: A New String Sampling Mechanism ...
|
|
BASE
9 |
Longest Unbordered Factor in Quasilinear Time
|
|
Kociumaka, Tomasz; Kundu, Ritu; Mohamed, Manal. - : Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. : LIPIcs - Leibniz International Proceedings in Informatics. 29th International Symposium on Algorithms and Computation (ISAAC 2018), 2018
|
|
BASE
10 |
Longest Common Prefixes with $k$-Errors and Applications ...
|
|
BASE
13 |
Optimal Computation of Overabundant Words
|
|
Abstract:
The observed frequency of the longest proper prefix, the longest proper suffix, and the longest infix of a word w in a given sequence x can be used for classifying w as avoided or overabundant. The definitions used for the expectation and deviation of w in this statistical model were described and biologically justified by Brendel et al. (J Biomol Struct Dyn 1986). We have very recently introduced a time-optimal algorithm for computing all avoided words of a given sequence over an integer alphabet (Algorithms Mol Biol 2017). In this article, we extend this study by presenting an O(n)-time and O(n)-space algorithm for computing all overabundant words in a sequence x of length n over an integer alphabet. Our main result is based on a new non-trivial combinatorial property of the suffix tree T of x: the number of distinct factors of x whose longest infix is the label of an explicit node of T is no more than 3n-4. We further show that the presented algorithm is time-optimal by proving that O(n) is a tight upper bound for the number of overabundant words. Finally, we present experimental results, using both synthetic and real data, which justify the effectiveness and efficiency of our approach in practical terms.
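As an illustration of the classification the abstract describes, here is a minimal brute-force sketch. It is not the O(n)-time suffix-tree algorithm of the paper. The formulas are an assumption, since the abstract does not state them: expectation E(w) = f(prefix) * f(suffix) / f(infix) and deviation (f(w) - E(w)) / max(sqrt(E(w)), 1), with a word of length at least 3 called overabundant when its deviation is at least a threshold `rho`. The function names and the `max_len` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def substring_counts(x, max_len):
    """Count occurrences of every factor of x up to length max_len (brute force)."""
    counts = Counter()
    n = len(x)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            counts[x[i:j]] += 1
    return counts


def deviation(w, counts):
    """Deviation of w under the assumed model; requires len(w) >= 3 and w occurring in x."""
    prefix, suffix, infix = w[:-1], w[1:], w[1:-1]
    # Assumed expectation: f(prefix) * f(suffix) / f(infix).
    # f(infix) > 0 whenever w itself occurs in x.
    expected = counts[prefix] * counts[suffix] / counts[infix]
    return (counts[w] - expected) / max(sqrt(expected), 1.0)


def overabundant_words(x, rho, max_len):
    """Return, sorted, all factors of length 3..max_len whose deviation is >= rho."""
    counts = substring_counts(x, max_len)
    return sorted(
        w for w in counts
        if len(w) >= 3 and deviation(w, counts) >= rho
    )
```

This enumerates all factors explicitly, so it takes quadratic time and space in the worst case and is only suitable for checking small inputs by hand.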
|
|
Keyword:
avoided words; Data processing Computer science; DNA sequence analysis; overabundant words; suffix tree
|
|
URN:
urn:nbn:de:0030-drops-76468
|
|
URL:
https://drops.dagstuhl.de/opus/volltexte/2017/7646/
https://doi.org/10.4230/LIPIcs.WABI.2017.4
|
|
BASE
16 |
Efficient Index for Weighted Sequences
|
|
Barton, Carl; Kociumaka, Tomasz; Pissis, Solon P.. - : Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. : LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), 2016
|
|
BASE
18 |
Linear-time computation of minimal absent words using suffix array
|
|
BASE
19 |
Order-Preserving Suffix Trees and Their Algorithmic Applications ...
|
|
BASE