Average-Case Analysis of Approximate Trie Search

Springer Science and Business Media LLC - Tập 46 - Trang 469-491 - 2006
Moritz G. Maass1
1Fakultat fur Informatik, TU Munchen, Boltzmannstr. 3, D-85748, Garching, Germany

Tóm tắt

For the exact search of a pattern of length m in a database of n strings the trie data structure allows an optimal lookup time of O(m). If mismatches are allowed between the pattern and the database strings, no such structure with reasonable size is known. Some work can be saved using a trie and running times superior to the comparison with every string in the database can be achieved. We investigate a comparison-based model where matches and mismatches are defined between pairs of characters. When comparing two characters, let q be the probability of an error. Between any two strings we bound the number of errors by d, which we consider a function of n. We study the average-case complexity of the number of comparisons for searching in a trie in dependence of the parameters q and d. Our analysis yields the asymptotic behavior for memoryless sources with uniform probabilities. It turns out that there is a jump in the average-case complexity at certain thresholds for q and d. Our results can be applied for any comparison-based error model, for instance, Hamming distance, don't cares, or geometric character distances.