Average-Case Analysis of Approximate Trie Search
Tóm tắt
For the exact search of a pattern of length m in a database of n strings the trie data structure allows an optimal lookup time of O(m). If mismatches are allowed between the pattern and the database strings, no such structure with reasonable size is known. Some work can be saved using a trie and running times superior to the comparison with every string in the database can be achieved. We investigate a comparison-based model where matches and mismatches are defined between pairs of characters. When comparing two characters, let q be the probability of an error. Between any two strings we bound the number of errors by d, which we consider a function of n. We study the average-case complexity of the number of comparisons for searching in a trie in dependence of the parameters q and d. Our analysis yields the asymptotic behavior for memoryless sources with uniform probabilities. It turns out that there is a jump in the average-case complexity at certain thresholds for q and d. Our results can be applied for any comparison-based error model, for instance, Hamming distance, don't cares, or geometric character distances.