Merging Sorted Lists of Similar Strings

Title


Merging Sorted Lists of Similar Strings

Author

Myers, Gene

Abstract


Merging $T$ sorted, non-redundant lists containing a total of $M$ elements into a single sorted, non-redundant result of size $N \ge M/T$ is a classic problem, typically solved in practice in $O(M \log T)$ time with a priority-queue data structure, the most basic of which is the simple *heap*. We revisit this problem in the situation where the list elements are *strings* and the lists contain many *identical or nearly identical elements*. By keeping simple auxiliary information with each heap node, we devise an $O(M \log T + S)$ worst-case method that performs no more character comparisons than $S$, the sum of the lengths of all the strings, and another $O(M \log (T/\bar e) + S)$ method that becomes progressively more efficient as the overlap between input lists, measured by $\bar e = M/N$, increases, reaching linear time when the lists are all identical. The methods compare favorably in practice with an alternative formulation based on a trie.
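For reference, the classic $O(M \log T)$ heap merge that the abstract takes as its starting point can be sketched in Python with the standard `heapq` module. This is only the baseline, not the paper's method: it keeps no auxiliary information per heap node, so every heap comparison may re-scan the shared prefixes of nearly identical strings. The function names `merge_distinct` and `average_multiplicity` are illustrative assumptions, not taken from the paper.

```python
import heapq
from typing import List, Sequence


def merge_distinct(lists: Sequence[Sequence[str]]) -> List[str]:
    """Merge T sorted, duplicate-free lists into one sorted, duplicate-free list.

    Baseline heap-based T-way merge (not the paper's LCP-style method):
    each heap entry is (string, list index, position), so the heap holds at
    most T entries and each of the M input elements costs O(log T) heap work.
    """
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    result: List[str] = []
    while heap:
        s, i, j = heap[0]
        # Equal strings from different lists surface at the top of the heap
        # consecutively, so comparing with the last output removes duplicates.
        if not result or result[-1] != s:
            result.append(s)
        if j + 1 < len(lists[i]):
            # Pop the top and push the next element of the same list in one step.
            heapq.heapreplace(heap, (lists[i][j + 1], i, j + 1))
        else:
            heapq.heappop(heap)
    return result


def average_multiplicity(lists: Sequence[Sequence[str]]) -> float:
    """Return e_bar = M / N, the average number of lists each distinct string is in."""
    m = sum(len(lst) for lst in lists)
    n = len(merge_distinct(lists))
    # Undefined when there are no elements at all; return 0.0 in that case.
    return m / n if n else 0.0
```

For example, merging `["apple", "banana", "cherry"]`, `["apple", "banana", "date"]`, and `["apple", "cherry", "date"]` yields `["apple", "banana", "cherry", "date"]`, so $M = 9$, $N = 4$, and $\bar e = 2.25$. When all $T$ lists are identical, $N = M/T$ and hence $\bar e = T$, so the $\log(T/\bar e)$ term vanishes; this is where the linear-time bound in the abstract comes from.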
