Paper Title

Merging Sorted Lists of Similar Strings

Author

Myers, Gene

Abstract

Merging $T$ sorted, non-redundant lists containing $M$ elements into a single sorted, non-redundant result of size $N \ge M/T$ is a classic problem, typically solved in practice in $O(M \log T)$ time with a priority-queue data structure, the most basic of which is the simple *heap*. We revisit this problem in the situation where the list elements are *strings* and the lists contain many *identical or nearly identical elements*. By keeping simple auxiliary information with each heap node, we devise an $O(M \log T+S)$ worst-case method that performs no more character comparisons than $S$, the sum of the lengths of all the strings, and another $O(M \log (T/ \bar e)+S)$ method that becomes progressively more efficient as a function of the fraction of equal elements $\bar e = M/N$ between input lists, reaching linear time when the lists are all identical. The methods perform favorably in practice versus an alternate formulation based on a trie.
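
For orientation, the classic $O(M \log T)$ heap-based merge that the abstract takes as its starting point can be sketched as follows. This is only the baseline, not the paper's improved methods: it keeps no auxiliary information in the heap nodes, so every heap comparison is a full string comparison. The function name `merge_sorted_unique` and the sample lists are illustrative assumptions, not taken from the paper.

```python
import heapq


def merge_sorted_unique(lists):
    """Merge sorted, duplicate-free lists of strings into one sorted,
    duplicate-free list using a plain binary heap (baseline method)."""
    # One heap entry per non-empty list: (current string, list index, position).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    result = []
    while heap:
        value, i, pos = heap[0]
        # Equal strings from different lists surface consecutively,
        # so comparing against the last output removes redundancy.
        if not result or result[-1] != value:
            result.append(value)
        nxt = pos + 1
        if nxt < len(lists[i]):
            # Advance list i: replace the heap top with its next element.
            heapq.heapreplace(heap, (lists[i][nxt], i, nxt))
        else:
            # List i is exhausted: drop it from the heap.
            heapq.heappop(heap)
    return result


# T = 3 lists holding M = 8 elements in total, with N = 5 distinct strings.
lists = [["ab", "abc", "b"], ["ab", "abd"], ["abc", "b", "c"]]
merged = merge_sorted_unique(lists)
```

In this sample, $\bar e = M/N = 8/5$. The near-identical strings (`ab`, `abc`, `abd`) are where this baseline repeatedly re-compares shared prefixes, which is the cost the abstract's per-node auxiliary information is meant to avoid.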
