Title
Parallel Longest Increasing Subsequence and van Emde Boas Trees
Authors
Abstract
This paper studies parallel algorithms for the longest increasing subsequence (LIS) problem. Let $n$ be the input size and $k$ be the LIS length of the input. Sequentially, LIS is a simple problem that can be solved using dynamic programming (DP) in $O(n\log n)$ work. However, parallelizing LIS is a long-standing challenge. We are unaware of any parallel LIS algorithm that has optimal $O(n\log n)$ work and non-trivial parallelism (i.e., $\tilde{O}(k)$ or $o(n)$ span). This paper proposes a parallel LIS algorithm that costs $O(n\log k)$ work, $\tilde{O}(k)$ span, and $O(n)$ space, and is much simpler than the previous parallel LIS algorithms. We also generalize the algorithm to a weighted version of LIS, which maximizes the weighted sum of all objects in an increasing subsequence. To achieve a better work bound for the weighted LIS algorithm, we design parallel algorithms for the van Emde Boas (vEB) tree, which has the same structure as the sequential vEB tree, and supports work-efficient parallel batch insertion, deletion, and range queries. We also implement our parallel LIS algorithms. Our implementation is lightweight, efficient, and scalable. On input size $10^9$, our LIS algorithm outperforms a highly-optimized sequential algorithm (with $O(n\log k)$ cost) on inputs with $k\le 3\times 10^5$. Our algorithm is also much faster than the best existing parallel implementation by Shen et al. (2022) on all input instances.
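For reference, the sequential $O(n\log k)$ DP baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the classic binary-search formulation (patience sorting), not the paper's parallel algorithm:

```python
import bisect

def lis_length(a):
    """Return k, the length of a longest strictly increasing subsequence.

    Runs in O(n log k) time: tails has at most k entries, and each
    element triggers one binary search over it.
    """
    # tails[i] holds the smallest possible tail value of an increasing
    # subsequence of length i + 1 seen so far; it is always sorted.
    tails = []
    for x in a:
        i = bisect.bisect_left(tails, x)  # bisect_left => strict increase
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)
```

This routine recovers only the length $k$; reconstructing an actual subsequence requires additionally recording, for each element, the length at which it was placed.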