论文标题
用于在多个字符串上完全在线建造后缀树和Dawgs的指针机器算法
Pointer-Machine Algorithms for Fully-Online Construction of Suffix Trees and DAWGs on Multiple Strings
论文作者
论文摘要
我们处理的是,要在多个字符串的全线集合中维护后缀树索引结构,在该集合中可以随时将新字符贴在集合中的任何字符串中。该问题的唯一先前已知的算法,最近由Takagi等人提出。 [算法82(5):1346-1377(2020)],以$ o(n \logσ)$ time和$ o(n)$ space运行,其中$ n $表示字符串的总长度,$σ$表示字母大小。他们的算法大量使用了半动态树上的最接近标记的祖先(NMA)数据结构,可以回答查询并支持在$ o(1)$ o(1)$ a amortized时间中插入节点上的时间。在本文中,我们提出了一种更简单的全线直接直通算法,该算法在$ o(n(\ logσ+ \ log d))中为给定的字符串集合构建后缀树$ o(n(\ logσ+ \ log d))$ time和$ o(n)$ space,其中$ d $是$ d $的最大数量,可最大数量到达后缀树的node node。我们注意到,$ d $受后缀树的高度界定,该后缀是由集合中最长的字符串的长度界定的。这种新算法的优点是它在指针机模型上起作用,即,它不使用涉及表查找的复杂的NMA数据结构。作为副产品,我们还获得了用于构建有向的无环单词图(DAWG)的指针机器算法,用于全线在线的多个字符串的全线收集,该集合以$ o(n(\ log fitσ+ \ log log d d)运行,$ o(n(\ logσ+ \ log d))$ o($ o(n)$ o(n)$($ o(n)$空间无需NMA数据结构。
We deal with the problem of maintaining the suffix tree indexing structure for a fully-online collection of multiple strings, where a new character can be prepended to any string in the collection at any time. The only previously known algorithm for the problem, recently proposed by Takagi et al. [Algorithmica 82(5): 1346-1377 (2020)], runs in $O(N \log σ)$ time and $O(N)$ space on the word RAM model, where $N$ denotes the total length of the strings and $σ$ denotes the alphabet size. Their algorithm makes heavy use of the nearest marked ancestor (NMA) data structure on semi-dynamic trees, that can answer queries and supports insertion of nodes in $O(1)$ amortized time on the word RAM model. In this paper, we present a simpler fully-online right-to-left algorithm that builds the suffix tree for a given string collection in $O(N (\log σ+ \log d))$ time and $O(N)$ space, where $d$ is the maximum number of in-coming Weiner links to a node of the suffix tree. We note that $d$ is bounded by the height of the suffix tree, which is further bounded by the length of the longest string in the collection. The advantage of this new algorithm is that it works on the pointer machine model, namely, it does not use the complicated NMA data structures that involve table look-ups. As a byproduct, we also obtain a pointer-machine algorithm for building the directed acyclic word graph (DAWG) for a fully-online left-to-right collection of multiple strings, which runs in $O(N (\log σ+ \log d))$ time and $O(N)$ space again without the aid of the NMA data structures.