论文标题
参数化DAWG:有效的结构和双向模式搜索
Parameterized DAWGs: efficient constructions and bidirectional pattern searches
论文作者
论文摘要
如果\ emph {参数化匹配}}(\ emph {p-match}),则两个字符串$ x $和$ y $相等长度的$ c \cupπ$,如果重命名的bioject $ f:σ\cupπ\cupπ\ cupt \ rightarrowfightarrowσ\ cuptσ\capπ$,这是$ f in $σ$的标识,并$ x $ x $ y $ y($ x $ y)( \ emph {p匹配}问题是在p匹配给定模式的文本中寻找子字符串。在本文中,我们提出\ emph {参数化后缀自动机}(\ emph {p-suffix automata})和\ emph {参数化有向的acyclic word graphs}(\ emph {pdawgs}),这是p匹配版本的p匹配版本。尽管后缀自动机和DAWG相当于标准字符串,但我们表明p-suffix automata可以具有$θ(n^2)$ nodes and Edges,但pdawgs只有$ o(n)$ nodes and Edges,其中$ n $是输入字符串的长度。我们还给出了$ o(n |π| \ log(|π| + |σ|))$ - 时间$ o(n)$ - 空间算法,该算法以左右的在线方式构建PDAWG。作为副产品,证明\ emph {参数化的后缀树}也可以在左至左的在线方式同时和空间构建相反的字符串。这种双重性也使我们采用了p匹配的两种进一步的有效算法:给定输入字符串$ t $反转的参数化后缀树,一个人可以以离线方式构建$ o(n)$时间的$ t $ $ t $;一个人可以在$ o中执行\ emph {biDirectional} p匹配(m \ log(|π| + | = |σ|) + \ mathit {occ})$ time使用$ o(n)$ space,其中$ m $表示模式长度,$ \ m mathit {occ} $是文本$ t $ t $ t $ t $。
Two strings $x$ and $y$ over $Σ\cup Π$ of equal length are said to \emph{parameterized match} (\emph{p-match}) if there is a renaming bijection $f:Σ\cup Π\rightarrow Σ\cup Π$ that is identity on $Σ$ and transforms $x$ to $y$ (or vice versa). The \emph{p-matching} problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose \emph{parameterized suffix automata} (\emph{p-suffix automata}) and \emph{parameterized directed acyclic word graphs} (\emph{PDAWGs}) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have $Θ(n^2)$ nodes and edges but PDAWGs have only $O(n)$ nodes and edges, where $n$ is the length of an input string. We also give an $O(n |Π| \log (|Π| + |Σ|))$-time $O(n)$-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the \emph{parameterized suffix tree} for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: Given the parameterized suffix tree for the reversal of the input string $T$, one can build the PDAWG of $T$ in $O(n)$ time in an offline manner; One can perform \emph{bidirectional} p-matching in $O(m \log (|Π|+|Σ|) + \mathit{occ})$ time using $O(n)$ space, where $m$ denotes the pattern length and $\mathit{occ}$ is the number of pattern occurrences in the text $T$.