论文标题

通过匹配统计数据的后缀排序

Suffix sorting via matching statistics

论文作者

Lipták, Zsuzsanna, Masillo, Francesco, Puglisi, Simon J.

论文摘要

我们引入了一种新算法,用于构建高度相似字符串集合的广义后缀阵列。作为第一步,我们构建了有关参考字符串的集合量统计量的压缩表示。然后,我们使用此数据结构将后缀分配到部分顺序中,然后加快后缀比较以完成广义后缀阵列。我们使用原型实现的实验证据(我们称为Sacamats的工具)表明,在具有高度相似字符串的字符串集合中,我们可以在时间上构建后缀阵列,或者比最快的可用方法更具竞争力或更快。在此过程中,我们描述了一种快速计算两个字符串匹配统计数据的启发式,这可能引起了独立的关注。

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源