Paper title
Fast detection of specific fragments against a set of sequences
Paper authors
Paper abstract
We design alignment-free techniques for comparing a sequence or word, called a target, against a set of words, called a reference. A target-specific factor of a target $T$ against a reference $R$ is a factor $w$ of a word in $T$ that is not a factor of any word of $R$ and such that every proper factor of $w$ is a factor of a word of $R$. We first address the computation of the set of target-specific factors of a target $T$ against a reference $R$, where $T$ and $R$ are finite sets of sequences. The result is the construction of an automaton accepting the set of all such target-specific factors. The construction algorithm runs in time linear in the size of $T \cup R$. The second result is an algorithm that computes all occurrences, in a single sequence $T$, of its target-specific factors against a reference $R$. The algorithm runs in real time on the target sequence, independently of the number of occurrences of target-specific factors.
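To make the definition concrete, the following is a minimal brute-force sketch in Python, not the linear-time automaton construction or the real-time algorithm described in the abstract. The function names (`factors`, `target_specific_factors`, `occurrences`) are hypothetical and chosen only for illustration, and the reference is assumed to be non-empty. The sketch relies on the fact that the set of factors of $R$ is closed under taking factors, so it is enough to test the two longest proper factors of $w$.

```python
def factors(words):
    """Return all factors (substrings) of the given words, plus the empty word."""
    out = {""}  # assumption: the reference is non-empty, so "" is always a factor
    for w in words:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                out.add(w[i:j])
    return out


def target_specific_factors(target, reference):
    """Brute-force set of target-specific factors of `target` against `reference`.

    A non-empty factor w of a target word qualifies when w is not a factor of
    any reference word but both w[1:] and w[:-1] are. Because the factor set
    of the reference is factor-closed, these two checks cover every proper
    factor of w.
    """
    ref = factors(reference)
    return {
        w
        for w in factors(target)
        if w and w not in ref and w[1:] in ref and w[:-1] in ref
    }


def occurrences(t, reference):
    """List (start position, factor) pairs for a single target sequence t."""
    specific = target_specific_factors([t], reference)
    return sorted(
        (i, w) for w in specific for i in range(len(t)) if t.startswith(w, i)
    )


if __name__ == "__main__":
    print(target_specific_factors(["abba"], ["abab"]))  # {'bb'}
    print(occurrences("abba", ["abab"]))  # [(1, 'bb')]
```

In this example, `bb` is the only target-specific factor of `abba` against `abab`: it does not occur in `abab`, while its proper factor `b` does. Longer factors such as `abb` are excluded because they contain `bb`, which is already absent from the reference. This enumeration takes time polynomial in the input size and is intended only to illustrate the definition.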