论文标题
低差额代码,用于纠正多个缩写和编辑错误
Low-redundancy codes for correcting multiple short-duplication and edit errors
论文作者
论文摘要
由于其较高的数据密度,寿命,能源效率和易于生成副本的易用性,DNA被认为是满足未来需求的有希望的存储技术。但是,在数据存储和检索的不同阶段,在DNA中可能会出现一系列多种错误,包括删除,插入,重复和替换。当前的论文构建了错误校正的代码,以同时校正简短(串联)重复,最多最多$ p $编辑,其中简短复制生成带有长度$ \ leq 3 $的子字符串的副本,并在原始子弦之后插入副本,并且编辑是替代,删除,删除,删除,或插入。与仅重复的最新代码相比,提出的代码校正最高$ p $编辑(重复),其额外成本约为$ 8p $ 8p(\ log_q n)(1+o(1+o(1))$冗余的符号,从而达到了相同的渐近率,其中$ q \ g ge 4 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $。此外,当$ p $相对于代码长度是常数时,编码和解码过程的时间复杂性都是多项式的。
Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most $p$ edits, where a short duplication generates a copy of a substring with length $\leq 3$ and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to $p$ edits (in addition to duplications) at the additional cost of roughly $8p(\log_q n)(1+o(1))$ symbols of redundancy, thus achieving the same asymptotic rate, where $q\ge 4$ is the alphabet size and $p$ is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when $p$ is a constant with respect to the code length.