低差额代码，用于纠正多个缩写和编辑错误

论文标题

低差额代码，用于纠正多个缩写和编辑错误

Low-redundancy codes for correcting multiple short-duplication and edit errors

论文作者

Tang, Yuanyuan, Wang, Shuche, Lou, Hao, Gabrys, Ryan, Farnoud, Farzad

论文摘要

由于其较高的数据密度，寿命，能源效率和易于生成副本的易用性，DNA被认为是满足未来需求的有希望的存储技术。但是，在数据存储和检索的不同阶段，在DNA中可能会出现一系列多种错误，包括删除，插入，重复和替换。当前的论文构建了错误校正的代码，以同时校正简短（串联）重复，最多最多$ p $编辑，其中简短复制生成带有长度$ \ leq 3 $的子字符串的副本，并在原始子弦之后插入副本，并且编辑是替代，删除，删除，删除，或插入。与仅重复的最新代码相比，提出的代码校正最高$ p $编辑（重复），其额外成本约为$ 8p $ 8p（\ log_q n）（1+o（1+o（1））$冗余的符号，从而达到了相同的渐近率，其中$ q \ g ge 4 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $。此外，当$ p $相对于代码长度是常数时，编码和解码过程的时间复杂性都是多项式的。

Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most $p$ edits, where a short duplication generates a copy of a substring with length $\leq 3$ and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to $p$ edits (in addition to duplications) at the additional cost of roughly $8p(\log_q n)(1+o(1))$ symbols of redundancy, thus achieving the same asymptotic rate, where $q\ge 4$ is the alphabet size and $p$ is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when $p$ is a constant with respect to the code length.

下载PDF全文

下载文献需遵守相关版权规定

论文标题