论文标题
涉及拷贝数概况的基因组问题:复杂性和算法
Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms
论文作者
论文摘要
最近,由于几种类型的癌症的基因组序列分析,基于{\ em拷贝数剖面}的基因组数据(简称{\ em cnp}简称)越来越流行。 CNP是一个向量,其中每个组件是一个非负整数,代表特定基因或目标段的副本数量。 在本文中,我们提出了两个结果流。第一个是关于两个开放问题的负面结果,该问题涉及Qingge等人提出的最小拷贝数(MCNG)问题的计算复杂性。在2018年。如果复制是串联的,那么问题是NP,并且如果使用任意重复,则将问题是否仍然是np-hard留下的问题。我们在本文中肯定地回答了这个问题。实际上,我们证明即使获得一个恒定因子近似是NP-HARD。我们还证明了参数化版本是w [1] -hard,回答了Qingge等人的另一个打开问题。 另一个结果是肯定的,是基于关于CNP的新(且更普遍的)问题。 \ emph {复制号码配置符合(CNPC)}问题正式定义如下:给定两个CNP的$ C_1 $和$ C_2 $,计算两个字符串$ S_1 $和$ S_2 $和$ CNP(S_1) $ d(s_1,s_2)$被最小化。在这里,$ d(s_1,s_2)$是一个非常通用的术语,这意味着它可能是任何基因组重排距离(例如逆转,换位和串联重复等)。我们通过证明$ d(s_1,s_2)$由断点距离衡量,那么问题是多项式解决的。
Recently, due to the genomic sequence analysis in several types of cancer, the genomic data based on {\em copy number profiles} ({\em CNP} for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific gene or segment of interest. In this paper, we present two streams of results. The first is the negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem posed by Qingge et al. in 2018. It was shown by Qingge et al. that the problem is NP-hard if the duplications are tandem and they left the open question of whether the problem remains NP-hard if arbitrary duplications are used. We answer this question affirmatively in this paper; in fact, we prove that it is NP-hard to even obtain a constant factor approximation. We also prove that the parameterized version is W[1]-hard, answering another open question by Qingge et al. The other result is positive and is based on a new (and more general) problem regarding CNP's. The \emph{Copy Number Profile Conforming (CNPC)} problem is formally defined as follows: given two CNP's $C_1$ and $C_2$, compute two strings $S_1$ and $S_2$ with $cnp(S_1)=C_1$ and $cnp(S_2)=C_2$ such that the distance between $S_1$ and $S_2$, $d(S_1,S_2)$, is minimized. Here, $d(S_1,S_2)$ is a very general term, which means it could be any genome rearrangement distance (like reversal, transposition, and tandem duplication, etc). We make the first step by showing that if $d(S_1,S_2)$ is measured by the breakpoint distance then the problem is polynomially solvable.