Paper Title
Cancer Subtyping by Improved Transcriptomic Features Using Vector Quantized Variational Autoencoder
Authors
Abstract
Defining and separating cancer subtypes is essential for personalized therapy and patient prognosis. The definition of subtypes has been repeatedly recalibrated as our understanding has deepened. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that can reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data, such as transcriptomics, that correlate strongly with the underlying biological mechanisms. However, while existing studies have shown promising results, they suffer from two issues associated with omics data: sample scarcity and high dimensionality. As a result, existing methods often impose unrealistic assumptions in order to extract useful features from the data while avoiding overfitting to spurious correlations. In this paper, we propose to leverage a recent strong generative model, the Vector Quantized Variational AutoEncoder (VQ-VAE), to tackle these data issues and extract informative latent features. Because the model retains only information relevant to reconstructing the input, these features are crucial to the quality of subsequent clustering. VQ-VAE does not impose strict assumptions, and hence its latent features are better representations of the input, capable of yielding superior clustering performance with any mainstream clustering method. Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate that the VQ-VAE clustering results can significantly and robustly improve prognosis over prevalent subtyping systems.
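To make the "vector quantized" part of the abstract concrete, the following is a minimal sketch of the quantization step alone: each latent vector is replaced by its nearest entry in a codebook. It is not the authors' implementation. The function name `quantize`, the toy identity codebook, and the hand-written latents (standing in for encoder outputs on transcriptomic samples) are all assumptions chosen for illustration, and the encoder, decoder, and training losses are omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry (Euclidean)."""
    # Pairwise squared distances: latents (n, d) vs. codebook (k, d) -> (n, k)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)      # index of the closest code per latent
    return codebook[indices], indices

# Hypothetical toy codebook: 4 codes of dimension 4 (assumption for illustration)
codebook = np.eye(4)

# Hypothetical stand-ins for encoder outputs of three samples
latents = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.1, 0.0, 0.0, 1.1],
])

quantized, indices = quantize(latents, codebook)
print(indices)  # [0 2 3]
```

In a pipeline of the kind the abstract describes, latent features obtained this way would then be handed to a clustering method of the reader's choice; which features and which clustering method the authors actually use is not specified in the abstract.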