BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders

Paper title

BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders

Authors

Kaspar Märtens, Christopher Yau

Abstract

Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction. However, in application domains such as genomics where data sets are typically tabular and high-dimensional, a black-box approach to dimensionality reduction does not provide sufficient insights. Common data analysis workflows additionally use clustering techniques to identify groups of similar features. This usually leads to a two-stage process, however, it would be desirable to construct a joint modelling framework for simultaneous dimensionality reduction and clustering of features. In this paper, we propose to achieve this through the BasisVAE: a combination of the VAE and a probabilistic clustering prior, which lets us learn a one-hot basis function representation as part of the decoder network. Furthermore, for scenarios where not all features are aligned, we develop an extension to handle translation-invariant basis functions. We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE, demonstrated on various toy examples as well as on single-cell gene expression data.
