Paper title
Moment Matching Deep Contrastive Latent Variable Models
Authors
Abstract
In the contrastive analysis (CA) setting, machine learning practitioners are specifically interested in discovering patterns that are enriched in a target dataset as compared to a background dataset generated from sources of variation irrelevant to the task at hand. For example, a biomedical data analyst may seek to understand variations in genomic data that are present only among patients with a given disease, as opposed to those also present in healthy control subjects. Such scenarios have motivated the development of contrastive latent variable models, which isolate variations unique to the target dataset from those shared across the target and background datasets; the current state-of-the-art models are based on the variational autoencoder (VAE) framework. However, previously proposed models do not explicitly enforce the constraints on latent variables underlying CA, potentially leading to undesirable leakage of information between the two sets of latent variables. Here, we propose the moment matching contrastive VAE (MM-cVAE), a reformulation of the VAE for CA that uses the maximum mean discrepancy to explicitly enforce two crucial latent variable constraints underlying CA. On three challenging CA tasks, we find that our method outperforms the previous state of the art both qualitatively and on a set of quantitative metrics.
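The abstract names the maximum mean discrepancy (MMD) as the quantity used to enforce the latent variable constraints but does not spell out the constraints or the estimator. The sketch below only illustrates the MMD itself: a biased estimate of the squared MMD between two sets of samples. The Gaussian (RBF) kernel, the fixed bandwidth, and the function names `rbf_kernel` and `mmd2_biased` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y.

    Assumption: an RBF kernel with a single hand-picked bandwidth.
    The estimate is near zero when the two samples come from the
    same distribution and grows as the distributions differ.
    """
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

In a model of this kind, a term like `mmd2_biased(z_a, z_b)` could be added to the training loss as a penalty that pushes two sets of latent codes toward the same distribution; which sets of codes are compared, and how the penalty is weighted, is left unspecified here because the abstract does not say.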