Paper Title

Reproducible, incremental representation learning with Rosetta VAE

Paper Authors

Miles Martinez, John Pearson

Paper Abstract

Variational autoencoders are among the most popular methods for distilling low-dimensional structure from high-dimensional data, making them increasingly valuable as tools for data exploration and scientific discovery. However, unlike typical machine learning problems in which a single model is trained once on a single large dataset, scientific workflows privilege learned features that are reproducible, portable across labs, and capable of incrementally adding new data. Ideally, methods used by different research groups should produce comparable results, even without sharing fully trained models or entire data sets. Here, we address this challenge by introducing the Rosetta VAE (R-VAE), a method of distilling previously learned representations and retraining new models to reproduce and build on prior results. The R-VAE uses post hoc clustering over the latent space of a fully-trained model to identify a small number of Rosetta Points (input, latent pairs) to serve as anchors for training future models. An adjustable hyperparameter, $ρ$, balances fidelity to the previously learned latent space against accommodation of new data. We demonstrate that the R-VAE reconstructs data as well as the VAE and $β$-VAE, outperforms both methods in recovery of a target latent space in a sequential training setting, and dramatically increases consistency of the learned representation across training runs.
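The abstract describes the R-VAE's two key ingredients: post hoc clustering over a trained model's latent space to pick a small set of Rosetta Points, and a $ρ$-weighted penalty anchoring a new model's encoder to those points. A minimal sketch of that idea in NumPy follows; the function names, the choice of k-means as the clustering method, and the squared-error form of the anchor penalty are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_rosetta_points(inputs, latents, k, n_iter=50, seed=0):
    """Cluster the old model's latents with k-means (assumed clustering
    method) and return, for each centroid, the nearest real
    (input, latent) pair as a Rosetta Point."""
    rng = np.random.default_rng(seed)
    centroids = latents[rng.choice(len(latents), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each latent to its nearest centroid, then update centroids.
        dists = np.linalg.norm(latents[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = latents[assign == j].mean(axis=0)
    # Anchor on actual data points, not centroids: pick the nearest latent.
    dists = np.linalg.norm(latents[:, None] - centroids[None], axis=-1)
    idx = dists.argmin(axis=0)
    return inputs[idx], latents[idx]

def rosetta_anchor_loss(encoder, anchor_inputs, anchor_latents, rho):
    """Hypothetical rho-weighted penalty pulling the new encoder's codes
    for the anchor inputs toward their previously learned latents;
    added to the usual VAE objective during retraining."""
    z = encoder(anchor_inputs)
    return rho * np.mean(np.sum((z - anchor_latents) ** 2, axis=-1))
```

In this reading, larger $ρ$ enforces fidelity to the previously learned latent space, while smaller $ρ$ lets the retrained model accommodate new data.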
