Attraction-Repulsion Spectrum in Neighbor Embeddings

Paper title

Attraction-Repulsion Spectrum in Neighbor Embeddings

Authors

Jan Niklas Böhm, Philipp Berens, Dmitry Kobak

Abstract

Neighbor embeddings are a family of methods for visualizing complex high-dimensional datasets using $k$NN graphs. To find the low-dimensional embedding, these algorithms combine an attractive force between neighboring pairs of points with a repulsive force between all points. One of the most popular examples of such algorithms is t-SNE. Here we empirically show that changing the balance between the attractive and the repulsive forces in t-SNE using the exaggeration parameter yields a spectrum of embeddings, which is characterized by a simple trade-off: stronger attraction can better represent continuous manifold structures, while stronger repulsion can better represent discrete cluster structures and yields higher $k$NN recall. We find that UMAP embeddings correspond to t-SNE with increased attraction; mathematical analysis shows that this is because the negative sampling optimisation strategy employed by UMAP strongly lowers the effective repulsion. Likewise, ForceAtlas2, commonly used for visualizing developmental single-cell transcriptomic data, yields embeddings corresponding to t-SNE with the attraction increased even more. At the extreme of this spectrum lie Laplacian Eigenmaps. Our results demonstrate that many prominent neighbor embedding algorithms can be placed onto the attraction-repulsion spectrum, and highlight the inherent trade-offs between them.
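
The role of the exaggeration parameter can be illustrated with a minimal sketch, assuming the standard t-SNE gradient in which the attractive term is multiplied by a scalar factor while the repulsive term is left unchanged. The function name `tsne_gradient`, the argument name `exaggeration`, and the random toy data are hypothetical and chosen only for illustration; this is a dense O(n^2) computation for clarity, not the authors' implementation.

```python
import numpy as np

def tsne_gradient(Y, P, exaggeration=1.0):
    """Sketch of the t-SNE gradient with the attractive term scaled by `exaggeration`.

    Y: (n, d) embedding coordinates.
    P: (n, n) symmetric high-dimensional affinities, zero diagonal, summing to 1.
    Returns the gradient plus the unscaled attractive and repulsive terms.
    """
    diff = Y[:, None, :] - Y[None, :, :]            # pairwise differences y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))    # Cauchy (Student-t) kernel
    np.fill_diagonal(W, 0.0)                        # no self-interaction
    Q = W / W.sum()                                 # normalized low-dimensional affinities
    attraction = np.einsum("ij,ijk->ik", P * W, diff)
    repulsion = np.einsum("ij,ijk->ik", Q * W, diff)
    grad = 4.0 * (exaggeration * attraction - repulsion)
    return grad, attraction, repulsion

# Toy data (illustrative only): random embedding and random symmetric affinities.
rng = np.random.default_rng(0)
n = 20
Y = rng.normal(size=(n, 2))
A = rng.random((n, n))
A = A + A.T
np.fill_diagonal(A, 0.0)
P = A / A.sum()

g1, att, rep = tsne_gradient(Y, P, exaggeration=1.0)   # plain t-SNE balance
g4, _, _ = tsne_gradient(Y, P, exaggeration=4.0)       # attraction weighted 4x
```

Under these assumptions, raising `exaggeration` changes only the attractive contribution to the gradient, which is the sense in which larger values move an embedding toward the attraction-dominated end of the spectrum.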
