论文标题

带有原始确定性信息瓶颈的帕累托最佳聚类

Pareto-optimal clustering with the primal deterministic information bottleneck

论文作者

Tan, Andrew K., Tegmark, Max, Chuang, Isaac L.

论文摘要

有损压缩和聚类的核心是学识渊博表示的忠​​诚度和规模之间的权衡。我们的目标是绘制并研究量化这种权衡的帕累托前沿。我们专注于确定性信息瓶颈(DIB)目标的优化。为此,我们介绍了原始的DIB问题,当在离散搜索空间上优化时,我们显示的结果比先前研究的Lagrangian放松更丰富。我们提出了一种算法,用于绘制原始DIB权衡的帕累托前沿,该算法也适用于其他两项目标聚类问题。我们研究了帕累托前沿的一般特性,并提供了总体上对数稀疏性的分析和数值证据。我们提供的证据表明,尽管有超过指数的搜索空间,但我们的算法具有多项式缩放,此外,我们提出了对算法的修改,该算法可以在预期采样噪声显着的情况下使用。最后,我们使用算法来绘制三个不同任务的DIB前沿:压缩英语字母,从自然图像中提取信息性的颜色类,并压缩一个以群体理论为灵感的数据集,揭示Frontier的有趣特征,并揭示Frontier的结构如何用于以前隐藏在Conve of conve hull的cloak hull上的焦点上的焦点。

At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源