Paper Title
An Explicit Local and Global Representation Disentanglement Framework with Applications in Deep Clustering and Unsupervised Object Detection
Paper Authors
Paper Abstract
Visual data can be understood at different levels of granularity, where global features correspond to semantic-level information and local features correspond to texture patterns. In this work, we propose a framework, called SPLIT, which allows us to disentangle local and global information into two separate sets of latent variables within the variational autoencoder (VAE) framework. Our framework adds a generative assumption to the VAE by requiring a subset of the latent variables to generate an auxiliary set of observable data. This additional generative assumption primes those latent variables toward local information and encourages the remaining latent variables to represent global information. We examine three different flavours of VAEs with different generative assumptions. We show that the framework can effectively disentangle local and global information within these models, leading to improved representations and better performance on clustering and unsupervised object detection benchmarks. Finally, we establish connections between SPLIT and recent research in cognitive neuroscience regarding the disentanglement in human visual perception. The code for our experiments is available at https://github.com/51616/split-vae.
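As a rough illustration of the mechanism the abstract describes, below is a minimal PyTorch sketch, not the authors' implementation: a VAE whose latent code is split into a local part and a global part, where the local part alone must additionally generate an auxiliary observation. All module names, dimensions, and the choice of auxiliary data here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitVAE(nn.Module):
    """Sketch of a VAE with latents split into z_local and z_global."""
    def __init__(self, x_dim=784, z_local_dim=8, z_global_dim=8, hidden=256):
        super().__init__()
        z_dim = z_local_dim + z_global_dim
        self.z_local_dim = z_local_dim
        self.encoder = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        # Main decoder conditions on both latent sets; the auxiliary decoder
        # sees only z_local, which encodes the extra generative assumption.
        self.decoder = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, x_dim))
        self.aux_decoder = nn.Sequential(nn.Linear(z_local_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, x_dim))

    def forward(self, x, x_aux):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z from the approximate posterior.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        z_local = z[:, :self.z_local_dim]
        recon = self.decoder(z)                # reconstruct x from both latent sets
        aux_recon = self.aux_decoder(z_local)  # generate x_aux from z_local alone
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return F.mse_loss(recon, x) + F.mse_loss(aux_recon, x_aux) + kl
```

In this sketch, `x_aux` would be data that preserves local statistics but destroys global structure, for example a patch-shuffled copy of `x`. Because only `z_local` is asked to explain it, that subset of latents is pushed toward local texture information, leaving `z_global` to capture semantic content.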