Paper Title
Semantics-Guided Representation Learning with Applications to Visual Synthesis
Paper Authors
Paper Abstract
Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition. While most existing approaches derive an interpolatable latent space and induce smooth transitions in image appearance, it remains unclear how to obtain desirable representations that contain the semantic information of interest. In this paper, we aim to learn meaningful representations and simultaneously perform semantic-oriented and visually smooth interpolation. To this end, we propose an angular triplet-neighbor loss (ATNL) that enables learning a latent representation whose distribution matches the semantic information of interest. With the latent space guided by ATNL, we further utilize spherical semantic interpolation to generate semantic warping of images, allowing synthesis of desirable visual data. Experiments on the MNIST and CMU Multi-PIE datasets qualitatively and quantitatively verify the effectiveness of our method.
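Since the abstract only names the two key components, the following is a minimal illustrative sketch (not the authors' implementation) of how they might look: an angular triplet-style loss that measures distances by the angle between latent codes, and spherical linear interpolation (slerp) between two codes. The function names `angular_triplet_neighbor_loss` and `slerp`, the margin value, and the latent dimensionality are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def angular_triplet_neighbor_loss(anchor, positive, negative, margin=0.2):
    """Illustrative angular triplet-style loss (assumed form, not the paper's exact ATNL).

    Samples sharing the semantic attribute of interest (positive) are pulled
    toward small angles with the anchor, while other samples (negative) are
    pushed to be at least `margin` radians farther away.
    """
    cos_ap = F.cosine_similarity(anchor, positive, dim=-1).clamp(-1 + 1e-7, 1 - 1e-7)
    cos_an = F.cosine_similarity(anchor, negative, dim=-1).clamp(-1 + 1e-7, 1 - 1e-7)
    ang_ap = torch.acos(cos_ap)  # angle between anchor and positive codes
    ang_an = torch.acos(cos_an)  # angle between anchor and negative codes
    return F.relu(ang_ap - ang_an + margin).mean()

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent codes at ratio t in [0, 1]."""
    z0_n = F.normalize(z0, dim=-1)
    z1_n = F.normalize(z1, dim=-1)
    omega = torch.acos((z0_n * z1_n).sum(-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * z0 + (torch.sin(t * omega) / so) * z1

# Usage example: midpoint interpolation between two 128-d latent codes.
z_a, z_b = torch.randn(1, 128), torch.randn(1, 128)
z_mid = slerp(z_a, z_b, 0.5)
```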