Paper Title

Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

Authors

Semih Kaya, Elif Vural

Abstract

While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.
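The abstract describes two ingredients: an interpolation function that extends the learned embeddings of training samples to the whole data space, and the Lipschitz regularity of that interpolator as a criterion for generalization. The sketch below is a minimal illustration of these two ideas, not the authors' algorithm: it fits a Gaussian RBF interpolator to map samples to their embeddings, then lower-bounds the interpolator's Lipschitz constant empirically from pairwise ratios. The function names, the RBF kernel choice, and the regularization constant are all assumptions for illustration.

```python
import numpy as np

def rbf_interpolator(X_train, Y_train, sigma=1.0):
    """Fit a Gaussian RBF interpolator mapping training samples X_train
    to their learned low-dimensional embeddings Y_train, so that unseen
    points can be embedded as well."""
    # Pairwise squared distances between training samples
    D2 = np.sum((X_train[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    K = np.exp(-D2 / (2.0 * sigma ** 2))
    # Solve K @ C = Y_train; small ridge term for numerical stability
    C = np.linalg.solve(K + 1e-8 * np.eye(len(X_train)), Y_train)

    def f(X):
        d2 = np.sum((X[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2)) @ C

    return f

def empirical_lipschitz(f, X):
    """Lower-bound the Lipschitz constant of f from the largest ratio
    ||f(x_i) - f(x_j)|| / ||x_i - x_j|| over all sample pairs in X."""
    Y = f(X)
    best = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = np.linalg.norm(X[i] - X[j])
            if dx > 1e-12:
                best = max(best, np.linalg.norm(Y[i] - Y[j]) / dx)
    return best
```

In the paper's setting such a regularity measure is optimized jointly with the embeddings themselves; here it is only evaluated after the fact, to show how a smoother interpolator translates into smaller distortion on unseen data.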
