Paper Title
Universal Weighting Metric Learning for Cross-Modal Matching
Paper Authors
Paper Abstract
Cross-modal matching has been a prominent research topic in both the vision and language areas. Learning an appropriate mining strategy to sample and weight informative pairs is crucial for cross-modal matching performance. However, most existing metric learning methods are developed for unimodal matching, which makes them unsuitable for cross-modal matching on multimodal data with heterogeneous features. To address this problem, we propose a simple and interpretable universal weighting framework for cross-modal matching, which provides a tool to analyze the interpretability of various loss functions. Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines separate weight functions for positive and negative informative pairs. Experimental results on two image-text matching benchmarks and two video-text matching benchmarks validate the efficacy of the proposed method.
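To make the idea of separate polynomial weight functions for positive and negative pairs concrete, here is a minimal sketch, not the authors' exact formulation. It assumes an image-text similarity matrix whose diagonal holds the matched pairs, and uses simple illustrative polynomials (`pos_coeffs`, `neg_coeffs` are hypothetical choices): hard positives (low similarity) and hard negatives (high similarity) receive larger weights.

```python
import numpy as np

def poly(s, coeffs):
    """Evaluate the polynomial sum_k coeffs[k] * s**k at similarity s."""
    return sum(c * s ** k for k, c in enumerate(coeffs))

def universal_weighting_loss(sim, pos_coeffs=(1.0, -1.0),
                             neg_coeffs=(0.0, 1.0), margin=0.2):
    """Toy weighted triplet-style loss over a similarity matrix.

    sim[i, j] is the similarity of image i and text j; the diagonal
    holds positive (matched) pairs.  Positive and negative pairs are
    weighted by separate polynomials, so the two roles can be shaped
    independently (an illustrative assumption, not the paper's loss).
    """
    n = sim.shape[0]
    total = 0.0
    for i in range(n):
        sp = sim[i, i]
        # harder positives (smaller sp) get larger weight via pos_coeffs
        total += poly(sp, pos_coeffs) * (1.0 - sp)
        for j in range(n):
            if j == i:
                continue
            sn = sim[i, j]
            if sn + margin > sp:  # only informative negatives contribute
                # harder negatives (larger sn) get larger weight via neg_coeffs
                total += poly(sn, neg_coeffs) * (sn + margin - sp)
    return total / n
```

A well-separated similarity matrix (diagonal near 1, off-diagonal near 0) drives the loss toward zero, while weakly separated pairs are penalized in proportion to their polynomial weights.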