Paper Title
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models
Paper Authors
Paper Abstract
Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable representations, the training of such models often requires a large amount of "related" multimodal data that shares commonality, which can be expensive to come by. To mitigate this, we develop a novel contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. We show in experiments that our method enables data-efficient multimodal learning on challenging datasets for various multimodal VAE models. We also show that under our proposed framework, the generative model can accurately distinguish related samples from unrelated ones, making it possible to leverage plentiful unlabeled, unpaired multimodal data.
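The abstract describes the idea only at a high level. The sketch below is a hypothetical illustration, not the paper's actual objective: it assumes a two-modality VAE whose joint ELBO doubles as a "relatedness" score, with unrelated pairs fabricated by shuffling one modality within the batch. The class names (ModalityVAE), the averaged-posterior fusion, and the softplus contrast term are all illustrative choices, not taken from the paper.

```python
# Hypothetical sketch of contrastive training for a two-modality VAE:
# maximise the joint ELBO on "related" pairs while pushing it down on
# "unrelated" pairs built by shuffling one modality within the batch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    """Per-modality encoder/decoder with a Gaussian latent (illustrative MLPs)."""
    def __init__(self, x_dim, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def posterior(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu, logvar

def joint_elbo(vae_a, vae_b, xa, xb):
    """Per-example ELBO under a joint latent obtained by averaging the two
    unimodal posteriors (a product/mixture-of-experts fusion would be a
    drop-in replacement)."""
    mu_a, lv_a = vae_a.posterior(xa)
    mu_b, lv_b = vae_b.posterior(xb)
    mu, logvar = (mu_a + mu_b) / 2, (lv_a + lv_b) / 2
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterise
    rec = (-F.mse_loss(vae_a.dec(z), xa, reduction='none').sum(-1)
           - F.mse_loss(vae_b.dec(z), xb, reduction='none').sum(-1))
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
    return rec - kl

def contrastive_step(vae_a, vae_b, xa, xb, weight=1.0):
    """Generative term on related pairs plus a logistic contrast that treats
    the ELBO as a relatedness score against shuffled (unrelated) pairs."""
    pos = joint_elbo(vae_a, vae_b, xa, xb)                          # related
    neg = joint_elbo(vae_a, vae_b, xa, xb[torch.randperm(xb.size(0))])
    generative = -pos.mean()
    contrast = F.softplus(-(pos - neg)).mean()
    return generative + weight * contrast

# Usage on a toy "related" batch of two modalities with different dimensions.
vae_a, vae_b = ModalityVAE(x_dim=32), ModalityVAE(x_dim=64)
opt = torch.optim.Adam(list(vae_a.parameters()) + list(vae_b.parameters()), lr=1e-3)
xa, xb = torch.randn(8, 32), torch.randn(8, 64)
opt.zero_grad()
loss = contrastive_step(vae_a, vae_b, xa, xb)
loss.backward()
opt.step()
```

Because the same ELBO both trains the generative model and scores relatedness, such a model can in principle flag which unpaired samples look related, which is the property the abstract highlights for exploiting unlabeled, unpaired data.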