Paper Title
MCSE: Multimodal Contrastive Learning of Sentence Embeddings
Paper Authors
Paper Abstract
Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance across various datasets and pre-trained encoders. In particular, combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman's correlation by 1.7%. By analyzing the properties of the textual embedding space, we show that our model excels in aligning semantically similar sentences, providing an explanation for its improved performance.
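The abstract describes a multimodal contrastive objective without giving details, so the following is only a minimal sketch of what such an objective could look like, not the authors' released implementation: a SimCSE-style text-text InfoNCE term combined with a text-image grounding term. All specifics here (the `info_nce` and `multimodal_contrastive_loss` helpers, the temperature, the weighting factor `lam`, and the batch sizes) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of a multimodal contrastive
# objective: a text-text InfoNCE term over two augmented views of each
# sentence plus a term pulling captions toward their paired image embeddings.
import torch
import torch.nn.functional as F


def info_nce(anchors: torch.Tensor, candidates: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss: the candidate with the same row index as an anchor is its
    positive; all other rows in the batch serve as in-batch negatives."""
    anchors = F.normalize(anchors, dim=-1)
    candidates = F.normalize(candidates, dim=-1)
    # Cosine-similarity logits between every anchor and every candidate.
    logits = anchors @ candidates.t() / temperature
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)


def multimodal_contrastive_loss(text_emb_a: torch.Tensor,
                                text_emb_b: torch.Tensor,
                                image_emb: torch.Tensor,
                                lam: float = 1.0) -> torch.Tensor:
    """Combine a text-only contrastive term (two views of the same sentence)
    with a text-image term; `lam` (an assumed hyperparameter) weights the
    visual grounding term."""
    text_loss = info_nce(text_emb_a, text_emb_b)
    grounding_loss = info_nce(text_emb_a, image_emb)
    return text_loss + lam * grounding_loss


if __name__ == "__main__":
    batch, dim = 8, 256  # illustrative sizes only
    loss = multimodal_contrastive_loss(torch.randn(batch, dim),
                                       torch.randn(batch, dim),
                                       torch.randn(batch, dim))
    print(loss.item())
```

In practice the text embeddings would come from a pre-trained sentence encoder and the image embeddings from a frozen vision encoder projected into the same space; only captioned sentences would contribute to the grounding term, which matches the abstract's point that a small amount of multimodal data can be combined with a large text-only corpus.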