Paper Title
MMGA: Multimodal Learning with Graph Alignment
Paper Authors
Paper Abstract
Multimodal pre-training breaks down modality barriers and allows the individual modalities to augment one another with information, resulting in significant advances in representation learning. However, the graph, a very general and important form of data, cannot easily interact with other modalities because of its non-regular nature. In this paper, we propose MMGA (Multimodal learning with Graph Alignment), a novel multimodal pre-training framework that incorporates information from the graph (social network), image, and text modalities on social media to enhance user representation learning. In MMGA, a multi-step graph alignment mechanism is proposed to add self-supervision from the graph modality to optimize the image and text encoders, while using information from the image and text modalities to guide the learning of the graph encoder. We conduct experiments on a dataset crawled from Instagram. The experimental results show that MMGA performs well on this dataset and improves performance on the fan prediction task. We release our dataset, the first multimodal social-media dataset with a graph, consisting of 60,000 users labeled with specific topics based on 2 million posts, to facilitate future research.
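The abstract does not specify how the graph alignment objective is computed. The sketch below is a minimal, hypothetical illustration of one plausible reading: CLIP-style symmetric contrastive alignment between per-user graph embeddings and the corresponding image/text embeddings, so that gradients from the graph side supervise the image and text encoders and vice versa. All names (`contrastive_align`, `mmga_step`) and the choice of InfoNCE loss are assumptions, not taken from the paper.

```python
# Hypothetical sketch of a graph-alignment training objective in the spirit
# of MMGA. Assumes PyTorch. The encoders producing graph_emb / image_emb /
# text_emb stand in for the paper's (unspecified) architectures.
import torch
import torch.nn.functional as F


def contrastive_align(a: torch.Tensor, b: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning two batches of user embeddings.

    a, b: (batch, dim) embeddings of the *same* users from two modalities.
    """
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Matching user pairs lie on the diagonal; off-diagonal pairs act as
    # in-batch negatives.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2


def mmga_step(graph_emb: torch.Tensor, image_emb: torch.Tensor,
              text_emb: torch.Tensor) -> torch.Tensor:
    """One hypothetical training step: the graph modality supervises the
    image/text encoders, and image/text embeddings guide the graph encoder,
    because gradients flow through both sides of each pairwise loss."""
    loss_gi = contrastive_align(graph_emb, image_emb)  # graph <-> image
    loss_gt = contrastive_align(graph_emb, text_emb)   # graph <-> text
    loss_it = contrastive_align(image_emb, text_emb)   # image <-> text
    return loss_gi + loss_gt + loss_it
```

Under this reading, the "multi-step" aspect could correspond to applying the pairwise alignment losses in stages rather than jointly; the paper itself should be consulted for the actual schedule.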