Paper Title

On Analyzing the Role of Image for Visual-enhanced Relation Extraction

Authors

Lei Li, Xiang Chen, Shuofei Qiao, Feiyu Xiong, Huajun Chen, Ningyu Zhang

Abstract

Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we take an in-depth empirical analysis that indicates the inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observation, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Codes are available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.
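The "implicit fine-grained multimodal alignment" the abstract describes can be pictured as attention between text tokens and visual features: each token softly weights all image patches instead of relying on an explicit (and possibly inaccurate) scene graph. The sketch below is illustrative only, not the authors' implementation; the dimensions, function name, and residual connection are assumptions for the example.

```python
import numpy as np

def cross_modal_alignment(text, visual):
    """Illustrative sketch (not the paper's code): each text token
    attends over all visual patch features via scaled dot-product
    attention, so token-to-region alignment is learned implicitly
    rather than read off an explicit scene graph."""
    d = text.shape[-1]
    scores = text @ visual.T / np.sqrt(d)           # (n_tokens, n_patches)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over patches
    fused = weights @ visual                        # (n_tokens, d)
    return fused + text, weights                    # residual keeps the text signal

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 64))     # e.g. 8 subword token embeddings
visual = rng.normal(size=(16, 64))  # e.g. 16 image patch embeddings
fused, weights = cross_modal_alignment(text, visual)
print(fused.shape, weights.shape)   # (8, 64) (8, 16)
```

In a real Transformer-based model the query/key/value projections would be learned and the fused features fed into the relation classifier; here the weights matrix is what "alignment" refers to, one attention distribution over patches per token.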
