Paper Title
Unimodal and Multimodal Representation Training for Relation Extraction
Paper Authors
Paper Abstract
Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.