Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

Paper Title

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

Authors

Zhang, Zhenyu; Yu, Bowen; Yu, Haiyang; Liu, Tingwen; Fu, Cheng; Li, Jingyang; Tang, Chengguang; Sun, Jian; Li, Yongbin

Abstract

Building document-grounded dialogue systems has received growing interest, as documents convey a wealth of human knowledge and are common in enterprises. A central research challenge is how to comprehend and retrieve information from documents. Previous work ignores the visual properties of documents and treats them as plain text, which leaves the input modality incomplete. In this paper, we propose a Layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems. LIE contains 62k annotations for three extraction tasks, drawn from 4,061 pages of product and official documents, making it, to the best of our knowledge, the largest VRD-based information extraction dataset. We also develop benchmark methods that extend the token-based language model to consider layout features, as humans do. Empirical results show that layout is critical for VRD-based extraction, and a system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
