Paper title
Spatial Dependency Parsing for Semi-Structured Document Information Extraction
Paper authors
Paper abstract
Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each recognized input token into one of the IOB (Inside, Outside, and Beginning) categories. However, such a problem setup has two inherent limitations: (1) it cannot easily handle complex spatial relationships, and (2) it is not suitable for highly structured information, both of which are nevertheless frequently observed in real-world document images. To tackle these issues, we first formulate the IE task as a spatial dependency parsing problem that focuses on the relationships among text tokens in the documents. Under this setup, we then propose SPADE (SPAtial DEpendency parser), which models highly complex spatial relationships and an arbitrary number of information layers in the documents in an end-to-end manner. We evaluate it on various kinds of documents such as receipts, name cards, forms, and invoices, and show that it achieves similar or better performance compared to strong baselines, including a BERT-based IOB tagger.
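To make the contrast in the abstract concrete, the following is a minimal Python sketch, not SPADE's actual implementation. It decodes the same toy tokens twice: once from IOB tags over a serialized token sequence, and once from explicit token-to-token links. The function names (`decode_iob`, `decode_links`), the `field_starts` / `next_token` link representation, and the two-column receipt example are all assumptions made for illustration; the abstract does not specify the graph encoding the authors use.

```python
from typing import Dict, List, Tuple


def decode_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group serialized tokens into (field, text) spans using IOB tags."""
    spans: List[Tuple[str, List[str]]] = []
    current = None  # the (field, words) span currently being extended
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = (tag[2:], [token])
            spans.append(current)
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(token)
        else:
            current = None  # "O" or an inconsistent "I-" tag closes the span
    return [(field, " ".join(words)) for field, words in spans]


def decode_links(
    tokens: List[str],
    field_starts: Dict[int, str],  # hypothetical: first-token index -> field label
    next_token: Dict[int, int],    # hypothetical: token index -> index of its successor
) -> List[Tuple[str, str]]:
    """Recover (field, text) pairs by following directed links between tokens."""
    results: List[Tuple[str, str]] = []
    for start, field in field_starts.items():
        words: List[str] = []
        seen = set()  # guards against cycles in malformed link sets
        idx = start
        while idx is not None and idx not in seen:
            seen.add(idx)
            words.append(tokens[idx])
            idx = next_token.get(idx)
        results.append((field, " ".join(words)))
    return results


# Toy layout (assumed): two menu names side by side, each wrapped onto two lines.
# Row-by-row serialization interleaves them: "Iced Hot Latte Tea".
tokens = ["Iced", "Hot", "Latte", "Tea"]

iob_fields = decode_iob(tokens, ["B-menu", "B-menu", "I-menu", "I-menu"])
link_fields = decode_links(tokens, {0: "menu", 1: "menu"}, {0: 2, 1: 3})

print(iob_fields)
print(link_fields)
```

In this toy case the IOB decoding cannot separate the two interleaved names, because a tag only says whether a token continues the most recent span in the serialized order. The link-based decoding recovers "Iced Latte" and "Hot Tea" because each relation names its target token directly, independent of serialization order.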