TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images

Paper Title

TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images

Authors

Shubham Paliwal, Vishwanath D, Rohit Rahul, Monika Sharma, Lovekesh Vig

Abstract

With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables, and extracting data from tabular sub-images presents a unique set of challenges. These include accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine-grained table structure (rows and columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.
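The abstract describes a two-stage pipeline. First, the network segments table and column regions. Second, rule-based logic recovers the rows. The pure-Python code below is a minimal sketch of the second stage only and makes these assumptions:

- The model outputs a binary column mask, represented as a 2D list of 0/1 values.
- An OCR engine supplies words with `(x0, y0, x1, y1)` boxes.
- The helper names `column_spans`, `group_rows` and `rows_to_cells`, and the `y_tol` vertical-proximity rule, are illustrative inventions. They are not the semantic rules used in the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical formats: a word box is (x0, y0, x1, y1); a word is (text, box).
Box = Tuple[int, int, int, int]
Word = Tuple[str, Box]


def column_spans(col_mask: List[List[int]], min_width: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) x-intervals covered by the (hypothetical) column mask."""
    width = len(col_mask[0]) if col_mask else 0
    # Vertical projection: an x position is "on" if any mask row marks it.
    on = [any(row[x] for row in col_mask) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, v in enumerate(on + [False]):  # trailing sentinel closes a final run
        if v and start is None:
            start = x
        elif not v and start is not None:
            if x - start >= min_width:  # drop runs narrower than min_width
                spans.append((start, x))
            start = None
    return spans


def group_rows(words: List[Word], y_tol: float) -> List[List[Word]]:
    """Start a new row whenever a word's vertical center jumps by more than y_tol."""
    items = sorted(words, key=lambda w: (w[1][1] + w[1][3]) / 2)
    rows: List[List[Word]] = []
    last_y: Optional[float] = None
    for word in items:
        _, (_, y0, _, y1) = word
        yc = (y0 + y1) / 2
        if last_y is None or yc - last_y > y_tol:
            rows.append([])
        rows[-1].append(word)
        last_y = yc
    return rows


def rows_to_cells(words: List[Word], spans: List[Tuple[int, int]],
                  y_tol: float = 5.0) -> List[List[str]]:
    """Place each word in the column span containing its horizontal center."""
    table: List[List[str]] = []
    for row in group_rows(words, y_tol):
        cells = [""] * len(spans)
        # Left-to-right order so multi-word cells read correctly.
        for text, (x0, _, x1, _) in sorted(row, key=lambda w: w[1][0]):
            xc = (x0 + x1) / 2
            for i, (start, end) in enumerate(spans):
                if start <= xc < end:
                    cells[i] = f"{cells[i]} {text}".strip()
                    break  # words outside every span are silently dropped
        table.append(cells)
    return table
```

For example, with a mask whose rows are `[0,1,1,1,0,0,1,1,1,0]`, `column_spans` yields `[(1, 4), (6, 9)]`. Words are then bucketed into those two columns regardless of the order OCR returned them in.

A fixed `y_tol` is only a stand-in for real row logic. It can merge tightly spaced rows or split multi-line cells.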
