Paper Title

Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context

Paper Authors

Xinyi Zheng, Doug Burdick, Lucian Popa, Xu Zhong, Nancy Xin Ru Wang

Paper Abstract

Documents are often used for knowledge sharing and preservation in business and science, within which are tables that capture most of the critical data. Unfortunately, most documents are stored and distributed as PDF or scanned images, which fail to preserve logical table structure. Recent vision-based deep learning approaches have been proposed to address this gap, but most still cannot achieve state-of-the-art results. We present Global Table Extractor (GTE), a vision-guided systematic framework for joint table detection and cell structured recognition, which could be built on top of any object detection model. With GTE-Table, we invent a new penalty based on the natural cell containment constraint of tables to train our table network aided by cell location predictions. GTE-Cell is a new hierarchical cell detection network that leverages table styles. Further, we design a method to automatically label table and cell structure in existing documents to cheaply create a large corpus of training and test data. We use this to enhance PubTabNet with cell labels and create FinTabNet, real-world and complex scientific and financial datasets with detailed table structure annotations to help train and test structure recognition. Our framework surpasses previous state-of-the-art results on the ICDAR 2013 and ICDAR 2019 table competition in both table detection and cell structure recognition with a significant 5.8% improvement in the full table extraction system. Further experiments demonstrate a greater than 45% improvement in cell structure recognition when compared to a vanilla RetinaNet object detection model in our new out-of-domain FinTabNet.
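
The abstract states only that GTE-Table is trained with a penalty derived from the constraint that cells should be contained within tables, without giving the formulation. The snippet below is a minimal, hypothetical sketch of one such containment term: it measures the fraction of predicted cell area falling outside a predicted table box. The function names (`containment_penalty`, `intersection_area`) and the area-ratio form are assumptions for illustration, not the authors' published loss.

```python
# Hypothetical illustration of a cell-containment penalty for a table detector.
# The exact loss used in GTE-Table is not given in the abstract; this sketch
# only demonstrates the underlying constraint: predicted cells should lie
# inside the predicted table box.

def intersection_area(box_a, box_b):
    """Overlap area of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def box_area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def containment_penalty(table_box, cell_boxes):
    """Fraction of predicted cell area lying outside the predicted table box.

    Returns 0.0 when every cell is fully contained in the table box and
    approaches 1.0 when the cells fall entirely outside it.
    """
    total_cell_area = sum(box_area(c) for c in cell_boxes)
    if total_cell_area == 0.0:
        return 0.0
    contained_area = sum(intersection_area(table_box, c) for c in cell_boxes)
    return 1.0 - contained_area / total_cell_area


if __name__ == "__main__":
    table = (0.0, 0.0, 100.0, 50.0)
    # The second cell spills past the right edge of the table box.
    cells = [(5.0, 5.0, 45.0, 20.0), (55.0, 5.0, 110.0, 20.0)]
    print(f"penalty = {containment_penalty(table, cells):.3f}")
```

In a training loop, a term like this could be added to the detector's standard classification and regression losses so that table boxes which exclude many predicted cells are penalized; how the actual GTE-Table network weights and applies its containment penalty is described in the paper itself, not here.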
