Paper Title
Tabular Structure Detection from Document Images for Resource Constrained Devices Using A Row Based Similarity Measure
Paper Authors
Paper Abstract
Tabular structures present crucial information in a structured and concise manner, so detecting such regions is essential for properly understanding a document. Because tabular structures come in a wide variety of layouts and types, detecting them is a hard problem. Most existing techniques detect tables in a document image by using prior knowledge of the table structure; however, these methods do not generalize to arbitrary tabular structures. In this work, we propose a similarity measure that quantifies the similarity between a pair of rows in a tabular structure, and we use this measure to identify tabular regions. Since tabular regions are detected by exploiting the similarities among all rows, the method is inherently independent of the layouts of the tabular regions present in the training data. Moreover, the proposed similarity measure can identify tabular regions without the large sets of parameters associated with recent deep learning based methods. Thus, the proposed method can easily be used on resource constrained devices, such as mobile devices, without much overhead.
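The abstract does not define the similarity measure, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes each text row is already available as a list of horizontal word or cell intervals (for example from a layout analysis step), scores two rows by the Jaccard overlap of their interior whitespace gaps, and groups consecutive rows whose adjacent-pair similarity stays above a threshold. The names `gap_positions`, `row_similarity` and `tabular_regions`, the interval representation, and the `threshold` and `min_rows` defaults are all hypothetical. Comparing only adjacent rows is a simplification of "similarities among all rows".

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) horizontal extent of one word or cell


def gap_positions(row: List[Interval]) -> Set[int]:
    """Return the x positions of whitespace lying between the intervals of a row."""
    gaps: Set[int] = set()
    if not row:
        return gaps
    ordered = sorted(row)
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        # Positions between the text seen so far and the next interval are a gap.
        gaps.update(range(covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


def row_similarity(a: List[Interval], b: List[Interval]) -> float:
    """Assumed measure: Jaccard overlap of the interior gap positions of two rows."""
    gaps_a, gaps_b = gap_positions(a), gap_positions(b)
    union = gaps_a | gaps_b
    if not union:
        return 0.0  # neither row has interior gaps, so no column evidence
    return len(gaps_a & gaps_b) / len(union)


def tabular_regions(rows: List[List[Interval]],
                    threshold: float = 0.6,
                    min_rows: int = 3) -> List[Tuple[int, int]]:
    """Return (first, last) row indices of runs of mutually similar adjacent rows."""
    regions: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(rows) + 1):
        # Close the current run at the end of input or when similarity drops.
        if i == len(rows) or row_similarity(rows[i - 1], rows[i]) < threshold:
            if i - start >= min_rows:
                regions.append((start, i - 1))
            start = i
    return regions


# Hypothetical input: one paragraph line, three column-aligned rows, one paragraph line.
paragraph = [(0, 28)]
table_row = [(0, 5), (10, 15), (20, 25)]
table_row_shifted = [(0, 6), (10, 14), (20, 26)]
page_rows = [paragraph, table_row, table_row_shifted, table_row, paragraph]
print(tabular_regions(page_rows))
```

A measure of this kind has no learned parameters, which is in the spirit of the abstract's claim about resource constrained devices. A realistic version would still need to handle noise such as inter-word spaces in running text, which this toy input avoids by treating each paragraph line as a single interval.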