Paper Title

GFTE: Graph-based Financial Table Extraction

Authors

Li, Yiren; Huang, Zheng; Yan, Junchi; Zhou, Yi; Ye, Fan; Liu, Xianhui

Abstract

Tabular data is a crucial form of information expression: it organizes data in a standard structure for easy information retrieval and comparison. However, in the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) files and images, from which they are difficult to extract directly. In this paper, to facilitate deep-learning-based table extraction from unstructured digital files, we publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds together with their corresponding structure representations in JSON. In addition, we propose a novel graph-based convolutional neural network model named GFTE as a baseline for future comparison. GFTE integrates image features, position features and textual features for precise edge prediction and achieves good overall results.
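The abstract describes GFTE only at a high level: table cells become graph nodes, several kinds of features are combined, and edges between cells are classified. The sketch below illustrates that general idea of graph-based edge prediction for table structure, under assumptions that are ours and not the paper's: a k-nearest-neighbour graph over cell centres, hand-made position and text features, an unweighted mean aggregation standing in for a learned graph convolution, and a toy "vertical spans overlap" rule as the same-row label. All names (`Cell`, `knn_edges`, `mean_aggregate`, and so on) are hypothetical, and image features are omitted entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    box: tuple  # (x1, y1, x2, y2) in page coordinates (assumed layout)


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def knn_edges(cells, k=2):
    """Connect each cell to its k nearest neighbours by centre distance (undirected, deduplicated)."""
    centres = [center(c.box) for c in cells]
    edges = set()
    for i, ci in enumerate(centres):
        dists = sorted((math.dist(ci, cj), j) for j, cj in enumerate(centres) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def node_features(cells, page_w, page_h):
    """Hypothetical node features: normalised centre position plus a toy 'is numeric' text flag."""
    feats = []
    for c in cells:
        cx, cy = center(c.box)
        is_numeric = 1.0 if c.text.replace(",", "").replace(".", "", 1).isdigit() else 0.0
        feats.append([cx / page_w, cy / page_h, is_numeric])
    return feats


def mean_aggregate(feats, edges):
    """One message-passing step: average each node with its neighbours (no learned weights)."""
    neigh = [[i] for i in range(len(feats))]
    for i, j in edges:
        neigh[i].append(j)
        neigh[j].append(i)
    dim = len(feats[0])
    return [[sum(feats[m][d] for m in nb) / len(nb) for d in range(dim)] for nb in neigh]


def edge_inputs(feats, edges):
    """Concatenate both endpoint features; a classifier would map each vector to an edge label."""
    return [feats[i] + feats[j] for i, j in edges]


def same_row_label(a, b):
    """Toy ground truth: two cells share a row if their vertical spans overlap."""
    return a.box[1] < b.box[3] and b.box[1] < a.box[3]


# Usage on a toy 2x2 table (header row plus one data row).
cells = [
    Cell("项目", (0, 0, 50, 10)),
    Cell("2019", (60, 0, 100, 10)),
    Cell("营业收入", (0, 20, 50, 30)),
    Cell("1,234.5", (60, 20, 100, 30)),
]
edges = knn_edges(cells, k=2)
feats = mean_aggregate(node_features(cells, page_w=100, page_h=30), edges)
pairs = edge_inputs(feats, edges)
labels = [same_row_label(cells[i], cells[j]) for i, j in edges]
print(edges)   # [(0, 1), (0, 2), (1, 3), (2, 3)]
print(labels)  # [True, False, False, True]
```

In a trained model, the fixed mean aggregation would be replaced by learned graph-convolution layers, and the concatenated edge vectors would feed a classifier trained on labels derived from the dataset's structure annotations; how GFTE does this in detail is not stated in the abstract.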
