Cross-Domain Document Object Detection: Benchmark Suite and Method

Paper Title

Cross-Domain Document Object Detection: Benchmark Suite and Method

Authors

Li, Kai; Wigington, Curtis; Tensmeyer, Chris; Zhao, Handong; Barmpalios, Nikolaos; Morariu, Vlad I.; Manjunatha, Varun; Sun, Tong; Fu, Yun

Abstract

Document object detection (DOD) decomposes images of document pages into high-level semantic regions (e.g., figures, tables, paragraphs) and is fundamental for downstream tasks such as intelligent document editing and understanding. DOD remains a challenging problem because document objects vary significantly in layout, size, aspect ratio, texture, etc. An additional challenge arises in practice because large labeled training datasets are available only for domains that differ from the target domain. We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain. Documents from the two domains may vary significantly in layout, language, and genre. We establish a benchmark suite consisting of different types of PDF document datasets that can be used for cross-domain DOD model training and evaluation. For each dataset, we provide the page images, bounding box annotations, PDF files, and the rendering layers extracted from the PDF files. Moreover, we propose a novel cross-domain DOD model that builds upon the standard detection model and addresses domain shifts by incorporating three novel alignment modules: a Feature Pyramid Alignment (FPA) module, a Region Alignment (RA) module, and a Rendering Layer Alignment (RLA) module. Extensive experiments on the benchmark suite substantiate the efficacy of the three proposed modules, and the proposed method significantly outperforms the baseline methods. The project page is at https://github.com/kailigo/cddod.
