ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks

Paper Title

ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks

Authors

Ma, Chixiang, Sun, Lei, Zhong, Zhuoyao, Huo, Qiang

Abstract

We introduce a new arbitrary-shaped text detection approach named ReLaText by formulating text detection as a visual relationship detection problem. To demonstrate the effectiveness of this new formulation, we first use a "link" relationship to address the challenging text-line grouping problem. The key idea is to decompose text detection into two subproblems, namely detection of text primitives and prediction of link relationships between nearby text primitive pairs. Specifically, a text detector based on an anchor-free region proposal network is first used to detect text primitives of different scales from different feature maps of a feature pyramid network, from which a text primitive graph is constructed by linking each pair of nearby text primitives detected from the same feature map with an edge. Then, a Graph Convolutional Network (GCN) based link relationship prediction module is used to prune wrongly linked edges in the text primitive graph to generate a number of disjoint subgraphs, each representing a detected text instance. Because a GCN can effectively leverage context information to improve link prediction accuracy, our GCN-based text-line grouping approach achieves better text detection accuracy than previous text-line grouping methods, especially when dealing with text instances with large inter-character or very small inter-line spacings. Consequently, the proposed ReLaText achieves state-of-the-art performance on five public text detection benchmarks, namely RCTW-17, MSRA-TD500, Total-Text, CTW1500 and DAST1500.
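
The final grouping step described in the abstract (prune wrongly linked edges, then read off disjoint subgraphs as text instances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_primitives`, the 0.5 threshold, and the use of union-find to extract connected components are all assumptions made for illustration, and the link scores are hypothetical inputs standing in for the output of the GCN-based link prediction module.

```python
from typing import List, Sequence, Tuple


def group_primitives(
    num_primitives: int,
    edges: Sequence[Tuple[int, int]],
    link_scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[int]]:
    """Group text primitives into text instances.

    Edges whose (hypothetical) link score falls below `threshold` are
    pruned; the connected components of the remaining graph are returned
    as sorted lists of primitive indices.
    """
    parent = list(range(num_primitives))

    def find(x: int) -> int:
        # Find the component root, halving the path as we walk up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in zip(edges, link_scores):
        if score >= threshold:  # keep only edges predicted as true links
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for p in range(num_primitives):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Five primitives in a chain; the weak link between 2 and 3 is pruned,
# so the chain splits into two text instances.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
scores = [0.9, 0.8, 0.1, 0.95]
print(group_primitives(5, edges, scores))
```

In this sketch a primitive with no surviving links forms its own single-element group; whether such isolated primitives are kept or discarded is a design choice the abstract does not specify.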
