Real-time Scene Text Detection Based on Global Level and Word Level Features

Paper Title

Real-time Scene Text Detection Based on Global Level and Word Level Features

Authors

Zhao, Fuqiang, Yu, Jionghua, Xing, Enjun, Song, Wenming, Xu, Xue

Abstract

Detecting arbitrarily shaped text in natural scenes with both high accuracy and high efficiency is an extremely challenging task. In this paper, we propose a scene text detection framework, GWNet, which consists mainly of two modules: a Global module and an RCNN module. Specifically, the Global module improves the adaptive performance of the DB (Differentiable Binarization) module by adding a k submodule and a shift submodule. The two submodules enhance the adaptability of the amplifying factor k, accelerate model convergence, and help produce more accurate detection results. The RCNN module fuses global-level and word-level features. Word-level labels are generated by taking the minimum axis-aligned rectangular box of each shrunk polygon. During inference, GWNet uses only global-level features to output simple polygon detections. Experiments on four benchmark datasets (MSRA-TD500, Total-Text, ICDAR2015, and CTW-1500) demonstrate that GWNet outperforms state-of-the-art detectors. Specifically, with a ResNet-50 backbone, we achieve an F-measure of 88.6% on MSRA-TD500, 87.9% on Total-Text, 89.2% on ICDAR2015, and 87.5% on CTW-1500.
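Two ideas from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The first function is the approximate step function from the original DB formulation, where the amplifying factor k is a fixed constant (50); how GWNet's k and shift submodules adapt that factor is not described in the abstract and is not modelled here. The second function computes the minimum axis-aligned rectangle of a polygon that is assumed to have been shrunk already. The function names and the sample coordinates are hypothetical.

```python
import math


def approx_binarize(p, t, k=50.0):
    """Approximate (differentiable) binarization of a single pixel.

    p: text probability in [0, 1]
    t: threshold in [0, 1]
    k: amplifying factor; 50 is the fixed value from the original DB
       formulation (assumption: GWNet adapts it, details not shown here).
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def axis_aligned_box(polygon):
    """Minimum axis-aligned rectangle of a polygon.

    polygon: list of (x, y) vertices.
    Returns (x_min, y_min, x_max, y_max).
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical shrunk text polygon; the coordinates are made up.
shrunk_polygon = [(12, 30), (48, 22), (52, 40), (15, 47)]
word_box = axis_aligned_box(shrunk_polygon)

# A larger k pushes the output toward a hard 0/1 decision.
soft = approx_binarize(0.6, 0.5, k=1.0)
sharp = approx_binarize(0.6, 0.5, k=50.0)
```

With a small k the output stays close to 0.5 for probabilities near the threshold, while a large k makes the function behave almost like a hard step. The balance between sharp decisions and usable gradients depends on this factor, which is why its adaptability matters.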
