Paper Title
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion
Paper Authors
Paper Abstract
Recently, segmentation-based methods have drawn extensive attention in the scene text detection field. Because they rely on pixel-level descriptions, they are well suited to detecting text instances of arbitrary shapes and extreme aspect ratios. However, the vast majority of existing segmentation-based approaches are limited by their complex post-processing algorithms and by the scale robustness of their segmentation models: the post-processing is both isolated from model optimization and time-consuming, and scale robustness is usually strengthened only by fusing multi-scale feature maps directly. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in the post-processing procedure, into a segmentation network. Optimized along with the proposed DB module, the segmentation network can produce more accurate results, which enhances the accuracy of text detection with a simple pipeline. Furthermore, an efficient Adaptive Scale Fusion (ASF) module is proposed to improve scale robustness by fusing features of different scales adaptively. By incorporating the proposed DB and ASF modules into the segmentation network, our scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
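To make the idea of folding binarization into the network more concrete, the sketch below contrasts a hard threshold with a smooth, sigmoid-shaped approximation of it. The abstract does not give the formula, so the form used here, B = 1 / (1 + exp(-k (P - T))), the amplifying factor `k = 50`, and the function names are assumptions for illustration only. In practice the probability map P and threshold map T would be predicted by the segmentation network; here they are small hand-written lists.

```python
import math


def hard_binarization(prob_map, threshold=0.5):
    """Standard step-function binarization; its gradient is zero almost everywhere."""
    return [[1.0 if p >= threshold else 0.0 for p in row] for row in prob_map]


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Smooth approximation of the step function, applied per pixel.

    Assumed form: B = 1 / (1 + exp(-k * (P - T))).
    A larger k (an illustrative choice here) pushes the output closer to 0/1
    while keeping it differentiable with respect to both P and T.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 maps standing in for network outputs (hypothetical values).
prob_map = [[0.9, 0.2], [0.55, 0.45]]
thresh_map = [[0.5, 0.5], [0.5, 0.5]]

approx = differentiable_binarization(prob_map, thresh_map)
hard = hard_binarization(prob_map)
```

Pixels far from the threshold end up very close to 0 or 1, matching the hard result, while pixels near the threshold keep intermediate values and therefore a usable gradient during training.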