Paper Title
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Paper Authors
Paper Abstract
End-to-end scene text spotting has attracted great attention in recent years because it exploits the intrinsic synergy between scene text detection and recognition. However, recent state-of-the-art methods typically combine detection and recognition simply by sharing a backbone, which does not directly take advantage of the feature interaction between the two tasks. In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter. Using a Transformer encoder with a dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism that explicitly guides text localization through the recognition loss. This straightforward design results in a concise framework that requires neither an additional rectification module nor character-level annotations for arbitrarily-shaped text. Qualitative and quantitative experiments on the multi-oriented datasets RoIC13 and ICDAR 2015, the arbitrarily-shaped datasets Total-Text and CTW1500, and the multi-lingual datasets ReCTS (Chinese) and VinText (Vietnamese) demonstrate that SwinTextSpotter significantly outperforms existing methods. Code is available at https://github.com/mxin262/SwinTextSpotter.
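To make the core idea of Recognition Conversion concrete, below is a minimal PyTorch-style sketch of how detection features can be fused into recognition features so that the recognition loss also back-propagates into the detection branch. This is an illustrative assumption based on the abstract, not the released SwinTextSpotter implementation; module names such as `RecognitionConversion`, `mask_head`, `det_feats`, and `roi_feats` are hypothetical.

```python
# Sketch of the Recognition Conversion (RC) idea: detection features gate and
# enrich the recognition features, so the recognition loss guides localization.
# Layer choices and tensor shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class RecognitionConversion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Predict a soft text mask from the detection features (hypothetical head).
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
        )

    def forward(self, det_feats: torch.Tensor, roi_feats: torch.Tensor) -> torch.Tensor:
        """det_feats, roi_feats: (N, C, H, W) RoI-aligned features per text proposal."""
        mask = torch.sigmoid(self.mask_head(det_feats))  # (N, 1, H, W) soft text mask
        # Suppress background in the recognition features using the detection mask
        # and add the detection features, so gradients from the recognition loss
        # flow back into the detection branch.
        rec_feats = roi_feats * mask + det_feats
        return rec_feats


# Usage sketch (names are placeholders):
#   rec_feats = rc(det_feats, roi_feats)
#   loss = recognition_loss(recognizer(rec_feats), gt_text) + detection_loss(...)
#   loss.backward()  # the detector now receives gradients from the recognition loss
```

The point of the sketch is only the gradient path: because the recognition features are built from the detection features, optimizing the recognition loss implicitly refines text localization, which is the synergy the abstract describes.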