GLASS: Global to Local Attention for Scene-Text Spotting

Paper title

GLASS: Global to Local Attention for Scene-Text Spotting

Authors

Roi Ronen, Shahar Tsiper, Oron Anschel, Inbal Lavi, Amir Markovitz, R. Manmatha

Abstract

In recent years, the dominant paradigm for text spotting has been to combine the tasks of text detection and recognition into a single end-to-end framework. Under this paradigm, both tasks are accomplished by operating over a shared global feature map extracted from the input image. Among the main challenges that end-to-end approaches face is the performance degradation when recognizing text across scale variations (smaller or larger text), and arbitrary word rotation angles. In this work, we address these challenges by proposing a novel global-to-local attention mechanism for text spotting, termed GLASS, that fuses together global and local features. The global features are extracted from the shared backbone, preserving contextual information from the entire image, while the local features are computed individually on resized, high-resolution rotated word crops. The information extracted from the local crops alleviates many of the inherent difficulties with scale and word rotation. We show a performance analysis across scales and angles, highlighting improvement over scale and angle extremities. In addition, we introduce an orientation-aware loss term supervising the detection task, and show its contribution to both detection and recognition performance across all angles. Finally, we show that GLASS is general by incorporating it into other leading text spotting architectures, improving their text spotting performance. Our method achieves state-of-the-art results on multiple benchmarks, including the newly released TextOCR.
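
The abstract mentions local features computed on "resized, high-resolution rotated word crops". As a rough illustration of the geometry such a crop involves, the sketch below maps a rotated box to its corners and maps crop pixels back to image coordinates. It is not the paper's implementation: the function names, the `(cx, cy, w, h, angle_deg)` box format, and the pixel-centre sampling convention are all assumptions made for this example.

```python
import math

# Hypothetical helpers for illustration only; names and the box format
# (cx, cy, w, h, angle_deg) are assumptions, not taken from the paper.

def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h box centred at (cx, cy), rotated by angle_deg.

    Uses the standard 2D rotation matrix. In image coordinates, where y
    points down, a positive angle therefore appears clockwise.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)]:
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


def crop_to_image(u, v, box, crop_w, crop_h):
    """Map the centre of crop pixel (u, v) back to image coordinates.

    The crop has a fixed size (crop_w, crop_h), so small words are
    effectively upsampled and large words downsampled.
    """
    cx, cy, w, h, angle_deg = box
    # Offset from the box centre, measured in image pixels along the box axes.
    dx = ((u + 0.5) / crop_w - 0.5) * w
    dy = ((v + 0.5) / crop_h - 0.5) * h
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
```

Sampling the image at `crop_to_image(u, v, ...)` for every crop pixel (for example with bilinear interpolation) would yield an upright, fixed-size word crop regardless of the word's original scale or angle, which is the property the abstract attributes to the local branch.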
