Paper Title

Gaussian Constrained Attention Network for Scene Text Recognition

Authors

Zhi Qiao, Xugong Qin, Yu Zhou, Fei Yang, Weiping Wang

Abstract

Scene text recognition has been a hot topic in computer vision. Recent methods adopt attention mechanisms for sequence prediction and achieve convincing results. However, we argue that the existing attention mechanism suffers from attention diffusion, in which the model may fail to focus on a specific character region. In this paper, we propose the Gaussian Constrained Attention Network to address this problem. It is a 2D attention-based method integrated with a novel Gaussian Constrained Refinement Module, which predicts an additional Gaussian mask to refine the attention weights. Rather than simply adding extra supervision on the attention weights, our proposed method introduces an explicit refinement. In this way, the attention weights become more concentrated and the attention-based recognition network achieves better performance. The proposed Gaussian Constrained Refinement Module is flexible and can be applied directly to existing attention-based methods. Experiments on several benchmark datasets demonstrate the effectiveness of our proposed method. Our code is available at https://github.com/Pay20Y/GCAN.
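
The idea of refining diffuse attention weights with a Gaussian mask can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that). The abstract does not say how the mask is combined with the attention weights, so the elementwise product followed by renormalization used here is an assumption, as are the function names `gaussian_mask` and `refine_attention`, the isotropic single-sigma Gaussian, and the hand-picked center and width (in the paper these would be predicted by the refinement module).

```python
import math


def gaussian_mask(height, width, mu_y, mu_x, sigma):
    """Return a height x width grid of unnormalised isotropic 2D Gaussian values.

    Hypothetical helper: the centre (mu_y, mu_x) and width sigma are passed in
    by hand here, whereas a real model would predict them.
    """
    return [
        [
            math.exp(-((y - mu_y) ** 2 + (x - mu_x) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def refine_attention(attention, mask):
    """Multiply attention by the mask elementwise, then renormalise to sum to 1.

    Assumption: the combination rule is a plain product; the abstract does not
    specify it.
    """
    product = [
        [a * m for a, m in zip(a_row, m_row)]
        for a_row, m_row in zip(attention, mask)
    ]
    total = sum(sum(row) for row in product)
    if total == 0.0:
        # Degenerate case: nothing survives the mask, keep the original weights.
        return attention
    return [[value / total for value in row] for row in product]


# Toy example: a fully diffuse (uniform) 3 x 4 attention map.
attention = [[1.0 / 12.0] * 4 for _ in range(3)]
mask = gaussian_mask(3, 4, mu_y=1.0, mu_x=2.0, sigma=0.7)
refined = refine_attention(attention, mask)
```

In this toy case the refined map still sums to 1, but most of its mass moves to the cell at row 1, column 2, which is the "more concentrated" behavior the abstract describes.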
