Paper Title
SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization
Paper Authors
Paper Abstract
Recently, self-supervised representation learning has drawn considerable attention from the scene text recognition community. Unlike previous studies that use contrastive learning, we tackle the issue from an alternative perspective, i.e., by formulating the representation learning scheme in a generative manner. Typically, neighboring image patches within one text line tend to have similar styles, including strokes, textures, and colors. Motivated by this observation, we augment one image patch and use its neighboring patch as guidance to recover the original patch. Specifically, we propose a Similarity-Aware Normalization (SimAN) module to identify the different patterns and align the corresponding styles from the guiding patch. In this way, the network gains the representation capability to distinguish complex patterns such as messy strokes and cluttered backgrounds. Experiments show that the proposed SimAN significantly improves the representation quality and achieves promising performance. Moreover, we surprisingly find that our self-supervised generative network has impressive potential for data synthesis, text image editing, and font interpolation, which suggests that the proposed SimAN has a wide range of practical applications.
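The abstract describes the module only at a high level, so the following is a minimal NumPy sketch of one way a similarity-aware normalization step could work, not the authors' implementation. It assumes that feature maps are flattened to shape (channels, positions), that style is removed with instance normalization, and that similarity is computed with scaled dot-product attention followed by a softmax. The function names `instance_norm`, `softmax`, and `similarity_aware_norm` are hypothetical and chosen for illustration.

```python
import numpy as np


def instance_norm(feat, eps=1e-5):
    """Normalize each channel over its spatial positions. feat has shape (C, N)."""
    mean = feat.mean(axis=1, keepdims=True)
    std = feat.std(axis=1, keepdims=True)
    return (feat - mean) / (std + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def similarity_aware_norm(source_feat, guide_feat, eps=1e-5):
    """Restyle the source features using statistics of similar guide positions.

    source_feat: (C, Ns) features of the augmented patch (assumed layout).
    guide_feat:  (C, Ng) features of the neighboring, guiding patch.
    Returns an array of shape (C, Ns).
    """
    channels = source_feat.shape[0]

    # Remove style from both patches so similarity reflects patterns, not style.
    query = instance_norm(source_feat, eps)
    key = instance_norm(guide_feat, eps)

    # Assumption: scaled dot-product similarity between every source position
    # and every guide position; each row sums to 1 after the softmax.
    attn = softmax(query.T @ key / np.sqrt(channels), axis=-1)  # (Ns, Ng)

    # Attention-weighted mean and standard deviation of the guide features,
    # computed separately for each source position.
    local_mean = guide_feat @ attn.T                              # (C, Ns)
    local_sq_mean = (guide_feat ** 2) @ attn.T                    # (C, Ns)
    local_var = np.maximum(local_sq_mean - local_mean ** 2, 0.0)  # clamp rounding error
    local_std = np.sqrt(local_var + eps)

    # Re-apply the aligned style statistics to the style-free source features.
    return query * local_std + local_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    augmented = rng.normal(size=(4, 16))  # stand-in for augmented-patch features
    neighbor = rng.normal(size=(4, 12))   # stand-in for guiding-patch features
    restyled = similarity_aware_norm(augmented, neighbor)
    print(restyled.shape)
```

In a training setup along the lines the abstract describes, the restyled features would be decoded back into an image patch and compared with the original, unaugmented patch. That reconstruction loss and the decoder are outside the scope of this sketch.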