Title
APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation
Authors
Abstract
Style-guided text image generation aims to synthesize a text image by imitating a reference image's appearance while keeping the text content unaltered. Text image appearance includes many aspects; in this paper, we focus on transferring the style image's background and foreground color patterns to the content image to generate photo-realistic text images. To achieve this goal, we propose 1) a content-style cross-attention based pixel sampling approach to roughly mimic the style text image's background; 2) a pixel-wise style modulation technique that transfers the varying color patterns of the style image to the content image in a spatially adaptive manner; 3) a cross-attention based multi-scale style fusion approach to resolve the text foreground misalignment between the style and content images; and 4) an image patch shuffling strategy to create style, content, and ground-truth image tuples for training. Experimental results on Chinese handwriting text image synthesis with the SCUT-HCCDoc and CASIA-OLHWDB datasets demonstrate that the proposed method improves the quality of synthetic text images and makes them more photo-realistic.
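To illustrate the second contribution, the sketch below shows a generic pixel-wise (spatially adaptive) style modulation in NumPy: content features are normalized per channel and then rescaled and shifted by per-pixel `gamma` and `beta` maps, which in the paper's setting would be predicted from the style image. This is a minimal sketch of the general technique (in the spirit of spatially adaptive normalization), not the paper's exact formulation; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def pixelwise_style_modulation(content, gamma, beta, eps=1e-5):
    """Spatially adaptive modulation of content features.

    content, gamma, beta: arrays of shape (C, H, W). gamma and beta
    stand in for per-pixel scale/shift maps predicted from a style
    image; here they are just passed in directly (illustrative sketch,
    not the paper's exact network).
    """
    # Normalize each channel of the content features (instance-norm style).
    mean = content.mean(axis=(1, 2), keepdims=True)
    std = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mean) / (std + eps)
    # Per-pixel modulation: every spatial location gets its own scale
    # and shift, so color patterns can vary across the image instead of
    # being a single global style vector.
    return gamma * normalized + beta

rng = np.random.default_rng(0)
content = rng.normal(size=(3, 8, 8))
gamma = rng.normal(size=(3, 8, 8))
beta = rng.normal(size=(3, 8, 8))
out = pixelwise_style_modulation(content, gamma, beta)
print(out.shape)  # (3, 8, 8)
```

Because `gamma` and `beta` have full spatial resolution, the modulation can render different background and foreground colors at different locations, which a single global affine transform cannot.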