Paper Title


XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation

Authors

Wei Liu, Fangyue Liu, Fei Ding, Qian He, Zili Yi

Abstract


Generating a new font library is a very labor-intensive and time-consuming job for glyph-rich scripts. Few-shot font generation is thus required, as it needs only a few glyph references at test time, without fine-tuning. Existing methods follow the style-content disentanglement paradigm and expect novel fonts to be produced by combining the style codes of the reference glyphs with the content representations of the source. However, these few-shot font generation methods either fail to capture content-independent style representations, or employ localized component-wise style representations, which are insufficient to model many Chinese font styles involving hyper-component features such as inter-component spacing and "connected strokes". To resolve these drawbacks and make the style representations more reliable, we propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder that is conditioned jointly on the glyph image and the corresponding stroke labels. The cross-modality encoder is pre-trained in a self-supervised manner to effectively capture cross- and intra-modality correlations, which facilitates content-style disentanglement and the modeling of style representations at all scales (stroke level, component level, and character level). The pre-trained encoder is then applied to the downstream font generation task without fine-tuning. Experimental comparisons with state-of-the-art methods demonstrate that our method successfully transfers styles at all scales. In addition, it requires only one reference glyph and achieves the lowest rate of bad cases in the few-shot font generation task, 28% lower than the second-best method.
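The abstract describes an encoder that fuses two modalities: image patches of a glyph and embeddings of its stroke labels, processed jointly by a transformer so that attention can capture cross- and intra-modality correlations. The following is a minimal numpy sketch of that idea, not the paper's actual architecture: all dimensions, the single attention head, the random weights, and the mean-pooled output code are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k=16):
    # Single-head self-attention over the joint token sequence:
    # image tokens attend to stroke tokens and vice versa (cross-modality),
    # as well as to tokens of their own modality (intra-modality).
    d_model = tokens.shape[1]
    Wq = rng.normal(size=(d_model, d_k))
    Wk = rng.normal(size=(d_model, d_k))
    Wv = rng.normal(size=(d_model, d_model))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(d_k))
    return A @ V

def encode(glyph_image, stroke_labels, d_model=32, n_stroke_classes=33):
    # Image modality: split a 32x32 glyph into sixteen 8x8 patches,
    # then project each flattened patch to a d_model-dim token.
    patches = glyph_image.reshape(4, 8, 4, 8).transpose(0, 2, 1, 3).reshape(16, 64)
    W_patch = rng.normal(size=(64, d_model))
    img_tokens = patches @ W_patch
    # Stroke modality: look up each stroke-class label in an embedding table.
    E_stroke = rng.normal(size=(n_stroke_classes, d_model))
    stroke_tokens = E_stroke[stroke_labels]
    # Joint sequence -> attention -> pooled code. In the paper's pipeline the
    # analogous code from a reference glyph serves as the style representation
    # and the one from the source glyph as the content representation.
    tokens = np.concatenate([img_tokens, stroke_tokens], axis=0)
    fused = self_attention(tokens)
    return fused.mean(axis=0)

glyph = rng.random((32, 32))        # toy glyph image
strokes = np.array([3, 7, 7, 12])   # toy stroke-class labels for one character
code = encode(glyph, strokes)
print(code.shape)  # (32,)
```

In the actual method the encoder weights are learned during self-supervised pre-training and then frozen for the downstream generation task; here random weights merely illustrate the data flow.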
