UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

Paper title

UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

Authors

Zhitong Huang, Nanxuan Zhao, Jing Liao

Abstract

We propose the first unified framework UniColor to support colorization in multiple modalities, including both unconditional and conditional ones, such as stroke, exemplar, text, and even a mix of them. Rather than learning a separate model for each type of condition, we introduce a two-stage colorization framework for incorporating various conditions into a single model. In the first stage, multi-modal conditions are converted into a common representation of hint points. Particularly, we propose a novel CLIP-based method to convert the text to hint points. In the second stage, we propose a Transformer-based network composed of Chroma-VQGAN and Hybrid-Transformer to generate diverse and high-quality colorization results conditioned on hint points. Both qualitative and quantitative comparisons demonstrate that our method outperforms state-of-the-art methods in every control modality and further enables multi-modal colorization that was not feasible before. Moreover, we design an interactive interface showing the effectiveness of our unified framework in practical usage, including automatic colorization, hybrid-control colorization, local recolorization, and iterative color editing. Our code and models are available at https://luckyhzt.github.io/unicolor.
