ParaColorizer: Realistic Image Colorization using Parallel Generative Networks

Paper Title

ParaColorizer: Realistic Image Colorization using Parallel Generative Networks

Authors

Kumar, Himanshu; Banerjee, Abeer; Saurav, Sumeet; Singh, Sanjay

Abstract

Grayscale image colorization is a fascinating application of AI for information restoration. The problem is inherently ill-posed, and its outputs can be multi-modal, which makes it even more challenging. Current learning-based methods produce acceptable results for straightforward cases but usually fail to restore contextual information when there is no clear figure-ground separation. The resulting images also suffer from color bleeding and desaturated backgrounds, because a single model trained on full-image features is insufficient for learning the diverse data modes. To address these issues, we present a parallel GAN-based colorization framework. In our approach, separately tailored GAN pipelines colorize the foreground (using object-level features) and the background (using full-image features). The foreground pipeline employs a Residual-UNet with self-attention as its generator, trained using full-image features and the corresponding object-level features from the COCO dataset. The background pipeline relies on full-image features and additional training examples from the Places dataset. We design a DenseFuse-based fusion network that obtains the final colorized image by feature-based fusion of the outputs generated in parallel. We show the shortcomings of the non-perceptual evaluation metrics commonly used to assess multi-modal problems like image colorization, and we perform an extensive performance evaluation of our framework using multiple perceptual metrics. Our approach outperforms most existing learning-based methods and produces results comparable to the state of the art. Further, we performed a runtime analysis and obtained an average inference time of 24 ms per image.
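
The abstract's point about non-perceptual metrics can be illustrated with a small sketch. It assumes PSNR as the non-perceptual metric (the abstract does not name one) and uses a hypothetical four-pixel patch of a single chroma channel; none of the values come from the paper. The sketch compares a vivid but differently colored prediction against a washed-out gray prediction.

```python
import math


def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical chroma values on a 0..255 scale, where 128 is neutral gray.
ground_truth = [200, 200, 200, 200]  # the color the object actually had
vivid_other = [56, 56, 56, 56]       # an equally saturated, different color
desaturated = [128, 128, 128, 128]   # a washed-out, colorless prediction

psnr_vivid = psnr(vivid_other, ground_truth)
psnr_gray = psnr(desaturated, ground_truth)

# The gray prediction has half the per-pixel error of the vivid one,
# so its MSE is four times smaller and its PSNR is higher.
print(f"vivid alternative: {psnr_vivid:.2f} dB")
print(f"desaturated gray:  {psnr_gray:.2f} dB")
```

Under these assumptions the gray patch scores about 6 dB higher than the vivid one, even though a viewer might find the vivid colorization more realistic. This is one way a pixel-wise metric can reward desaturated output on a multi-modal task.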

Paper Title