Semantic-guided Multi-Mask Image Harmonization

Paper title

Semantic-guided Multi-Mask Image Harmonization

Authors

Ren, Xuqian; Liu, Yifan

Abstract

Previous harmonization methods focus on adjusting a single inharmonious region in an image, given an input mask. They can struggle when different semantic regions carry different perturbations and no input mask is available. To address the case where an image has been pasted with several foregrounds from different images, each of which must be harmonized toward a different domain direction without any mask as input, we propose a new semantic-guided multi-mask image harmonization task. Unlike the previous single-mask image harmonization task, each inharmonious image is perturbed with different methods according to its semantic segmentation masks. Two challenging benchmarks, HScene and HLIP, are constructed based on 150 and 19 semantic classes, respectively. Furthermore, previous baselines focus on regressing the exact value of each pixel of the harmonized image. Their results come out of a "black box" and cannot be edited. In this work, we propose a novel way to edit inharmonious images by predicting a series of operator masks. The masks indicate the level and the position at which to apply a certain image editing operation, which could be brightness, saturation, or color in a specific dimension. The operator masks give users more flexibility to edit the image further. Extensive experiments verify that the operator mask-based network can further improve on state-of-the-art methods that directly regress RGB images when the perturbations are structural. Experiments on our constructed benchmarks verify that the proposed operator mask-based framework can locate and modify the inharmonious regions in more complex scenes. Our code and models are available at https://github.com/XuqianRen/Semantic-guided-Multi-mask-Image-Harmonization.git.
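
To make the operator-mask idea concrete, the following is a minimal NumPy sketch of how per-pixel masks for brightness, saturation, and color might be applied to an image. It is not the authors' implementation: the function name `apply_operator_masks`, the mask value ranges, and the specific formulas (multiplicative brightness, blending toward a per-pixel gray for saturation, additive per-channel color shift) are all assumptions chosen for illustration. The abstract only states that the masks encode the level and position of each operation.

```python
import numpy as np


def apply_operator_masks(image, brightness, saturation, color_shift):
    """Apply per-pixel editing operators to an RGB image (illustrative only).

    image:       H x W x 3 float array with values in [0, 1]
    brightness:  H x W array; 0 means no change (assumed range about [-1, 1])
    saturation:  H x W array; 0 means no change, -1 removes all color
    color_shift: H x W x 3 array added per channel; 0 means no change

    The formulas are assumptions for this sketch, not the paper's operators.
    """
    out = image.astype(np.float64)

    # Brightness: scale every channel by (1 + level) at each position.
    out = out * (1.0 + brightness[..., None])

    # Saturation: move each pixel toward or away from its own gray value.
    gray = out.mean(axis=2, keepdims=True)
    out = gray + (out - gray) * (1.0 + saturation[..., None])

    # Color: shift each channel independently by the given amount.
    out = out + color_shift

    # Keep the result a valid image.
    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.uniform(0.2, 0.6, size=(4, 4, 3))

    # Brighten only the top-left 2 x 2 region; leave everything else alone.
    b = np.zeros((4, 4))
    b[:2, :2] = 0.3
    edited = apply_operator_masks(img, b, np.zeros((4, 4)), np.zeros((4, 4, 3)))
    print(edited.shape)
```

Because each mask is an ordinary array, a user could rescale or zero out one of them before applying it, which is the kind of post-hoc editing that direct RGB regression does not offer.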
