Paper Title


ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions

Paper Authors

Difan Liu, Sandesh Shetty, Tobias Hinz, Matthew Fisher, Richard Zhang, Taesung Park, Evangelos Kalogerakis

Paper Abstract


We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous attention mechanisms are computationally too expensive for handling high-resolution images or are overly constrained within specific image regions hampering long-range interactions, our novel attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that were not possible to generate reliably with previous convnets and transformer approaches. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
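The abstract's key idea, sparsifying the high-resolution attention matrix guided by dense attention computed at a lower resolution, can be illustrated with a minimal sketch. This is a hypothetical toy in NumPy, not the authors' implementation (ASSET operates on transformer token grids from an autoencoder); the function names, the nearest-neighbour upsampling, and the top-k selection rule are illustrative assumptions.

```python
import numpy as np

def dense_attention(q, k):
    """Full attention weights softmax(QK^T / sqrt(d)), feasible only at low resolution."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def sparsify_from_lowres(attn_low, n_high, top_k):
    """Upsample a low-res attention map to high resolution and keep only the
    top-k keys per query, yielding a sparse attention mask (assumed scheme)."""
    n_low = attn_low.shape[0]
    scale = n_high // n_low
    # nearest-neighbour upsample of the (n_low x n_low) attention map
    attn_up = np.repeat(np.repeat(attn_low, scale, axis=0), scale, axis=1)
    mask = np.zeros_like(attn_up, dtype=bool)
    idx = np.argsort(attn_up, axis=-1)[:, -top_k:]  # strongest keys per query
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask

rng = np.random.default_rng(0)
n_low, n_high, d = 4, 16, 8
q_low = rng.normal(size=(n_low, d))
k_low = rng.normal(size=(n_low, d))
attn_low = dense_attention(q_low, k_low)            # cheap: only 4 x 4
mask = sparsify_from_lowres(attn_low, n_high, top_k=3)
print(mask.shape, mask.sum(axis=-1))                # each query keeps only 3 keys
```

The point of the scheme is complexity: full attention at the high resolution would cost O(n_high^2) per query set, while the sparse mask restricts each query to a constant number of keys chosen by the cheap low-resolution pass, which is what allows long-range interactions to survive at high resolution.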
