Hierarchical and Progressive Image Matting

Paper Title

Hierarchical and Progressive Image Matting

Authors

Yu Qiao, Yuhao Liu, Ziqi Wei, Yuxin Wang, Qiang Cai, Guofeng Zhang, Xin Yang

Abstract

Most matting research resorts to advanced semantics to achieve high-quality alpha mattes, and direct combination of low-level features is usually explored to complement alpha details. However, we argue that appearance-agnostic integration can only provide biased foreground details, and that alpha mattes require feature aggregation across different levels for better pixel-wise opacity perception. In this paper, we propose an end-to-end Hierarchical and Progressive Attention Matting Network (HAttMatting++), which can better predict the opacity of the foreground from single RGB images without additional input. Specifically, we utilize channel-wise attention to distill pyramidal features and employ spatial attention at different levels to filter appearance cues. This progressive attention mechanism can estimate alpha mattes from adaptive semantics and semantics-indicated boundaries. We also introduce a hybrid loss function fusing Structural SIMilarity (SSIM), Mean Square Error (MSE), adversarial loss, and sentry supervision to guide the network to further improve the overall foreground structure. In addition, we construct a large-scale and challenging image matting dataset comprising 59,600 training images and 1,000 test images (646 distinct foreground alpha mattes in total), which can further improve the robustness of our hierarchical and progressive aggregation model. Extensive experiments demonstrate that the proposed HAttMatting++ can capture sophisticated foreground structures and achieve state-of-the-art performance with single RGB images as input.
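
To make the idea of a hybrid loss more concrete, the following is a minimal sketch that combines only the MSE and SSIM terms for a predicted alpha matte with values in [0, 1]. It is not the authors' implementation: the adversarial and sentry supervision terms are omitted, SSIM is simplified to a single window covering the whole image, and the function names and weights (`w_mse`, `w_ssim`) are hypothetical choices made for illustration.

```python
import numpy as np


def mse_loss(pred, target):
    """Mean squared error between two alpha mattes."""
    return float(np.mean((pred - target) ** 2))


def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over the whole image treated as one window.

    Assumption: a simplification of the usual sliding-window SSIM.
    c1 and c2 are the standard stabilizers for a dynamic range of 1.
    """
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(num / den)


def hybrid_loss(pred, target, w_mse=1.0, w_ssim=1.0):
    """Weighted sum of MSE and (1 - SSIM); the weights are illustrative."""
    ssim_term = 1.0 - global_ssim(pred, target)
    return w_mse * mse_loss(pred, target) + w_ssim * ssim_term


if __name__ == "__main__":
    alpha_gt = np.linspace(0.0, 1.0, 16).reshape(4, 4)
    alpha_pred = np.clip(alpha_gt + 0.05, 0.0, 1.0)
    print(hybrid_loss(alpha_pred, alpha_gt))
```

The MSE term penalizes per-pixel opacity errors, while the SSIM term rewards agreement in local structure; in this sketch a perfect prediction gives a loss of approximately zero.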
