Paper Title

Referring Image Matting

Authors

Jizhizi Li, Jing Zhang, Dacheng Tao

Abstract


Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper, which aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting. First, we establish a large-scale challenging dataset RefMatte by designing a comprehensive image composition and expression generation engine to automatically produce high-quality images along with diverse text attributes based on public datasets. RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions. Additionally, we construct a real-world test set with 100 high-resolution natural images and manually annotate complex phrases to evaluate the out-of-domain generalization abilities of RIM methods. Furthermore, we present a novel baseline method CLIPMat for RIM, including a context-embedded prompt, a text-driven semantic pop-up, and a multi-level details extractor. Extensive experiments on RefMatte in both keyword and expression settings validate the superiority of CLIPMat over representative methods. We hope this work could provide novel insights into image matting and encourage more follow-up studies. The dataset, code and models are available at https://github.com/JizhiziLi/RIM.
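To make the RIM task's input/output contract concrete, below is a minimal, hypothetical sketch of how a model like CLIPMat might be invoked: an RGB image plus a natural language expression goes in, and an alpha matte for the best-matching object comes out. The names `predict_alpha_matte` and the `model(image, expression)` call are illustrative assumptions, not the actual API of the JizhiziLi/RIM repository.

```python
# Hypothetical sketch of the Referring Image Matting (RIM) interface.
# Function and call signatures are illustrative assumptions, not the
# actual API of https://github.com/JizhiziLi/RIM.
import numpy as np
from PIL import Image


def predict_alpha_matte(model, image: Image.Image, expression: str) -> np.ndarray:
    """Return the alpha matte (H x W float array in [0, 1]) of the object
    in `image` that best matches the natural language `expression`."""
    # A RIM model consumes both modalities: the image and the text prompt.
    alpha = model(image, expression)  # hypothetical forward call
    return np.clip(np.asarray(alpha, dtype=np.float32), 0.0, 1.0)


# Usage (assuming a trained RIM model object is available):
# image = Image.open("example.jpg").convert("RGB")
# alpha = predict_alpha_matte(model, image, "the fluffy white cat on the sofa")
# # Standard matting composite: scale each pixel by its alpha value.
# foreground = np.asarray(image, dtype=np.float32) * alpha[..., None]
```

Unlike segmentation, the output is a soft matte rather than a binary mask, so the usage lines above composite the foreground by per-pixel alpha weighting instead of thresholding.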
