Paper Title


CLIP-Mesh: Generating textured meshes from text using pretrained image-text models

Paper Authors

Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, Tiberiu Popa

Abstract


We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision, our method deforms the control shape of a limit subdivision surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pretrained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model. While previous works have focused on stylization or required training of generative models, we perform optimization on mesh parameters directly to generate shape, texture, or both. To constrain the optimization to produce plausible meshes and textures, we introduce a number of techniques using image augmentations and the use of a pretrained prior that generates CLIP image embeddings given a text embedding.
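The core loop the abstract describes is: render the current mesh, embed the rendering with CLIP, and adjust mesh parameters (control shape, texture map, normal map) to increase similarity with the text embedding. The sketch below is only a loose illustration of that loop, not the paper's implementation: `embed`, `render`, and the hill-climbing update are mock stand-ins, whereas the real method uses a pretrained CLIP model and gradient descent through a differentiable renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock "CLIP" projection into a shared embedding space (stand-in only).
PROJ = rng.normal(size=(64, 16))

def embed(x):
    """Mock CLIP-style embedding: project and L2-normalize."""
    v = x @ PROJ
    return v / np.linalg.norm(v)

def render(params):
    """Mock renderer: a fixed nonlinearity stands in for rasterizing the mesh."""
    return np.tanh(params)

# Stand-in for the CLIP text encoder's output for the target prompt.
text_embedding = embed(rng.normal(size=64))

# Parameters being optimized (stand-in for control shape + texture/normal maps).
params = rng.normal(size=64)

def similarity(p):
    """Cosine similarity between the rendered image's embedding and the text."""
    return float(embed(render(p)) @ text_embedding)

# Random hill climbing in place of gradients through a differentiable renderer:
# keep a perturbation only if it brings the rendering closer to the prompt.
init = similarity(params)
best = init
for _ in range(200):
    candidate = params + 0.05 * rng.normal(size=64)
    s = similarity(candidate)
    if s > best:
        params, best = candidate, s

print(best)
```

Because only improving perturbations are accepted, the similarity is non-decreasing over iterations; the paper's augmentations and text-to-image-embedding prior play the analogous role of keeping this optimization from collapsing into implausible geometry or textures.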
