HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

Paper Title

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

Authors

Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

Abstract

Learning to generate diverse scene-aware and goal-oriented human motions in 3D scenes remains challenging due to the mediocre characteristics of the existing datasets on Human-Scene Interaction (HSI); they only have limited scale/quality and lack semantics. To fill in the gap, we propose a large-scale and semantic-rich synthetic HSI dataset, denoted as HUMANISE, by aligning the captured human motion sequences with various 3D indoor scenes. We automatically annotate the aligned motions with language descriptions that depict the action and the unique interacting objects in the scene; e.g., sit on the armchair near the desk. HUMANISE thus enables a new generation task, language-conditioned human motion generation in 3D scenes. The proposed task is challenging as it requires joint modeling of the 3D scene, human motion, and natural language. To tackle this task, we present a novel scene-and-language conditioned generative model that can produce 3D human motions of the desirable action interacting with the specified objects. Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
