Paper Title
Pose-Guided Human Animation from a Single Image in the Wild
Paper Authors
Paper Abstract
We present a new pose transfer method for synthesizing a human animation from a single image of a person, controlled by a sequence of body poses. Existing pose transfer methods exhibit significant visual artifacts when applied to a novel scene, resulting in temporal inconsistency and failure to preserve the identity and textures of the person. To address these limitations, we design a compositional neural network that predicts the silhouette, garment labels, and textures. Each modular network is explicitly dedicated to a subtask that can be learned from synthetic data. At inference time, we utilize the trained networks to produce a unified representation of appearance and its labels in UV coordinates, which remains constant across poses. The unified representation provides incomplete yet strong guidance for generating the appearance in response to the pose change. We use the trained networks to complete the appearance and render it with the background. With these strategies, we are able to synthesize human animations that preserve the identity and appearance of the person in a temporally coherent way, without any fine-tuning of the network on the testing scene. Experiments show that our method outperforms state-of-the-art methods in terms of synthesis quality, temporal coherence, and generalization ability.
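The pipeline structure described in the abstract can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the module names, resolutions, and stub computations are all assumptions. The key point it demonstrates is that the UV-space appearance is built once from the single source image and held constant, while only the per-pose rendering runs inside the animation loop.

```python
# Hypothetical sketch of the compositional pose-transfer pipeline.
# All functions below are stand-ins (assumed names/shapes); the real method
# uses trained neural modules for silhouette, garment labels, and texture.
import numpy as np

H, W, UV = 64, 64, 32  # image and UV-map resolutions (assumed)

def predict_silhouette(image):
    # Stand-in for the silhouette-prediction module.
    return (image.sum(axis=-1) > 0).astype(np.float32)

def predict_garment_labels(silhouette):
    # Stand-in for the garment-label module (0 = background).
    return (silhouette > 0).astype(np.int64)

def predict_texture_uv(image, labels):
    # Stand-in for the texture module: unify appearance in UV coordinates.
    return np.zeros((UV, UV, 3), dtype=np.float32)

def animate(source_image, pose_sequence):
    """Build the pose-invariant UV appearance once, then render each pose."""
    sil = predict_silhouette(source_image)
    labels = predict_garment_labels(sil)
    # Constant across poses: the unified UV-space appearance.
    uv_appearance = predict_texture_uv(source_image, labels)
    frames = []
    for pose in pose_sequence:
        # The real method completes missing UV texels for this pose and
        # composites the result with the background; here we render a
        # trivial constant frame just to show the control flow.
        frame = np.broadcast_to(uv_appearance.mean(), (H, W, 3)).copy()
        frames.append(frame)
    return frames
```

Because `uv_appearance` is computed once outside the loop, every frame is guided by the same identity and texture estimate, which is how the method keeps the animation temporally coherent without per-scene fine-tuning.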