Paper Title
What comprises a good talking-head video generation?: A Survey and Benchmark
Paper Authors
Paper Abstract
Over the years, performance evaluation has become essential in computer vision, enabling tangible progress in many sub-fields. Although talking-head video generation has become an emerging research topic, existing evaluations of it have many limitations. For example, most approaches rely on human subjects (e.g., via Amazon MTurk) to evaluate their research claims directly. Such subjective evaluation is cumbersome and unreproducible, and it may impede the evolution of new research. In this work, we present a carefully designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies. For evaluation, we either propose new metrics or select the most appropriate existing ones to assess the properties we consider desirable in a good talking-head video: identity preservation, lip synchronization, high video quality, and natural, spontaneous motion. By conducting a thoughtful analysis across several state-of-the-art talking-head generation approaches, we aim to uncover the merits and drawbacks of current methods and point out promising directions for future work. All the evaluation code is available at: https://github.com/lelechen63/talking-head-generation-survey.