From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Paper title

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Authors

Jiangtong Li, Li Niu, Liqing Zhang

Abstract

Video understanding has achieved great success in representation learning, for example in video captioning, video object grounding, and descriptive video question answering. However, current methods still struggle with video reasoning, including evidence reasoning and commonsense reasoning. To push video understanding toward deeper reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution: answering the question and providing a proper reason. Through extensive experiments on existing VideoQA methods, we find that state-of-the-art methods are strong in description but weak in reasoning. We hope that Causal-VidQA can guide research on video understanding from representation learning to deeper reasoning. The dataset and related resources are available at https://github.com/bcmi/Causal-VidQA.git.
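
The four question types and the two-step setup for commonsense questions can be illustrated with a small scoring sketch. The abstract does not state how predictions are scored, so the rule used here (a prediction or counterfactual item counts as correct only when both the answer and the reason are right) is an assumption, and the names `Prediction`, `is_correct`, and `accuracy_by_type` are hypothetical, not taken from the Causal-VidQA repository.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# The four question types named in the abstract.
QUESTION_TYPES = ("description", "explanation", "prediction", "counterfactual")
# Commonsense reasoning covers prediction and counterfactual questions.
COMMONSENSE_TYPES = {"prediction", "counterfactual"}


@dataclass
class Prediction:
    """Hypothetical record of one model prediction."""
    question_type: str
    answer_correct: bool
    # Only meaningful for commonsense questions, which also require a reason.
    reason_correct: Optional[bool] = None


def is_correct(p: Prediction) -> bool:
    # ASSUMPTION: a commonsense item needs both the answer and the reason
    # to be right; other types need only the answer.
    if p.question_type in COMMONSENSE_TYPES:
        return p.answer_correct and bool(p.reason_correct)
    return p.answer_correct


def accuracy_by_type(preds: Iterable[Prediction]) -> Dict[str, float]:
    """Per-type accuracy; types with no items are left out of the result."""
    totals = {t: 0 for t in QUESTION_TYPES}
    hits = {t: 0 for t in QUESTION_TYPES}
    for p in preds:
        totals[p.question_type] += 1
        hits[p.question_type] += int(is_correct(p))
    return {t: hits[t] / totals[t] for t in QUESTION_TYPES if totals[t]}
```

Reporting accuracy per question type, rather than as one pooled number, is what makes a gap between description and reasoning performance visible.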
