Paper Title
Video Action Understanding
Paper Authors
Paper Abstract
Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. However, due to the scale and temporal nature of video, the span of video understanding problems and the set of proposed deep learning solutions are arguably wider and more diverse than those of their 2D image siblings. Finding, identifying, and predicting actions are a few of the most salient tasks in this emerging and rapidly evolving field. With a pedagogical emphasis, this tutorial introduces and systematizes fundamental topics, basic concepts, and notable examples in supervised video action understanding. Specifically, we clarify a taxonomy of action problems, catalog and highlight video datasets, describe common video data preparation methods, present the building blocks of state-of-the-art deep learning model architectures, and formalize domain-specific metrics to baseline proposed solutions. This tutorial is intended to be accessible to a general computer science audience and assumes a conceptual understanding of supervised learning.