Self-supervised Learning for Unintentional Action Prediction

Paper title

Self-supervised Learning for Unintentional Action Prediction

Authors

Olga Zatsarynna, Yazan Abu Farha, Juergen Gall

Abstract

Distinguishing whether an action is performed as intended or whether an intended action fails is an important skill that not only humans have, but that is also important for intelligent systems that operate in human environments. Recognizing whether an action is unintentional or anticipating whether an action will fail, however, is not straightforward due to the lack of annotated data. While videos of unintentional or failed actions can be found on the Internet in abundance, high annotation costs are a major bottleneck for learning networks for these tasks. In this work, we thus study the problem of self-supervised representation learning for unintentional action prediction. While previous works learn the representation based on a local temporal neighborhood, we show that the global context of a video is needed to learn a good representation for the three downstream tasks: unintentional action classification, localization, and anticipation. In the supplementary material, we show that the learned representation can be used for detecting anomalies in videos as well.
