Paper Title
Knowledge Guided Learning: Towards Open Domain Egocentric Action Recognition with Zero Supervision
Paper Authors
Paper Abstract
Advances in deep learning have enabled the development of models that exhibit a remarkable ability to recognize and even localize actions in videos. However, they tend to fail when faced with scenes or examples beyond their initial training environment, and hence cannot adapt to new domains without significant retraining on large amounts of annotated data. In this paper, we propose to overcome these limitations by moving to an open-world setting and decoupling the ideas of recognition and reasoning. Building upon the compositional representation offered by Grenander's Pattern Theory formalism, we show that attention and commonsense knowledge can be used to enable the self-supervised discovery of novel actions in egocentric videos in an open-world setting, where data from the observed environment (the target domain) is open, i.e., the vocabulary is partially known and training examples (both labeled and unlabeled) are not available. We show that our approach can infer and learn novel classes for open vocabulary classification in egocentric videos and novel object detection with zero supervision. Extensive experiments show its competitive performance on two publicly available egocentric action recognition datasets (GTEA Gaze and GTEA Gaze+) under open-world conditions.
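As a rough illustration of the abstract's central idea — coupling visual evidence with commonsense knowledge to propose action labels that were never seen in training — the sketch below ranks candidate verb-object pairs by combining an object detector's confidence with a commonsense relatedness score. This is a minimal, hypothetical sketch, not the paper's Pattern Theory construction: the detector outputs, the verb list, the toy knowledge table, and the simple product-based score are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's implementation): propose novel action labels
# for an egocentric clip by combining object-detector confidences with a
# commonsense relatedness score between a candidate verb and a detected object.
# The knowledge table is a hypothetical stand-in for a resource such as
# ConceptNet; all numbers are illustrative.

from itertools import product

# Hypothetical detector output for one clip: object -> confidence.
detections = {"cup": 0.92, "kettle": 0.75, "knife": 0.10}

# Hypothetical commonsense relatedness between verbs and objects, in [0, 1].
commonsense = {
    ("pour", "kettle"): 0.85, ("pour", "cup"): 0.80,
    ("take", "cup"): 0.70, ("cut", "knife"): 0.90,
    ("cut", "cup"): 0.05,
}

verbs = ["pour", "take", "cut"]


def score(verb: str, obj: str) -> float:
    """Combine visual evidence and commonsense support for a (verb, object) pair."""
    visual = detections.get(obj, 0.0)
    knowledge = commonsense.get((verb, obj), 0.0)
    # Simple product used here only as a stand-in for the paper's
    # energy-based Pattern Theory formulation.
    return visual * knowledge


# Rank candidate actions without any labeled training examples for them.
candidates = sorted(
    ((v, o, score(v, o)) for v, o in product(verbs, detections)),
    key=lambda t: t[2],
    reverse=True,
)
for verb, obj, s in candidates[:3]:
    print(f"{verb} {obj}: {s:.2f}")
```

In this toy setting, "pour kettle" and "pour cup" surface as plausible novel actions purely from the detected objects and background knowledge, which mirrors (at a much simpler level) how zero-supervision inference can expand a partially known vocabulary.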