Paper Title

Predicting the Next Action by Modeling the Abstract Goal

Authors

Debaditya Roy, Basura Fernando

Abstract

The problem of anticipating human actions is an inherently uncertain one. However, we can reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here, we present an action anticipation model that leverages goal information to reduce the uncertainty in future predictions. Since we have access to neither goal information nor the observed actions during inference, we resort to visual representations to encapsulate information about both actions and goals. Through this, we derive a novel concept called the abstract goal, which is conditioned on observed sequences of visual features for action anticipation. We design the abstract goal as a distribution whose parameters are estimated using a variational recurrent network. We sample multiple candidates for the next action and introduce a goal consistency measure to determine the best candidate that follows from the abstract goal. Our method obtains impressive results on the very challenging Epic-Kitchens55 (EK55), EK100, and EGTEA Gaze+ datasets. We obtain absolute improvements of +13.69, +11.24, and +5.19 for Top-1 verb, Top-1 noun, and Top-1 action anticipation accuracy, respectively, over prior state-of-the-art methods on the seen kitchens (S1) split of EK55. Similarly, we also obtain significant improvements on the unseen kitchens (S2) split for Top-1 verb (+10.75), noun (+5.84), and action (+2.87) anticipation. A similar trend is observed on the EGTEA Gaze+ dataset, where absolute improvements of +9.9, +13.1, and +6.8 are obtained for noun, verb, and action anticipation. As of the submission of this paper, our method is the new state-of-the-art for action anticipation on EK55 and EGTEA Gaze+ (https://competitions.codalab.org/competitions/20071#results). Code is available at https://github.com/debadityaroy/Abstract_Goal
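
For intuition, here is a minimal PyTorch sketch of how an anticipator along these lines might be wired: a recurrent encoder summarizes the observed visual features, a variational head produces the mean and log-variance of the abstract-goal distribution, several goals are sampled by reparameterization, and a goal consistency score selects the best next-action candidate. All names, dimensions, and the cosine-similarity form of the consistency measure are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
# Hypothetical sketch of the abstract-goal idea; names and dims are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AbstractGoalAnticipator(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=512, goal_dim=128, num_actions=2513):
        super().__init__()
        # Recurrent encoder over the observed visual feature sequence.
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Variational head: parameters of the abstract-goal distribution.
        self.goal_mu = nn.Linear(hidden_dim, goal_dim)
        self.goal_logvar = nn.Linear(hidden_dim, goal_dim)
        # Classifier from a sampled goal to next-action logits.
        self.action_head = nn.Linear(goal_dim, num_actions)
        # Maps a (soft) action prediction back into goal space, used by the
        # assumed cosine goal-consistency score below.
        self.action_to_goal = nn.Linear(num_actions, goal_dim)

    def forward(self, feats, num_samples=5):
        # feats: (batch, time, feat_dim) observed clip features.
        _, h = self.rnn(feats)          # h: (1, batch, hidden_dim)
        h = h.squeeze(0)
        mu, logvar = self.goal_mu(h), self.goal_logvar(h)
        std = torch.exp(0.5 * logvar)
        best_logits, best_scores = None, None
        for _ in range(num_samples):
            # Reparameterized sample of the abstract goal.
            z = mu + std * torch.randn_like(std)
            logits = self.action_head(z)
            # Goal consistency: does the candidate action imply a goal
            # close to the sampled abstract goal?
            implied = self.action_to_goal(F.softmax(logits, dim=-1))
            scores = F.cosine_similarity(implied, z, dim=-1)  # (batch,)
            if best_logits is None:
                best_logits, best_scores = logits, scores
            else:
                better = scores > best_scores
                best_scores = torch.where(better, scores, best_scores)
                best_logits = torch.where(better.unsqueeze(-1), logits, best_logits)
        # Training would also add a KL term on (mu, logvar), as is standard
        # for variational models; omitted in this inference-only sketch.
        return best_logits, mu, logvar

model = AbstractGoalAnticipator()
obs = torch.randn(2, 8, 1024)   # 2 clips, 8 observed feature steps (toy input)
logits, mu, logvar = model(obs)
print(logits.shape)             # torch.Size([2, 2513])
```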
