Paper Title

Egocentric Object Manipulation Graphs

Authors

Eadom Dessalene, Michael Maynord, Chinmaya Devaraj, Cornelia Fermüller, Yiannis Aloimonos

Abstract

We introduce Egocentric Object Manipulation Graphs (Ego-OMG), a novel representation for activity modeling and anticipation of near-future actions that integrates three components: 1) the semantic temporal structure of activities, 2) short-term dynamics, and 3) representations of appearance. Semantic temporal structure is modeled through a graph, embedded by a Graph Convolutional Network, whose states model the characteristics of, and relations between, hands and objects. These state representations derive from all three levels of abstraction and span segments delimited by the making and breaking of hand-object contact. Short-term dynamics are modeled in two ways: A) through 3D convolutions, and B) by anticipating the spatiotemporal end points of hand trajectories, where hands come into contact with objects. Appearance is modeled through deep spatiotemporal features produced by existing methods. Because these appearance features are simple to swap in Ego-OMG, Ego-OMG is complementary to most existing action anticipation methods. We evaluate Ego-OMG on the EPIC Kitchens Action Anticipation Challenge, whose consistent egocentric perspective allows Ego-OMG to exploit the hand-centric cues on which it relies. We demonstrate state-of-the-art performance, outranking all previously published methods by large margins and ranking first on the unseen test set and second on the seen test set of the EPIC Kitchens Action Anticipation Challenge. We attribute the success of Ego-OMG to its modeling of semantic structure captured over long timespans. We evaluate our design choices through several ablation studies. Code will be released upon acceptance.
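The abstract describes semantic temporal structure as a graph whose states are segments delimited by the making and breaking of hand-object contact, embedded with a Graph Convolutional Network. The following is a minimal sketch of that general idea, not the Ego-OMG implementation. It assumes (none of this comes from the paper) that each frame is annotated with the object held by each hand, that graph nodes are distinct contact configurations linked when one follows another, that node features are a toy two-dimensional vector, and that the embedding uses the standard Kipf and Welling propagation rule. All function names and the annotation format are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-frame annotation: (object in left hand, object in right hand),
# where None means that hand touches nothing. Not the paper's data format.
Frame = Tuple[Optional[str], Optional[str]]


def segment_by_contact(frames: Sequence[Frame]) -> List[Frame]:
    """Collapse runs of identical contact configurations into states.

    A new state begins whenever a hand makes or breaks contact (or switches
    objects), i.e. segments are delimited by contact changes.
    """
    states: List[Frame] = []
    for f in frames:
        if not states or states[-1] != f:
            states.append(f)
    return states


def build_graph(states: Sequence[Frame]) -> Tuple[List[Frame], List[List[float]]]:
    """Undirected state-transition graph: one node per distinct state."""
    nodes: List[Frame] = []
    index: Dict[Frame, int] = {}
    for s in states:
        if s not in index:
            index[s] = len(nodes)
            nodes.append(s)
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):  # link consecutive states
        i, j = index[a], index[b]
        adj[i][j] = adj[j][i] = 1.0
    return nodes, adj


def node_features(nodes: Sequence[Frame]) -> List[List[float]]:
    """Toy 2-d features (hypothetical): hands in contact, plus a bias term."""
    return [[float(sum(o is not None for o in n)), 1.0] for n in nodes]


def gcn_layer(adj: List[List[float]], feats: List[List[float]],
              weight: List[List[float]]) -> List[List[float]]:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Linear transform H W.
    hw = [[sum(h[k] * weight[k][c] for k in range(len(h)))
           for c in range(len(weight[0]))] for h in feats]
    # Neighborhood aggregation with symmetric normalization.
    out = [[sum(norm[i][j] * hw[j][c] for j in range(n)) for c in range(len(hw[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


if __name__ == "__main__":
    frames = [(None, None), (None, "knife"), (None, "knife"),
              ("onion", "knife"), (None, "knife"), (None, None)]
    states = segment_by_contact(frames)
    nodes, adj = build_graph(states)
    h = gcn_layer(adj, node_features(nodes), [[1.0, 0.0], [0.0, 1.0]])
    for node, emb in zip(nodes, h):
        print(node, [round(v, 3) for v in emb])
```

In this toy run, six frames collapse into five contact states over three distinct configurations, and each node's embedding mixes its own features with those of the states it transitions to or from. A learned model would replace the identity weight matrix with trained parameters and use richer node features.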
