Paper Title

PaStaNet: Toward Human Activity Knowledge Engine

Paper Authors

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu

Paper Abstract


Existing image-based activity understanding methods mainly adopt direct mapping, i.e. from image to activity concepts, which may encounter a performance bottleneck due to the huge gap between the two. In light of this, we propose a new path: infer human part states first, then reason out the activities based on part-level semantics. Human Body Part States (PaSta) are fine-grained action semantic tokens, e.g. <hand, hold, something>, which can compose activities and help us step toward a human activity knowledge engine. To fully utilize the power of PaSta, we build a large-scale knowledge base, PaStaNet, which contains 7M+ PaSta annotations, and propose two corresponding models. First, we design a model named Activity2Vec to extract PaSta features, which aim to be general representations for various activities. Second, we use a PaSta-based Reasoning method to infer activities. Promoted by PaStaNet, our method achieves significant improvements, e.g. 6.4 and 13.9 mAP on the full and one-shot sets of HICO in supervised learning, and 3.2 and 4.2 mAP on V-COCO and image-based AVA in transfer learning. Code and data are available at http://hake-mvig.cn/.
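The abstract describes a two-stage pipeline: detect fine-grained body part states (PaSta) as <part, verb, something> triples, then reason out activities from combinations of those part-level tokens. The toy sketch below illustrates that compositional idea only; it is not the authors' Activity2Vec or reasoning model, and all names (`PASTA_VOCAB`, `ACTIVITY_RULES`, `reason_activities`) are hypothetical.

```python
# Illustrative sketch (not the paper's code): composing body part
# states (PaSta) into activity predictions via a toy rule base.

# Each PaSta token is a <part, verb, something> triple, as in the abstract.
PASTA_VOCAB = [
    ("hand", "hold", "something"),
    ("head", "look_at", "something"),
    ("hip", "sit_on", "something"),
]

# Hypothetical rules: an activity fires when all of its required
# part states are detected in the image.
ACTIVITY_RULES = {
    "read": {("hand", "hold", "something"), ("head", "look_at", "something")},
    "sit":  {("hip", "sit_on", "something")},
}

def reason_activities(detected_pasta):
    """Return activities whose required part states are all detected."""
    detected = set(detected_pasta)
    return [activity for activity, required in ACTIVITY_RULES.items()
            if required <= detected]  # subset test: all required states present

print(reason_activities([("hand", "hold", "something"),
                         ("head", "look_at", "something")]))
# -> ['read']
```

In the actual paper, this hard rule lookup is replaced by learned components: Activity2Vec extracts continuous PaSta features, and a learned reasoning module, rather than a fixed rule table, maps them to activity scores.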
