Paper Title
GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency
Paper Authors
Paper Abstract
Since detecting and recognizing an individual human or object is not adequate for understanding the visual world, learning how humans interact with surrounding objects has become a core technology. However, convolution operations are weak at depicting visual interactions between instances, since they only build blocks that process one local neighborhood at a time. To address this problem, we learn from how humans perceive human-object interactions (HOIs) and introduce a two-stage trainable reasoning mechanism, referred to as the GID block. The GID block breaks through local neighborhoods and captures long-range pixel dependencies at both the global level and the instance level of the scene to help detect interactions between instances. Furthermore, we construct a multi-stream network called GID-Net, a human-object interaction detection framework consisting of a human branch, an object branch and an interaction branch. Semantic information at the global level and the local level is efficiently reasoned about and aggregated in each branch. We have compared the proposed GID-Net with existing state-of-the-art methods on two public benchmarks, V-COCO and HICO-DET. The results show that GID-Net outperforms the existing best-performing methods on both benchmarks, validating its efficacy in detecting human-object interactions.
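The abstract frames the GID block as a trainable module that captures long-range pixel dependencies beyond a convolution's local neighborhood. Below is a minimal PyTorch sketch of such a long-range dependency block in a non-local-attention style; it is not the paper's exact two-stage GID design, and the class name, layer choices and hyperparameters are hypothetical illustrations.

```python
# A minimal sketch of a long-range dependency block in the spirit of the GID block,
# assuming a non-local-attention-style formulation; all names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LongRangeDependencyBlock(nn.Module):
    """For each query position, aggregates features from all spatial positions,
    breaking the local-neighborhood limit of plain convolutions."""

    def __init__(self, in_channels, reduced_channels=None):
        super().__init__()
        reduced = reduced_channels or in_channels // 2
        self.query = nn.Conv2d(in_channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(in_channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(in_channels, reduced, kernel_size=1)
        self.out = nn.Conv2d(reduced, in_channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                     # (B, C', HW)
        v = self.value(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        attn = F.softmax(torch.bmm(q, k), dim=-1)      # pairwise affinities over all positions
        context = torch.bmm(attn, v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(context)                   # residual connection keeps the block trainable end-to-end


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)                  # toy feature map
    block = LongRangeDependencyBlock(64)
    print(block(feat).shape)                           # torch.Size([2, 64, 32, 32])
```

In a multi-stream setup such as the described GID-Net, one such block could be applied per branch (human, object, interaction) to its cropped or pooled features, with the branch outputs fused for the final interaction score; the exact aggregation scheme is specified in the paper, not here.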