DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Paper Title

DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Authors

Hao-Shu Fang, Yichen Xie, Dian Shao, Cewu Lu

Abstract

In recent years, human-object interaction (HOI) detection has made impressive advances. However, conventional two-stage methods are usually slow at inference. Existing one-stage methods, on the other hand, mainly focus on the union regions of interactions, which introduce unnecessary visual information that disturbs HOI detection. To tackle these problems, we propose DIRV, a novel one-stage HOI detection approach based on a new concept for the HOI problem called the interaction region. Unlike previous methods, our approach concentrates on densely sampled interaction regions across different scales for each human-object pair, so as to capture the subtle visual features that are most essential to the interaction. Moreover, to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy that makes full use of the overlapping interaction regions in place of conventional non-maximum suppression (NMS). Extensive experiments on two popular benchmarks, V-COCO and HICO-DET, show that our approach outperforms existing state-of-the-art methods by a large margin, with the highest inference speed and the lightest network architecture. We achieve 56.1 mAP on V-COCO without additional input. Our code is publicly available at: https://github.com/MVIG-SJTU/DIRV
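
The abstract contrasts voting over overlapping interaction regions with NMS but does not state the voting rule itself. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes each region reports a human-object pair, a confidence, and per-verb scores, and it assumes a confidence-weighted average as the vote. The names `nms_select` and `vote`, the tuple layout, and the sample numbers are all hypothetical.

```python
from collections import defaultdict

# Each prediction is (pair_id, confidence, verb_scores); this layout is an
# assumption made for illustration, not the format used in the DIRV code.


def nms_select(predictions):
    """NMS-style baseline: keep only the most confident region per pair."""
    best = {}
    for pair, conf, verb_scores in predictions:
        if pair not in best or conf > best[pair][0]:
            best[pair] = (conf, list(verb_scores))
    return {pair: scores for pair, (_, scores) in best.items()}


def vote(predictions):
    """Assumed voting rule: confidence-weighted mean of verb scores per pair."""
    sums = {}
    weights = defaultdict(float)
    for pair, conf, verb_scores in predictions:
        if conf <= 0.0:
            continue  # regions with no confidence cast no vote
        acc = sums.setdefault(pair, [0.0] * len(verb_scores))
        for i, score in enumerate(verb_scores):
            acc[i] += conf * score
        weights[pair] += conf
    return {pair: [v / weights[pair] for v in acc] for pair, acc in sums.items()}


# Three overlapping regions that all describe the same human-object pair.
predictions = [
    (("h0", "o0"), 0.9, [0.2, 0.8]),
    (("h0", "o0"), 0.6, [0.7, 0.6]),
    (("h0", "o0"), 0.5, [0.8, 0.5]),
]

nms_result = nms_select(predictions)
vote_result = vote(predictions)
print("NMS-style:", nms_result[("h0", "o0")])
print("Voting:   ", vote_result[("h0", "o0")])
```

In this toy input, the NMS-style baseline reports only the scores of the top region, so a low score for the first verb in that one region decides the outcome. The weighted vote lets the two other regions raise it, which is the kind of compensation for single-region flaws that the abstract describes.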
