Holistic Interaction Transformer Network for Action Detection

Paper Title

Holistic Interaction Transformer Network for Action Detection

Authors

Gueter Josmy Faure, Min-Hung Chen, Shang-Hong Lai

Abstract

Actions are about how we interact with the environment, including other people, objects, and ourselves. In this paper, we propose a novel multi-modal Holistic Interaction Transformer Network (HIT) that leverages the largely ignored, but critical hand and pose information essential to most human actions. The proposed "HIT" network is a comprehensive bi-modal framework that comprises an RGB stream and a pose stream. Each of them separately models person, object, and hand interactions. Within each sub-network, an Intra-Modality Aggregation module (IMA) is introduced that selectively merges individual interaction units. The resulting features from each modality are then glued using an Attentive Fusion Mechanism (AFM). Finally, we extract cues from the temporal context to better classify the occurring actions using cached memory. Our method significantly outperforms previous approaches on the J-HMDB, UCF101-24, and MultiSports datasets. We also achieve competitive results on AVA. The code will be available at https://github.com/joslefaure/HIT.
