Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection

Paper Title

Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection

Authors

Jihwan Park, SeungJun Lee, Hwan Heo, Hyeong Kyu Choi, Hyunwoo J. Kim

Abstract

Human-Object Interaction (HOI) detection is a holistic visual recognition task that entails object detection as well as interaction classification. Previous work has addressed HOI detection through various compositions of subset predictions, e.g., Image -> HO -> I and Image -> HI -> O. Recently, transformer-based architectures for HOI detection have emerged that directly predict HOI triplets in an end-to-end fashion (Image -> HOI). Motivated by these varied inference paths, we propose cross-path consistency (CPC) learning, a novel end-to-end learning strategy that improves HOI detection for transformers by leveraging augmented decoding paths. CPC learning enforces consistency among all possible predictions from permuted inference sequences. This simple scheme makes the model learn consistent representations, thereby improving generalization without increasing model capacity. Our experiments demonstrate the effectiveness of our method, and we achieve significant improvements on V-COCO and HICO-DET over the baseline models. Our code is available at https://github.com/mlvlab/CPChoi.
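
As a rough illustration of the consistency idea described in the abstract, the sketch below penalizes disagreement between class predictions obtained from different decoding paths. It is not taken from the authors' code: the path names used as dictionary keys, the choice of a symmetric KL divergence, and the function `cross_path_consistency` are all assumptions made for illustration, and the actual CPC loss may be defined differently.

```python
import math
from itertools import combinations


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_path_consistency(path_logits):
    """Average symmetric KL divergence over all pairs of decoding paths.

    path_logits maps a (hypothetical) path name to the logits that path
    produced for the same prediction target. This is an illustrative
    stand-in for a consistency term, not the paper's exact loss.
    """
    probs = {name: softmax(logits) for name, logits in path_logits.items()}
    pairs = list(combinations(probs, 2))
    if not pairs:
        return 0.0  # a single path has nothing to be consistent with
    total = 0.0
    for a, b in pairs:
        total += 0.5 * (kl_divergence(probs[a], probs[b])
                        + kl_divergence(probs[b], probs[a]))
    return total / len(pairs)


if __name__ == "__main__":
    # Interaction logits for one query from three assumed decoding paths.
    example = {
        "Image->HOI": [2.0, 0.5, -1.0],
        "Image->HO->I": [1.8, 0.7, -0.9],
        "Image->HI->O": [0.2, 1.9, -0.5],
    }
    print(cross_path_consistency(example))
```

In this sketch, identical predictions across paths give a loss of zero, and the value grows as the paths disagree. In training, a term of this kind would presumably be added to the usual detection loss so that all paths are pushed toward the same output.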
