Paper Title

MGTR: End-to-End Mutual Gaze Detection with Transformer

Authors

Hang Guo, Zhengxi Hu, Jingtai Liu

Abstract

Mutual gaze, in which people look at each other, is ubiquitous in daily interactions, and detecting it is important for understanding human social scenes. Current mutual gaze detection methods are mostly two-stage: their inference speed is limited by the two-stage pipeline, and the performance of the second stage depends on that of the first. In this paper, we propose a novel one-stage mutual gaze detection framework, Mutual Gaze TRansformer (MGTR), which performs mutual gaze detection end to end. By designing mutual gaze instance triples, MGTR detects each human head bounding box and simultaneously infers mutual gaze relationships from global image information, which streamlines the whole process. Experimental results on two mutual gaze datasets show that our method accelerates mutual gaze detection without losing performance. An ablation study shows that different components of MGTR capture different levels of semantic information in images. Code is available at https://github.com/Gmbition/MGTR
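
The abstract does not spell out the layout of a mutual gaze instance triple. As a rough sketch only, assuming a triple pairs two head bounding boxes with a mutual gaze score, the output of a one-stage detector could be represented and filtered as below. The names `GazeTriple` and `select_mutual_gaze`, the box format, and the 0.5 threshold are hypothetical illustrations and are not taken from the MGTR code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GazeTriple:
    """Assumed shape of one prediction: two head boxes plus a mutual gaze score."""
    head_a: Box
    head_b: Box
    mutual_score: float  # assumed to lie in [0, 1]


def select_mutual_gaze(triples: List[GazeTriple], threshold: float = 0.5) -> List[GazeTriple]:
    """Keep the triples whose score meets the (assumed) decision threshold."""
    return [t for t in triples if t.mutual_score >= threshold]


# Made-up predictions for a single image, for illustration only.
predictions = [
    GazeTriple((10, 20, 50, 60), (120, 25, 160, 65), 0.91),
    GazeTriple((10, 20, 50, 60), (200, 30, 240, 70), 0.12),
]

mutual = select_mutual_gaze(predictions)
for t in mutual:
    print(t.head_a, t.head_b, t.mutual_score)
```

Because each triple carries both head boxes and the relationship score, a single forward pass can yield detections and gaze relations together, which is the simplification over a two-stage pipeline that the abstract describes.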
