Paper Title

Exploring Anchor-based Detection for Ego4D Natural Language Query

Authors

Sipeng Zheng, Qi Zhang, Bei Liu, Qin Jin, Jianlong Fu

Abstract

In this paper, we present our technical report for the Ego4D Natural Language Query challenge at CVPR 2022. The natural language query task is challenging because it requires a comprehensive understanding of video content. Most previous works address this task on third-person view datasets, while little research attention has been paid to the ego-centric view so far. Although great progress has been made, we notice that previous works cannot adapt well to ego-centric view datasets such as Ego4D, mainly for two reasons: 1) most queries in Ego4D have an excessively small temporal duration (e.g., less than 5 seconds); 2) queries in Ego4D require much more complex video understanding of long-term temporal order. Considering these, we propose our solution to this challenge to address the above issues.
