Paper Title


Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

Paper Authors

Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Abstract

In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks: Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). In particular, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer Egocentric VLP along three axes: pretraining dataset, pretraining objective, and development set. Based on these three designs, we develop a pretrained video-language model that can transfer its egocentric video-text representation, or its video-only representation, to several downstream video tasks. Our Egocentric VLP achieves 10.46 R@1 & IoU@0.3 on NLQ, 10.33 mAP on MQ, 74% accuracy on OSCC, and 0.67 s error on PNR. The code is available at https://github.com/showlab/EgoVLP.
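As a minimal sketch of how the NLQ metric "R@1, IoU@0.3" reported above is typically computed (the function names and toy intervals here are illustrative, not from the paper's codebase): each query's top-1 predicted temporal window counts as correct if its intersection-over-union with the ground-truth window is at least 0.3.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two [start, end] intervals (seconds)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(predictions, ground_truths, iou_threshold=0.3):
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)

# Toy example: the first query is localized well enough, the second is not.
preds = [(3.0, 9.0), (20.0, 22.0)]
gts = [(4.0, 10.0), (30.0, 35.0)]
print(recall_at_1(preds, gts))  # 0.5
```

Higher-rank variants (e.g. R@5) simply check whether any of the top-k windows clears the threshold.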
