Paper Title

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

Paper Authors

Jiayi Shao, Xiaohan Wang, Yi Yang

Abstract

In this report, we present the ReLER@ZJU submission to the Ego4D Moment Queries Challenge at ECCV 2022. In this task, the goal is to retrieve and localize all instances of possible activities in egocentric videos. The Ego4D dataset is challenging for temporal action localization because the temporal duration of the videos is quite long and each video contains multiple action instances with fine-grained action classes. To address these problems, we utilize a multi-scale transformer to classify the different action categories and predict the boundary of each instance. Moreover, to better capture long-term temporal dependencies in the long videos, we propose a segment-level recurrence mechanism. Compared with directly feeding all video features to the transformer encoder, the proposed segment-level recurrence mechanism alleviates optimization difficulties and achieves better performance. The final submission achieved a Recall@1 (tIoU=0.5) score of 37.24 and an average mAP score of 17.67, taking 3rd place on the leaderboard.
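
The abstract does not give implementation details of the segment-level recurrence mechanism. As a rough illustration only, the sketch below shows one plausible way such recurrence over long video features could look, assuming a Transformer-XL-style cached memory where each segment also attends to the (gradient-detached) output of the previous segment; all module and parameter names here are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of segment-level recurrence over long video features.
# Assumes a Transformer-XL-style detached memory; not the authors' actual code.
import torch
import torch.nn as nn


class SegmentRecurrentEncoder(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_layers=4, segment_len=512):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.segment_len = segment_len

    def forward(self, feats):                      # feats: (B, T, dim)
        outputs, memory = [], None
        for seg in feats.split(self.segment_len, dim=1):
            if memory is not None:
                # Prepend the cached previous segment so the current segment
                # can attend across the segment boundary.
                seg_in = torch.cat([memory, seg], dim=1)
                out = self.encoder(seg_in)[:, memory.size(1):]
            else:
                out = self.encoder(seg)
            memory = out.detach()                  # cache for the next segment
            outputs.append(out)
        return torch.cat(outputs, dim=1)           # (B, T, dim)


if __name__ == "__main__":
    # Example: a long clip (4096 feature steps) processed segment by segment.
    enc = SegmentRecurrentEncoder()
    video_feats = torch.randn(2, 4096, 256)
    print(enc(video_feats).shape)                  # torch.Size([2, 4096, 256])
```

The intended point of such a design, consistent with the abstract, is that each forward pass only optimizes attention within a bounded segment (plus a fixed-size memory), rather than over the full, very long feature sequence at once.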
