Title
Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge
Authors
Abstract
In this report, we present our approach of transferring pretrained Video Masked Autoencoders (VideoMAE) to egocentric tasks for the Ego4D Looking at Me Challenge. VideoMAE is a data-efficient model for self-supervised video pre-training that transfers easily to downstream tasks. We show that the representations transferred from VideoMAE provide strong spatio-temporal modeling and can capture small actions. Starting from a VideoMAE model pretrained on ordinary videos recorded from a third-person view, we only need to fine-tune on egocentric data for 10 epochs to obtain better results than the baseline on the Ego4D Looking at Me Challenge.
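As background on why VideoMAE pre-training is data-efficient: it hides most of each clip from the encoder using "tube" masking. The same randomly chosen spatial patches are masked in every temporal slice, and the VideoMAE paper uses very high masking ratios (90% to 95%). The sketch below is a minimal, illustrative Python version of that idea and not the VideoMAE codebase. The function name `tube_mask`, its parameters, and the configuration in the usage block are assumptions. The configuration is a 16-frame, 224×224 input split into 16×16 patches with a tubelet size of 2, a common VideoMAE ViT-B setup rather than a setting stated in this report.

```python
import random
from typing import List, Optional


def tube_mask(num_frames: int, grid_h: int, grid_w: int,
              tubelet_size: int = 2, mask_ratio: float = 0.9,
              seed: Optional[int] = None) -> List[List[bool]]:
    """Hypothetical sketch of VideoMAE-style tube masking (not the official code).

    Returns one boolean list per temporal slice; True marks a masked patch.
    The same spatial pattern is reused for every slice, so each masked
    patch forms a "tube" that stays hidden across the whole clip.
    """
    if num_frames % tubelet_size != 0:
        raise ValueError("num_frames must be divisible by tubelet_size")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")

    num_patches = grid_h * grid_w          # spatial patches per slice
    num_masked = int(mask_ratio * num_patches)
    rng = random.Random(seed)

    # Choose the masked spatial positions once ...
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    spatial = [i in masked_ids for i in range(num_patches)]

    # ... and repeat them for every temporal slice (tubelet).
    num_slices = num_frames // tubelet_size
    return [list(spatial) for _ in range(num_slices)]


if __name__ == "__main__":
    # Assumed setup: 16 frames, 224x224 input, 16x16 patches -> 14x14 grid.
    mask = tube_mask(16, 14, 14, tubelet_size=2, mask_ratio=0.9, seed=0)
    total = sum(len(s) for s in mask)
    hidden = sum(sum(s) for s in mask)
    print(f"{len(mask)} slices, {hidden}/{total} tokens masked")
    # prints "8 slices, 1408/1568 tokens masked"
```

Under these assumptions the encoder sees only 160 of 1,568 tokens. Repeating the mask across time keeps the model from recovering a hidden patch by copying it from a neighbouring frame, which is the temporal-redundancy argument the VideoMAE authors give for tube masking.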