Paper Title
Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism
Paper Authors
Paper Abstract
This report describes our submission called "TarHeels" for the Ego4D: Object State Change Classification Challenge. We use a transformer-based video recognition model and leverage the Divided Space-Time Attention mechanism for classifying object state change in egocentric videos. Our submission achieves the second-best performance in the challenge. Furthermore, we perform an ablation study to show that identifying object state change in egocentric videos requires temporal modeling ability. Lastly, we present several positive and negative examples to visualize our model's predictions. The code is publicly available at: https://github.com/md-mohaiminul/ObjectStateChange
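
To make the idea of divided space-time attention concrete, the following is a minimal sketch in plain Python, not the authors' implementation (see the linked repository for that). It assumes the commonly used ordering in which each patch first attends over the same spatial location across frames (temporal step) and then over the other patches within its own frame (spatial step). The function names, the single attention head, and the use of identity projections in place of learned query/key/value weights are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over a list of vectors.

    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def add(x, y):
    """Element-wise sum, used for the residual connections."""
    return [a + b for a, b in zip(x, y)]


def divided_space_time_attention(video):
    """Apply temporal attention, then spatial attention.

    video[t][s] is the feature vector of patch s in frame t.
    """
    num_frames, num_patches = len(video), len(video[0])

    # Temporal step: each patch location attends over that same
    # location in every frame.
    after_time = [[None] * num_patches for _ in range(num_frames)]
    for s in range(num_patches):
        column = [video[t][s] for t in range(num_frames)]
        attended = self_attention(column)
        for t in range(num_frames):
            after_time[t][s] = add(video[t][s], attended[t])

    # Spatial step: each frame attends over its own patches only.
    out = []
    for t in range(num_frames):
        attended = self_attention(after_time[t])
        out.append([add(after_time[t][s], attended[s])
                    for s in range(num_patches)])
    return out


# Toy input: 2 frames, 3 patches per frame, 4-dimensional features.
video = [[[float(t + s + d) for d in range(4)] for s in range(3)]
         for t in range(2)]
output = divided_space_time_attention(video)
```

Under these assumptions, each token is compared with T + S other tokens per block rather than with all T × S tokens, as joint space-time attention would require, while the temporal step still lets information flow between frames.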