Paper Title

Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding

Paper Authors

Shimomoto, Erica K., Marrese-Taylor, Edison, Takamura, Hiroya, Kobayashi, Ichiro, Nakayama, Hideki, Miyao, Yusuke

Paper Abstract

This paper explores the task of Temporal Video Grounding (TVG) where, given an untrimmed video and a natural language sentence query, the goal is to recognize and determine temporal boundaries of action instances in the video described by the query. Recent works tackled this task by improving query inputs with large pre-trained language models (PLM) at the cost of more expensive training. However, the effects of this integration are unclear, as these works also propose improvements in the visual inputs. Therefore, this paper studies the effects of PLMs in TVG and assesses the applicability of parameter-efficient training with NLP adapters. We couple popular PLMs with a selection of existing approaches and test different adapters to reduce the impact of the additional parameters. Our results on three challenging datasets show that, without changing the visual inputs, TVG models greatly benefited from the PLM integration and fine-tuning, stressing the importance of sentence query representation in this task. Furthermore, NLP adapters were an effective alternative to full fine-tuning, even though they were not tailored to our task, allowing PLM integration in larger TVG models and delivering results comparable to SOTA models. Finally, our results shed light on which adapters work best in different scenarios.
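To make the "NLP adapters" the abstract refers to concrete, below is a minimal sketch of a bottleneck adapter of the kind commonly inserted into a frozen pre-trained language model for parameter-efficient fine-tuning. It is an illustration of the general technique under assumed settings (PyTorch, hidden size 768, bottleneck 64, residual insertion after a transformer sub-layer), not the authors' exact architecture or configuration.

```python
# Sketch of a bottleneck NLP adapter for parameter-efficient PLM fine-tuning.
# The hidden size, bottleneck dimension, and insertion point are assumptions
# for demonstration only; they are not taken from the paper.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual path keeps the adapter close to identity at initialization,
        # so the frozen PLM's representations are preserved early in training.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


if __name__ == "__main__":
    # In practice, the PLM's own weights would be frozen and only the adapter
    # parameters trained, keeping the number of trainable parameters small.
    adapter = BottleneckAdapter()
    x = torch.randn(2, 16, 768)  # (batch, tokens, hidden)
    print(adapter(x).shape)      # torch.Size([2, 16, 768])
```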
