Paper Title

Linear Video Transformer with Feature Fixation

Paper Authors

Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Paper Abstract

Vision Transformers have achieved impressive performance in video classification, while suffering from the quadratic complexity caused by the Softmax attention mechanism. Some studies alleviate the computational costs by reducing the number of tokens in the attention calculation, but the complexity remains quadratic. Another promising approach is to replace Softmax attention with linear attention, which has linear complexity but exhibits a clear performance drop. We find that this drop in linear attention results from a lack of attention concentration on critical features. We therefore propose a feature fixation module that reweights the feature importance of the query and key before computing linear attention. Specifically, we regard the query, key, and value as different latent representations of the input token, and learn the feature fixation ratio by aggregating Query-Key-Value information, which helps measure feature importance comprehensively. Furthermore, we enhance feature fixation with neighborhood association, which leverages additional guidance from spatially and temporally neighboring tokens. The proposed method significantly improves the linear attention baseline and achieves state-of-the-art performance among linear video Transformers on three popular video classification benchmarks. With fewer parameters and higher efficiency, its performance is even comparable to some Softmax-based quadratic Transformers.
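To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of the feature fixation idea: fixation ratios for the query and key are predicted from aggregated Query-Key-Value information, the reweighted query and key are then fed into kernel-based linear attention. All module names, the sigmoid gating, and the elu+1 kernel are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# A minimal sketch of feature fixation before linear attention, assuming
# sigmoid-gated fixation ratios and an elu+1 feature map (Katharopoulos-style
# linear attention). Names are hypothetical, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFixationLinearAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Fixation ratios are learned from the concatenated Q, K, V
        # representations of each token (an assumption for this sketch).
        self.q_fixation = nn.Linear(3 * dim, dim)
        self.k_fixation = nn.Linear(3 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) -- e.g. flattened video patch tokens
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # Aggregate Q-K-V information, then reweight query and key features
        # per dimension before the attention computation.
        qkv = torch.cat([q, k, v], dim=-1)
        q = q * torch.sigmoid(self.q_fixation(qkv))
        k = k * torch.sigmoid(self.k_fixation(qkv))

        # Kernel-based linear attention: computing phi(k)^T v first costs
        # O(N * d^2), linear in token count N, unlike Softmax attention.
        phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bnd,bne->bde", phi_k, v)
        z = 1 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)


if __name__ == "__main__":
    attn = FeatureFixationLinearAttention(dim=64)
    tokens = torch.randn(2, 196, 64)
    print(attn(tokens).shape)  # torch.Size([2, 196, 64])
```

The abstract's neighborhood association, which additionally conditions the fixation ratios on spatially and temporally adjacent tokens, is omitted here for brevity.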
