Paper Title
TDAF: Top-Down Attention Framework for Vision Tasks
Paper Authors
Paper Abstract
Human attention mechanisms often work in a top-down manner, yet this is not well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention, which can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural, which extract bottom-up spatial features and top-down attention features, respectively. These spatial and attention features are deeply nested, so the proposed framework works in a mixed top-down and bottom-up manner. Empirical evidence shows that our TDAF captures effective stratified attention information and boosts performance. ResNet with TDAF achieves a 2.0% improvement on ImageNet. For object detection, performance improves by 2.7% AP over FCOS. For pose estimation, TDAF improves the baseline by 1.6%. And for action recognition, 3D-ResNet with TDAF improves accuracy by 1.7%.
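As a rough illustration of the pairing described in the abstract, the sketch below wraps a bottom-up convolutional stage with a top-down branch that modulates it using higher-level context. This is only a minimal PyTorch sketch of the general idea under stated assumptions; the module name (`TopDownAttentionBlock`), the sigmoid gating, and the bilinear resizing are hypothetical choices, not the authors' actual TDAF implementation.

```python
# Minimal, illustrative sketch (NOT the authors' code) of a bottom-up spatial
# path modulated by a top-down attention signal. All names and design choices
# here are assumptions made for illustration.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class TopDownAttentionBlock(nn.Module):
    """Wraps one bottom-up stage with a top-down attention signal."""

    def __init__(self, channels: int):
        super().__init__()
        # Bottom-up spatial feature extractor (stand-in for a backbone stage).
        self.bottom_up = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Top-down branch: turns higher-level context into a spatial attention map.
        self.top_down = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor,
                top_context: Optional[torch.Tensor] = None) -> torch.Tensor:
        feat = self.bottom_up(x)
        if top_context is not None:
            # Resize the coarser, higher-level context to the current
            # resolution and use it to gate the bottom-up features.
            ctx = F.interpolate(top_context, size=feat.shape[-2:],
                                mode="bilinear", align_corners=False)
            feat = feat * self.top_down(ctx)
        return feat


if __name__ == "__main__":
    block = TopDownAttentionBlock(channels=64)
    low = torch.randn(1, 64, 56, 56)    # bottom-up input features
    high = torch.randn(1, 64, 28, 28)   # higher-level (top-down) context
    out = block(low, high)
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```

In the paper's framework such blocks would be nested across stages so that attention from higher levels recursively influences lower-level spatial features; the sketch shows only a single stage of that interaction.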