Paper Title
TDAF: Top-Down Attention Framework for Vision Tasks
Paper Authors
Paper Abstract
Human attention mechanisms often work in a top-down manner, yet this is not well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention, which can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural, which extract bottom-up spatial features and top-down attention features, respectively. These spatial and attention features are deeply nested, so the proposed framework works in a mixed top-down and bottom-up manner. Empirical evidence shows that our TDAF captures effective stratified attention information and boosts performance. ResNet with TDAF achieves a 2.0% improvement on ImageNet. For object detection, performance improves by 2.7% AP over FCOS. For pose estimation, TDAF improves the baseline by 1.6%. And for action recognition, 3D-ResNet with TDAF improves accuracy by 1.7%.
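As a rough illustration of the pairing described in the abstract, the sketch below wraps a bottom-up convolutional stage with a top-down branch that modulates it using higher-level context. This is only a minimal PyTorch sketch of the general idea under stated assumptions; the module name (`TopDownAttentionBlock`), the sigmoid gating, and the bilinear resizing are hypothetical choices, not the authors' actual TDAF implementation.

```python
# Minimal, illustrative sketch (NOT the authors' code) of a bottom-up spatial
# path modulated by a top-down attention signal. All names and design choices
# here are assumptions made for illustration.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class TopDownAttentionBlock(nn.Module):
    """Wraps one bottom-up stage with a top-down attention signal."""

    def __init__(self, channels: int):
        super().__init__()
        # Bottom-up spatial feature extractor (stand-in for a backbone stage).
        self.bottom_up = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Top-down branch: turns higher-level context into a spatial attention map.
        self.top_down = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor,
                top_context: Optional[torch.Tensor] = None) -> torch.Tensor:
        feat = self.bottom_up(x)
        if top_context is not None:
            # Resize the coarser, higher-level context to the current
            # resolution and use it to gate the bottom-up features.
            ctx = F.interpolate(top_context, size=feat.shape[-2:],
                                mode="bilinear", align_corners=False)
            feat = feat * self.top_down(ctx)
        return feat


if __name__ == "__main__":
    block = TopDownAttentionBlock(channels=64)
    low = torch.randn(1, 64, 56, 56)    # bottom-up input features
    high = torch.randn(1, 64, 28, 28)   # higher-level (top-down) context
    out = block(low, high)
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```

In the paper's framework such blocks would be nested across stages so that attention from higher levels recursively influences lower-level spatial features; the sketch shows only a single stage of that interaction.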