Paper Title

WW-Nets: Dual Neural Networks for Object Detection

Paper Authors

Mohammad K. Ebrahimpour, J. Ben Falandays, Samuel Spevack, Ming-Hsuan Yang, David C. Noelle

Paper Abstract

We propose a new deep convolutional neural network framework that uses object location knowledge implicit in network connection weights to guide selective attention in object detection tasks. Our approach is called What-Where Nets (WW-Nets), and it is inspired by the structure of human visual pathways. In the brain, vision incorporates two separate streams, one in the temporal lobe and the other in the parietal lobe, called the ventral stream and the dorsal stream, respectively. The ventral pathway from primary visual cortex is dominated by "what" information, while the dorsal pathway is dominated by "where" information. Inspired by this structure, we have proposed an object detection framework involving the integration of a "What Network" and a "Where Network". The aim of the What Network is to provide selective attention to the relevant parts of the input image. The Where Network uses this information to locate and classify objects of interest. In this paper, we compare this approach to state-of-the-art algorithms on the PASCAL VOC 2007 and 2012 and COCO object detection challenge datasets. Also, we compare our approach to human "ground-truth" attention. We report the results of an eye-tracking experiment on human subjects using images from PASCAL VOC 2007, and we demonstrate interesting relationships between human overt attention and information processing in our WW-Nets. Finally, we provide evidence that our proposed method performs favorably in comparison to other object detection approaches, often by a large margin. The code and the eye-tracking ground-truth dataset can be found at: https://github.com/mkebrahimpour.
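
To make the dual-stream design concrete, below is a minimal PyTorch sketch of the What/Where decomposition described in the abstract. This is an illustration only, not the authors' implementation: the layer sizes, the sigmoid saliency map, the multiplicative fusion of attention into the Where Network's input, and the single box-per-image output are all simplifying assumptions. In particular, the paper derives attention from object location knowledge implicit in pretrained connection weights, which is not reproduced here.

```python
import torch
import torch.nn as nn


class WhatNet(nn.Module):
    """Produces a spatial attention map over the input image.

    Illustrative stand-in: a tiny conv stack ending in a single-channel
    saliency map, squashed to [0, 1] with a sigmoid (an assumption, not
    the paper's weight-derived attention).
    """

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # single-channel saliency logits
        )

    def forward(self, x):
        return torch.sigmoid(self.features(x))  # (B, 1, H, W) attention map


class WhereNet(nn.Module):
    """Uses the attention map to localize and classify objects.

    For brevity this sketch predicts one class-score vector and one box
    per image; real detectors predict many candidate boxes.
    """

    def __init__(self, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to a 64-dim descriptor
        )
        self.cls_head = nn.Linear(64, num_classes)  # object class scores
        self.box_head = nn.Linear(64, 4)            # box as (x, y, w, h)

    def forward(self, x, attention):
        x = x * attention                       # modulate input by "what" attention
        feats = self.backbone(x).flatten(1)     # (B, 64)
        return self.cls_head(feats), self.box_head(feats)


class WWNet(nn.Module):
    """Dual-stream model: the 'what' stream guides attention, the 'where' stream detects."""

    def __init__(self, num_classes=20):
        super().__init__()
        self.what = WhatNet()
        self.where = WhereNet(num_classes)

    def forward(self, x):
        return self.where(x, self.what(x))


# Usage: a batch of two 224x224 RGB images; 20 classes as in PASCAL VOC.
model = WWNet(num_classes=20)
scores, boxes = model(torch.randn(2, 3, 224, 224))
print(scores.shape, boxes.shape)  # torch.Size([2, 20]) torch.Size([2, 4])
```

The key design point the sketch tries to capture is the division of labor: the What Network's output never predicts boxes itself, it only gates which image regions the Where Network attends to before localization and classification.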
