Single Shot Video Object Detector

Paper Title

Single Shot Video Object Detector

Authors

Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Abstract

Single-shot detectors, which are potentially faster and simpler than two-stage detectors, tend to be better suited to object detection in videos. Nevertheless, extending such detectors from images to videos is not trivial, especially when appearance deteriorates in videos, e.g., through motion blur or occlusion. A natural question is how to exploit temporal coherence across frames to boost detection. In this paper, we propose to address the problem by enhancing per-frame features through aggregation of neighboring frames. Specifically, we present the Single Shot Video Object Detector (SSVD), a new architecture that integrates feature aggregation into a one-stage detector for object detection in videos. Technically, SSVD takes a Feature Pyramid Network (FPN) as its backbone network to produce multi-scale features. Unlike existing feature aggregation methods, SSVD, on the one hand, estimates motion and aggregates nearby features along the motion path and, on the other, hallucinates features by directly sampling features from adjacent frames in a two-stream structure. Extensive experiments are conducted on the ImageNet VID dataset, and competitive results are reported in comparison with state-of-the-art approaches. More remarkably, for 448 × 448 input, SSVD achieves 79.2% mAP on ImageNet VID while processing one frame in 85 ms on an Nvidia Titan X Pascal GPU. The code is available at https://github.com/ddjiajun/SSVD.
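
The abstract does not say how neighboring-frame features are weighted during aggregation, so the following is only a minimal sketch of the general idea, not SSVD's actual method. It assumes the neighboring feature maps have already been aligned to the reference frame, and it uses a softmax over per-position cosine similarity as an illustrative weighting; the function name `aggregate_features` and its parameters are hypothetical.

```python
import numpy as np


def aggregate_features(feats, ref_idx, eps=1e-8):
    """Blend T aligned feature maps into one enhanced map for the reference frame.

    feats   : array of shape (T, C, H, W), assumed already aligned to the reference.
    ref_idx : index of the reference frame within feats.
    Returns an array of shape (C, H, W).

    Illustrative assumption: weights are a softmax over the cosine similarity
    between each frame and the reference, computed per spatial position.
    """
    ref = feats[ref_idx]                                   # (C, H, W)

    # Cosine similarity along the channel axis, per frame and position.
    dot = (feats * ref[None]).sum(axis=1)                  # (T, H, W)
    norms = np.linalg.norm(feats, axis=1) * np.linalg.norm(ref, axis=0)[None]
    sim = dot / (norms + eps)                              # (T, H, W)

    # Softmax over the temporal axis so the weights at each position sum to 1.
    weights = np.exp(sim - sim.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum of the frame features.
    return (weights[:, None] * feats).sum(axis=0)          # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((3, 4, 5, 5))             # 3 frames, 4 channels
    enhanced = aggregate_features(frames, ref_idx=1)
    print(enhanced.shape)
```

With this weighting, frames whose features resemble the reference at a given position contribute more there, which is one common way to down-weight blurred or occluded neighbors.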
