Delta Distillation for Efficient Video Processing

Paper Title

Delta Distillation for Efficient Video Processing

Authors

Habibian, Amirhossein; Yahia, Haitam Ben; Abati, Davide; Gavves, Efstratios; Porikli, Fatih

Abstract

This paper aims to accelerate video stream processing, such as object detection and semantic segmentation, by leveraging the temporal redundancies that exist between video frames. Instead of propagating and warping features using motion alignment, such as optical flow, we propose a novel knowledge distillation scheme, coined Delta Distillation. In our proposal, the student learns the variations in the teacher's intermediate features over time. We demonstrate that these temporal variations can be effectively distilled due to the temporal redundancies within video frames. During inference, teacher and student cooperate to provide predictions: the former by providing initial representations extracted only from the key frame, and the latter by iteratively estimating and applying deltas for the successive frames. Moreover, we consider various design choices to learn optimal student architectures, including an end-to-end learnable architecture search. Through extensive experiments on a wide range of architectures, including the most efficient ones, we demonstrate that delta distillation sets a new state of the art in terms of the accuracy vs. efficiency trade-off for semantic segmentation and object detection in videos. Finally, we show that, as a by-product, delta distillation improves the temporal consistency of the teacher model.
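The inference scheme summarized above (teacher on the key frame, student-estimated deltas on the frames that follow) can be illustrated with a toy sketch. Everything in it is hypothetical: the names `TEACHER_W`, `STUDENT_DIAG`, `teacher_layer`, `student_delta`, `delta_distillation_loss` and `run_video`, the single linear "layer", the choice to feed the student only the input difference, and the mean-squared-error loss are assumptions made for illustration, not the authors' architecture or training code. The paper distills deltas of intermediate features inside real detection and segmentation networks; this sketch collapses that to one layer.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical "expensive" teacher: a fixed 2x2 linear map standing in
# for one intermediate block of the teacher network.
TEACHER_W = [[2.0, 0.5],
             [-1.0, 1.0]]

# Hypothetical "cheap" student: a diagonal map (2 multiplies instead of 4).
STUDENT_DIAG = [2.0, 1.0]


def teacher_layer(x: Sequence[float]) -> Vector:
    """Full teacher features for one frame (the costly path)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in TEACHER_W]


def student_delta(x_t: Sequence[float], x_prev: Sequence[float]) -> Vector:
    """Student estimate of the change in teacher features between two frames.

    Assumption: the student sees only the input difference x_t - x_prev.
    """
    return [d * (a - b) for d, a, b in zip(STUDENT_DIAG, x_t, x_prev)]


def delta_distillation_loss(x_t: Sequence[float], x_prev: Sequence[float]) -> float:
    """Assumed training signal: MSE between the student's delta and the teacher's true delta."""
    target = [a - b for a, b in zip(teacher_layer(x_t), teacher_layer(x_prev))]
    pred = student_delta(x_t, x_prev)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def run_video(frames: Sequence[Sequence[float]], key_interval: int) -> List[Vector]:
    """Teacher on key frames, accumulated student deltas on all other frames."""
    if key_interval < 1:
        raise ValueError("key_interval must be >= 1")
    outputs: List[Vector] = []
    feat: Vector = []
    for t, frame in enumerate(frames):
        if t % key_interval == 0:
            feat = teacher_layer(frame)  # key frame: full teacher pass
        else:
            delta = student_delta(frame, frames[t - 1])
            feat = [f + d for f, d in zip(feat, delta)]  # apply the estimated delta
        outputs.append(feat)
    return outputs


if __name__ == "__main__":
    video = [[1.0, 0.0], [1.1, 0.0], [1.2, 0.1], [1.2, 0.1]]
    for t, f in enumerate(run_video(video, key_interval=3)):
        print(t, [round(v, 3) for v in f])
```

Because this diagonal student ignores the teacher's off-diagonal weights, its accumulated features drift from the teacher's on non-key frames, and the next key frame resets them with a full teacher pass. On a static clip the input difference is zero, so the features stay constant: this is the temporal redundancy the abstract relies on to make the student's job easy.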
