Paper Title

Knowledge Distillation: A Survey

Paper Authors

Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao

Paper Abstract

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also because of the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.
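
The vanilla knowledge distillation setup described in the abstract trains the student to match the teacher's temperature-softened output distribution in addition to the ground-truth labels. Below is a minimal sketch of that loss, assuming PyTorch; the function name, temperature T=4.0, and weighting alpha=0.5 are illustrative choices, not values prescribed by the survey.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between the teacher's and student's
    # temperature-softened class distributions, scaled by T^2 so the gradient
    # magnitude stays comparable across temperatures (Hinton et al., 2015).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy with the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Illustrative usage with random tensors: a batch of 8 examples, 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)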
