Paper Title

Weight Distillation: Transferring the Knowledge in Neural Network Parameters

Paper Authors

Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Paper Abstract

Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of transferring model parameters. Inspired by this, we investigate methods of model acceleration and compression in another line of research. We propose Weight Distillation to transfer the knowledge in the large network parameters through a parameter generator. Our experiments on WMT16 En-Ro, NIST12 Zh-En, and WMT14 En-De machine translation tasks show that weight distillation can train a small network that is 1.88~2.94x faster than the large network but with competitive performance. With the same sized small network, weight distillation can outperform knowledge distillation by 0.51~1.82 BLEU points.
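
The abstract only names the parameter generator without spelling out its form. Below is a minimal PyTorch sketch of the general idea, assuming the generator is a pair of learnable projections that map a (frozen) teacher weight matrix to a smaller student weight matrix. The class name ParameterGenerator, the shapes, and the linear-projection form are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
# Minimal sketch (not the authors' implementation): generate a student layer's
# weights from a teacher layer's weights via learnable projections.
import torch
import torch.nn as nn


class ParameterGenerator(nn.Module):
    """Maps a teacher weight matrix to a smaller student weight matrix."""

    def __init__(self, teacher_in, teacher_out, student_in, student_out):
        super().__init__()
        # Learnable mappings that shrink the teacher weight matrix on both sides.
        self.proj_in = nn.Parameter(torch.randn(teacher_in, student_in) * 0.02)
        self.proj_out = nn.Parameter(torch.randn(teacher_out, student_out) * 0.02)

    def forward(self, teacher_weight):
        # teacher_weight: (teacher_out, teacher_in)
        # returns student weight: (student_out, student_in)
        return self.proj_out.t() @ teacher_weight @ self.proj_in


if __name__ == "__main__":
    # Toy example: map a 512x512 teacher projection to a 256x256 student one.
    teacher_weight = torch.randn(512, 512)      # pretrained teacher weights (kept fixed)
    generator = ParameterGenerator(512, 512, 256, 256)
    student_weight = generator(teacher_weight)
    print(student_weight.shape)                 # torch.Size([256, 256])
```

In the sketch, the generator's projections would be trained on the target task so that the produced student weights inherit knowledge from the teacher parameters; the generated weights can then initialize or parameterize the small network.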
