Paper Title
Efficient Sub-structured Knowledge Distillation
Paper Authors
Paper Abstract
Structured prediction models solve problems whose output is a complex structure rather than a single variable. Performing knowledge distillation for such models is non-trivial due to their exponentially large output space. In this work, we propose an approach that is much simpler in formulation and far more efficient to train than existing approaches. Specifically, we transfer knowledge from a teacher model to its student by locally matching their predictions on all sub-structures, rather than over the whole output space. In this manner, we avoid time-consuming techniques such as dynamic programming (DP) for decoding output structures, which permits parallel computation and makes training faster in practice. Moreover, it encourages the student model to better mimic the internal behavior of the teacher model. Experiments on two structured prediction tasks demonstrate that our approach outperforms previous methods and halves the time cost of one training epoch.
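To make the idea concrete, below is a minimal sketch (not the paper's released code) of a sub-structured distillation loss for a sequence-labeling setting. It assumes each position's label factor is one sub-structure, and that `teacher_scores` and `student_scores` are hypothetical per-position factor scores produced by the two models. Each factor is normalized locally with a softmax, so the loss is computed in parallel across all positions with no dynamic programming over the full output space.

```python
import torch
import torch.nn.functional as F

def substructure_kd_loss(teacher_scores: torch.Tensor,
                         student_scores: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    """Sketch of sub-structured knowledge distillation for sequence labeling.

    teacher_scores, student_scores: (batch, seq_len, num_labels) tensors of
    unnormalized factor scores, one sub-structure per position. Real inputs
    would also need a padding mask; it is omitted here for brevity.
    """
    # Local distributions over each sub-structure (position-wise softmax);
    # the teacher is detached so gradients flow only through the student.
    t_logp = F.log_softmax(teacher_scores.detach() / temperature, dim=-1)
    s_logp = F.log_softmax(student_scores / temperature, dim=-1)

    # KL(teacher || student) per sub-structure, averaged over all of them.
    # The conventional T**2 rescaling from soft-label KD is omitted here.
    kl = torch.sum(t_logp.exp() * (t_logp - s_logp), dim=-1)
    return kl.mean()

if __name__ == "__main__":
    batch, seq_len, num_labels = 2, 8, 5
    teacher = torch.randn(batch, seq_len, num_labels)
    student = torch.randn(batch, seq_len, num_labels, requires_grad=True)
    loss = substructure_kd_loss(teacher, student, temperature=2.0)
    loss.backward()  # gradients reach only the student scores
    print(loss.item())
```

Because every sub-structure is scored independently, the loss is a single batched tensor operation, which is what allows the parallel computation and the reduced per-epoch training time described in the abstract.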