Paper Title

Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring

Paper Authors

Hee-Jun Jung, Doyeon Kim, Seung-Hoon Na, Kangil Kim

Paper Abstract

Knowledge distillation is an approach for transferring information about representations from a teacher to a student by reducing the difference between them. A challenge of this approach is that it reduces the flexibility of the student's representations, which leads to inaccurate learning of the teacher's knowledge. To resolve this during transfer, we investigate distilling structures of representations specified as three types: intra-feature, local inter-feature, and global inter-feature structures. To transfer them, we introduce feature structure distillation methods based on Centered Kernel Alignment (CKA), which assigns consistent values to similar feature structures and reveals more informative relations. In particular, a memory-augmented transfer method with clustering is implemented for the global structures. The methods are empirically analyzed on the nine language-understanding tasks of the GLUE benchmark with Bidirectional Encoder Representations from Transformers (BERT), a representative neural language model. The results show that the proposed methods effectively transfer the three types of structures and improve performance over state-of-the-art distillation methods. The code for the methods is available at https://github.com/maroo-sky/FSD.
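The similarity index the abstract builds on, Centered Kernel Alignment, has a well-known linear form (Kornblith et al., 2019). Below is a minimal PyTorch sketch of linear CKA and a hypothetical distillation loss built on it; the function names and the exact form of the loss are illustrative assumptions rather than the paper's implementation (see the linked repository for that).

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between two feature matrices.

    x: (n, d1) student features; y: (n, d2) teacher features,
    one row per example. Returns a scalar in [0, 1] that is
    invariant to orthogonal transforms and isotropic scaling.
    """
    # Center each feature dimension across examples.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # ||Y^T X||_F^2 measures cross-similarity of the representations.
    cross = torch.linalg.matrix_norm(y.T @ x, ord="fro") ** 2
    # Normalize by the self-similarities so the value is scale-free.
    norm_x = torch.linalg.matrix_norm(x.T @ x, ord="fro")
    norm_y = torch.linalg.matrix_norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

def cka_distillation_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    # Hypothetical loss: pull the student's feature structure toward
    # the teacher's by maximizing CKA, i.e., minimizing 1 - CKA.
    return 1.0 - linear_cka(student, teacher)

# Example usage with assumed hidden sizes (768 = BERT-base).
student_feats = torch.randn(32, 312)
teacher_feats = torch.randn(32, 768)
loss = cka_distillation_loss(student_feats, teacher_feats)
```

Because CKA compares the structure of two representation spaces rather than individual feature values, the student and teacher may have different hidden dimensions, as in the example above.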
