Paper Title

Subclass Knowledge Distillation with Known Subclass Labels

Paper Authors

Ahmad Sajedi, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

Paper Abstract

This work introduces a novel knowledge distillation framework for classification tasks where information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary detection, the amount of information transferred from the teacher to the student is restricted, limiting the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the classes. To that end, we propose Subclass Knowledge Distillation (SKD), a process of transferring the knowledge of predicted subclasses from a teacher to a smaller student. Meaningful information that is absent from the teacher's class logits but present in its subclass logits (e.g., similarities within classes) is conveyed to the student through SKD, boosting the student's performance. Analytically, we measure how much extra information the teacher can provide the student via SKD to demonstrate the efficacy of our approach. The developed framework is evaluated in a clinical application, namely colorectal polyp binary classification, a practical problem with two classes and a number of subclasses per class. In this application, clinician-provided annotations are used to define subclasses based on the variability of the annotation labels in a curriculum style of learning. A lightweight, low-complexity student trained with the SKD framework achieves an F1-score of 85.05%, an improvement of 1.47% and 2.10% over students trained with and without conventional knowledge distillation, respectively. The 2.10% F1-score gap between students trained with and without SKD can be explained by the extra subclass knowledge, i.e., the extra 0.4656 label bits per sample that the teacher can transfer in our experiment.
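The abstract describes two ideas: distilling the teacher's subclass logits into the student, and quantifying the extra transferred knowledge in label bits per sample. The sketch below is a minimal, hypothetical PyTorch illustration of both, not the authors' implementation: the loss weights `alpha`, the softening `temperature`, the `subclass_to_class` mapping, and the reading of "extra label bits" as the conditional entropy H(subclass | class) are all assumptions made for illustration.

```python
# Hypothetical sketch of subclass knowledge distillation (SKD), assuming a
# Hinton-style KL distillation term over subclass logits plus a supervised
# class-level term. Hyperparameters and names are illustrative assumptions.
import collections
import math

import torch
import torch.nn.functional as F


def skd_loss(student_subclass_logits,   # (B, S) student logits over subclasses
             teacher_subclass_logits,   # (B, S) teacher logits over subclasses
             class_labels,              # (B,)   ground-truth class labels
             subclass_to_class,         # (S,)   long tensor: subclass index -> parent class
             num_classes,
             temperature=4.0,           # assumed softening temperature
             alpha=0.5):                # assumed weight on the distillation term
    # Distillation term: match the teacher's softened subclass distribution.
    # This is where within-class similarity information is transferred.
    soft_teacher = F.softmax(teacher_subclass_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_subclass_logits / temperature, dim=1)
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Supervised term: aggregate subclass probabilities into class
    # probabilities by summing the subclasses of each class, then apply NLL.
    subclass_probs = F.softmax(student_subclass_logits, dim=1)
    class_probs = torch.zeros(subclass_probs.size(0), num_classes,
                              device=subclass_probs.device)
    class_probs.index_add_(1, subclass_to_class, subclass_probs)
    supervised = F.nll_loss(torch.log(class_probs + 1e-12), class_labels)

    return alpha * distill + (1.0 - alpha) * supervised


def extra_subclass_bits(subclass_labels, subclass_to_class):
    """One plausible reading (an assumption, not necessarily the paper's exact
    measure) of "extra label bits per sample": the conditional entropy
    H(subclass | class) of the empirical label distribution, in bits."""
    class_counts = collections.Counter()
    pair_counts = collections.Counter()
    for s in subclass_labels:
        c = int(subclass_to_class[s])
        class_counts[c] += 1
        pair_counts[(c, s)] += 1
    n = len(subclass_labels)
    h = 0.0
    for (c, s), n_cs in pair_counts.items():
        h -= (n_cs / n) * math.log2(n_cs / class_counts[c])
    return h
```

Under this reading, a dataset whose classes split unevenly into subclasses would yield a fractional value such as the 0.4656 bits per sample quoted above, i.e., the amount of label information available to the teacher beyond the binary class label.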
