Paper Title
Trustable Co-label Learning from Multiple Noisy Annotators
Paper Authors
Abstract
Supervised deep learning depends on massive amounts of accurately annotated examples, which is usually impractical in many real-world scenarios. A typical alternative is learning from multiple noisy annotators. Numerous earlier works assume that all labels are noisy, while it is usually the case that a few trusted samples with clean labels are available. This raises the following important question: how can we effectively use a small amount of trusted data to facilitate robust classifier learning from multiple annotators? This paper proposes a data-efficient approach, called \emph{Trustable Co-label Learning} (TCL), to learn deep classifiers from multiple noisy annotators when a small set of trusted data is available. This approach follows a coupled-view learning paradigm, which jointly learns the data classifier and the label aggregator. It effectively uses trusted data as a guide to generate trustable soft labels (termed co-labels). Co-label learning can then be performed by alternately reannotating the pseudo labels and refining the classifier. In addition, we further improve TCL for a special complete data case, where each instance is labeled by all annotators and the label aggregator is represented by multilayer neural networks to enhance model capacity. Extensive experiments on synthetic and real datasets clearly demonstrate the effectiveness and robustness of the proposed approach. Source code is available at https://github.com/ShikunLi/TCL.
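The alternating scheme the abstract describes (aggregate noisy annotations into soft co-labels guided by a trusted subset, then refine the classifier on those co-labels) can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the weighted-majority aggregator, the 50/50 blend with classifier predictions, the logistic-regression classifier, and all numeric settings are simplifying assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data: two Gaussian blobs in 2-D.
n, d = 600, 2
y_true = rng.integers(0, 2, n)
X = rng.normal(0.0, 1.0, (n, d)) + y_true[:, None] * 2.0

# Three annotators with different (unknown) noise rates flip labels at random.
noise = [0.10, 0.30, 0.45]
labels = np.stack(
    [np.where(rng.random(n) < p, 1 - y_true, y_true) for p in noise], axis=1
)  # shape (n, 3)

# A small trusted subset with clean labels guides the aggregator.
trust = rng.choice(n, 60, replace=False)

# Estimate each annotator's reliability on the trusted set and turn it into
# a log-odds weight (weighted majority vote as a stand-in aggregator).
acc = (labels[trust] == y_true[trust, None]).mean(axis=0).clip(0.51, 0.99)
w = np.log(acc / (1.0 - acc))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregate(preds_soft):
    # Soft co-labels: blend weighted annotator votes with the current
    # classifier's predictions (the two coupled views).
    votes = sigmoid(((2 * labels - 1) * w).sum(axis=1))
    return 0.5 * votes + 0.5 * preds_soft

# Logistic-regression classifier trained by gradient descent on soft targets.
Xb = np.hstack([X, np.ones((n, 1))])
theta = np.zeros(d + 1)
co = aggregate(np.full(n, 0.5))          # start from an uninformative view
for _ in range(5):                        # alternate reannotate / refine
    for _ in range(200):                  # refine classifier on co-labels
        p = sigmoid(Xb @ theta)
        theta -= 0.1 * Xb.T @ (p - co) / n
    co = aggregate(sigmoid(Xb @ theta))   # reannotate the pseudo labels

acc_final = ((sigmoid(Xb @ theta) > 0.5) == y_true).mean()
print(round(acc_final, 3))
```

Even with a 45%-noise annotator in the pool, the trusted subset lets the aggregator down-weight unreliable votes, and the alternating loop typically recovers a classifier close to the clean-data optimum on this toy problem.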