Paper Title

Gray Learning from Non-IID Data with Out-of-distribution Samples

Authors

Zhilin Zhao, Longbing Cao, Chang-Dong Wang

Abstract

The integrity of training data, even when annotated by experts, is far from guaranteed, especially for non-IID datasets comprising both in- and out-of-distribution samples. In an ideal scenario, the majority of samples would be in-distribution, while samples that deviate semantically would be identified as out-of-distribution and excluded during the annotation process. However, experts may erroneously classify these out-of-distribution samples as in-distribution, assigning them labels that are inherently unreliable. This mixture of unreliable labels and varied data types makes the task of learning robust neural networks notably challenging. We observe that both in- and out-of-distribution samples can almost invariably be ruled out from belonging to certain classes, aside from those corresponding to unreliable ground-truth labels. This opens the possibility of utilizing reliable complementary labels that indicate the classes to which a sample does not belong. Guided by this insight, we introduce a novel approach, termed Gray Learning (GL), which leverages both ground-truth and complementary labels. Crucially, GL adaptively adjusts the loss weights for these two label types based on prediction confidence levels. By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings. Extensive experimental evaluations reveal that our method significantly outperforms alternative approaches grounded in robust statistics.
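To make the core idea concrete, below is a minimal PyTorch sketch of a confidence-weighted loss in the spirit of the abstract. The function name, the random draw of a complementary label, and the specific weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def gray_learning_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Confidence-weighted mix of ground-truth and complementary-label losses.

    Illustrative sketch only: the paper's exact weighting scheme and
    complementary-label construction may differ.
    """
    num_classes = logits.size(1)
    probs = F.softmax(logits, dim=1)

    # The model's confidence in the annotated label, detached so it acts
    # as a weight rather than a gradient path.
    conf = probs.gather(1, labels.unsqueeze(1)).squeeze(1).detach()

    # Standard loss on the (possibly unreliable) ground-truth label.
    loss_gt = F.cross_entropy(logits, labels, reduction="none")

    # Complementary label: a uniformly drawn class other than the annotated
    # one; both in- and out-of-distribution samples can be ruled out from it.
    comp = (labels + torch.randint_like(labels, 1, num_classes)) % num_classes
    p_comp = probs.gather(1, comp.unsqueeze(1)).squeeze(1)
    loss_comp = -torch.log(1.0 - p_comp + 1e-8)

    # High confidence: trust the ground-truth label; low confidence: lean
    # on the reliable complementary label instead.
    return (conf * loss_gt + (1.0 - conf) * loss_comp).mean()


# Example usage: a batch of 4 samples over 10 classes.
logits = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
loss = gray_learning_loss(logits, labels)
loss.backward()
```

The design choice to weight by detached prediction confidence reflects the abstract's intuition: samples the network confidently assigns to their annotated class are treated as in-distribution with trustworthy labels, while low-confidence samples fall back on complementary labels, which remain reliable even for mislabeled out-of-distribution data.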
