Paper Title
A Survey of Label-noise Representation Learning: Past, Present and Future
Paper Authors
Paper Abstract
Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep learning models robustly with these noisy labels. Therefore, it is urgent to design Label-Noise Representation Learning (LNRL) methods for robustly training deep models with noisy labels. To fully understand LNRL, we conduct a survey study. We first clarify a formal definition for LNRL from the perspective of machine learning. Then, via the lens of learning theory and empirical study, we figure out why noisy labels affect deep models' performance. Based on the theoretical guidance, we categorize different LNRL methods into three directions. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of different categories. More importantly, we summarize the essential components of robust LNRL, which can spark new directions. Lastly, we propose possible research directions within LNRL, such as new datasets, instance-dependent LNRL, and adversarial LNRL. We also envision potential directions beyond LNRL, such as learning with feature-noise, preference-noise, domain-noise, similarity-noise, graph-noise, and demonstration-noise.
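To make the noisy-label setting described in the abstract concrete, below is a minimal illustrative sketch (not from the surveyed paper; all function names are hypothetical) that corrupts clean labels with a symmetric, class-conditional noise transition matrix T, where T[i][j] is the probability that a clean label i is observed as noisy label j:

```python
import random

def symmetric_noise_matrix(num_classes, noise_rate):
    """Row-stochastic transition matrix: keep the clean label with
    probability 1 - noise_rate, flip uniformly to any other class otherwise."""
    off = noise_rate / (num_classes - 1)
    return [[1.0 - noise_rate if i == j else off for j in range(num_classes)]
            for i in range(num_classes)]

def corrupt_labels(labels, T, seed=0):
    """Sample a noisy label for each clean label according to its row of T."""
    rng = random.Random(seed)
    classes = list(range(len(T)))
    return [rng.choices(classes, weights=T[y])[0] for y in labels]

T = symmetric_noise_matrix(num_classes=10, noise_rate=0.4)
clean = [i % 10 for i in range(1000)]
noisy = corrupt_labels(clean, T)
flip_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
print(round(flip_rate, 2))  # empirical flip rate, close to the 0.4 noise rate
```

Many LNRL methods in the survey's taxonomy estimate such a transition matrix from noisy data alone and use it to correct the training loss.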