Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss

Paper Title

Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss

Authors

Lefan Zhang, Zhang-Hao Tian, Wujun Zhou, Wei Wang

Abstract

The success of deep learning depends on large-scale, well-curated training data, but data in real-world applications are commonly long-tailed and noisy. Many methods have been proposed to deal with either long-tailed data or noisy data, but only a few have been developed to tackle long-tailed noisy data. To address this gap, we propose a robust method for learning from long-tailed noisy data with sample selection and a balanced loss. Specifically, we use sample selection to separate the noisy training data into a clean labeled set and an unlabeled set, and then train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
