Title

Binary classification with ambiguous training data

Authors

Naoya Otani, Yosuke Otsubo, Tetsuya Koike, Masashi Sugiyama

Abstract

In supervised learning, we often face ambiguous (A) samples that are difficult to label even for domain experts. In this paper, we consider a binary classification problem in the presence of such A samples. This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples. It also differs from 3-class classification with the positive (P), negative (N), and A classes, since we do not want to classify test samples into the A class. Our proposed method extends binary classification with a reject option, which trains a classifier and a rejector simultaneously using P and N samples based on the 0-1-$c$ loss with rejection cost $c$. More specifically, we propose to train a classifier and a rejector under the 0-1-$c$-$d$ loss using P, N, and A samples, where $d$ is the misclassification penalty for ambiguous samples. In our practical implementation, we use a convex upper bound of the 0-1-$c$-$d$ loss for computational tractability. Numerical experiments demonstrate that our method can successfully utilize the additional information brought by such A training data.
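To make the loss concrete, below is a minimal sketch of an empirical 0-1-$c$-$d$ loss as the abstract describes it. The function names and the interpretation of $d$ (accepting, i.e. not rejecting, an ambiguous sample incurs penalty $d$, while rejecting any sample costs $c$) are assumptions for illustration; the paper's actual training objective uses a convex upper bound of this non-convex loss rather than the loss itself.

```python
import numpy as np

def zero_one_c_d_loss(f, r, X, y, c=0.3, d=0.6):
    """Empirical 0-1-c-d loss (illustrative sketch, not the authors' code).

    f, r : callables mapping an input array to real-valued scores
           (f is the classifier, r the rejector; r(x) <= 0 means "reject").
    y    : labels in {+1, -1, 0}, where 0 marks an ambiguous (A) sample.
    c    : cost of rejecting any sample.
    d    : assumed penalty for accepting an ambiguous sample.
    """
    fx, rx = f(X), r(X)
    rejected = rx <= 0
    losses = np.where(
        rejected,
        c,  # rejection always costs c, regardless of the label
        np.where(
            y == 0,
            d,                             # accepted A sample costs d
            (y * fx <= 0).astype(float),   # accepted P/N sample: 0-1 loss
        ),
    )
    return losses.mean()
```

For example, with a classifier `f(x) = x` and a rejector that rejects inputs near zero, a correctly classified P and N pair contributes 0 while an accepted A sample contributes $d$; with $c < d$, the rejector is encouraged to reject regions where A samples concentrate.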
