Title

Binary classification with ambiguous training data

Authors

Naoya Otani, Yosuke Otsubo, Tetsuya Koike, Masashi Sugiyama

Abstract

In supervised learning, we often face ambiguous (A) samples that are difficult to label even for domain experts. In this paper, we consider a binary classification problem in the presence of such A samples. This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples. It also differs from 3-class classification with the positive (P), negative (N), and A classes, since we do not want to classify test samples into the A class. Our proposed method extends binary classification with a reject option, which trains a classifier and a rejector simultaneously using P and N samples based on the 0-1-$c$ loss with rejection cost $c$. More specifically, we propose to train a classifier and a rejector under the 0-1-$c$-$d$ loss using P, N, and A samples, where $d$ is the misclassification penalty for ambiguous samples. In our practical implementation, we use a convex upper bound of the 0-1-$c$-$d$ loss for computational tractability. Numerical experiments demonstrate that our method can successfully utilize the additional information brought by such A training data.
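To make the loss concrete, below is a minimal sketch of an empirical 0-1-$c$-$d$ loss as the abstract describes it. The function names and the interpretation of $d$ (accepting, i.e. not rejecting, an ambiguous sample incurs penalty $d$, while rejecting any sample costs $c$) are assumptions for illustration; the paper's actual training objective uses a convex upper bound of this non-convex loss rather than the loss itself.

```python
import numpy as np

def zero_one_c_d_loss(f, r, X, y, c=0.3, d=0.6):
    """Empirical 0-1-c-d loss (illustrative sketch, not the authors' code).

    f, r : callables mapping an input array to real-valued scores
           (f is the classifier, r the rejector; r(x) <= 0 means "reject").
    y    : labels in {+1, -1, 0}, where 0 marks an ambiguous (A) sample.
    c    : cost of rejecting any sample.
    d    : assumed penalty for accepting an ambiguous sample.
    """
    fx, rx = f(X), r(X)
    rejected = rx <= 0
    losses = np.where(
        rejected,
        c,  # rejection always costs c, regardless of the label
        np.where(
            y == 0,
            d,                             # accepted A sample costs d
            (y * fx <= 0).astype(float),   # accepted P/N sample: 0-1 loss
        ),
    )
    return losses.mean()
```

For example, with a classifier `f(x) = x` and a rejector that rejects inputs near zero, a correctly classified P and N pair contributes 0 while an accepted A sample contributes $d$; with $c < d$, the rejector is encouraged to reject regions where A samples concentrate.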
