Paper title

Semi-supervised binary classification with latent distance learning

Authors

Kamal, Imam Mustafa; Bae, Hyerim

Abstract

Binary classification (BC) is a practical task that is ubiquitous in real-world problems, such as distinguishing healthy from unhealthy objects in biomedical diagnostics and defective from non-defective products in manufacturing inspection. Nonetheless, solving this problem effectively commonly requires fully annotated data, and their collection by domain experts is a tedious and expensive procedure. Several significant semi-supervised learning techniques have been devised, but they target multi-class classification rather than BC and rely heavily on stochastic data augmentation. In this study, we demonstrate that stochastic data augmentation is less suitable for typical BC problems because it can omit the crucial features that strictly distinguish positive from negative samples. To address this issue, we propose a new representation learning approach that solves the BC problem with only a few labels using a random k-pair cross-distance learning mechanism. First, using a few labeled samples, the encoder network learns to project positive and negative samples into an angular space so as to maximize their inter-class distances and minimize their intra-class distances. Second, the classifier learns to discriminate between positive and negative samples using on-the-fly labels generated from the angular space and the labeled samples. Extensive experiments were conducted on four real-world, publicly available BC datasets. Using few labels and no data augmentation, the proposed method outperformed state-of-the-art semi-supervised and self-supervised learning methods. Moreover, with only 10% of the samples labeled, our semi-supervised classifier achieved accuracy competitive with that of a fully supervised setting.
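The abstract does not specify the exact loss or labeling rule, so the following is only a minimal NumPy sketch of the two ideas it describes, under our own assumptions. Step one samples k random pairs of labeled embeddings and applies a margin loss on their angular distance: same-class pairs are pulled together and different-class pairs are pushed beyond a margin. Step two generates on-the-fly labels for unlabeled embeddings by assigning each one to the class whose labeled-sample mean direction is angularly closest. The function names (`random_k_pair_loss`, `pseudo_labels`), the π/2 margin and the nearest-prototype rule are hypothetical illustrations, not the authors' random k-pair cross-distance mechanism.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Project row vectors onto the unit hypersphere so that cosine similarity equals the dot product.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def angular_distance(a, b):
    # Angle in radians between corresponding rows of a and b.
    cos = np.clip(np.sum(l2_normalize(a) * l2_normalize(b), axis=1), -1.0, 1.0)
    return np.arccos(cos)


def random_k_pair_loss(emb, labels, k, rng, margin=np.pi / 2):
    """Hypothetical margin loss over k random pairs of labeled embeddings (assumption, not the paper's loss).

    Same-class pairs are penalized by their angle (minimize intra-class distance);
    different-class pairs are penalized only when their angle is below `margin`
    (maximize inter-class distance up to the margin).
    """
    n = emb.shape[0]
    i = rng.integers(0, n, size=k)
    # Offset in [1, n-1] guarantees j != i, so no sample is paired with itself.
    j = (i + rng.integers(1, n, size=k)) % n
    d = angular_distance(emb[i], emb[j])
    same = labels[i] == labels[j]
    loss = np.where(same, d, np.maximum(0.0, margin - d))
    return loss.mean()


def pseudo_labels(unlabeled_emb, labeled_emb, labels):
    """Hypothetical on-the-fly labels: nearest class prototype by cosine similarity."""
    z = l2_normalize(unlabeled_emb)
    # One prototype per class (0 = negative, 1 = positive): mean of normalized labeled embeddings.
    protos = np.stack([l2_normalize(labeled_emb[labels == c]).mean(axis=0) for c in (0, 1)])
    protos = l2_normalize(protos)
    sims = z @ protos.T  # cosine similarity of each unlabeled sample to each prototype
    return sims.argmax(axis=1)  # row index equals class label because prototypes are stacked as (0, 1)


# Toy usage: positives point roughly along +x, negatives roughly along -x.
rng = np.random.default_rng(0)
labeled = np.array([[1.0, 0.1], [0.9, -0.1], [-1.0, 0.2], [-0.8, -0.1]])
y = np.array([1, 1, 0, 0])
unlabeled = np.array([[2.0, 0.5], [-3.0, 0.0]])
print(random_k_pair_loss(labeled, y, k=16, rng=rng))
print(pseudo_labels(unlabeled, labeled, y))  # → [1 0]
```

In a real pipeline the embeddings would come from a trainable encoder and the loss would be minimized by gradient descent; NumPy is used here only to keep the geometry visible. Computing distances on the unit hypersphere makes the loss depend on direction rather than magnitude, which is one plausible reading of "angular space".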