Paper Title
Deep Active Learning with Augmentation-based Consistency Estimation
Paper Authors
Paper Abstract
In active learning, research has focused mainly on strategies for selecting unlabeled data to enhance the generalization capability of the next learning cycle, and various uncertainty measurement methods have been proposed for this purpose. Meanwhile, with the advent of data augmentation as a regularizer in general deep learning, we notice that the unlabeled-data selection method and augmentation-based regularization techniques can mutually influence each other in active learning scenarios. Through various experiments, we confirmed that consistency-based regularization derived from analytical learning theory can affect the generalization capability of the classifier in combination with existing uncertainty measurement methods. Based on this observation, we propose a methodology that improves generalization ability by applying data augmentation-based techniques to the active learning scenario. For the augmentation-based regularization loss, we redefined the Cutout (CO) and CutMix (CM) strategies as quantitative metrics and applied them at both the model training and the unlabeled-data selection steps. We have shown that the augmentation-based regularizer improves performance in the training step of active learning, and that the same approach can be effectively combined with the uncertainty measurement metrics proposed so far. We used the FashionMNIST, CIFAR10, CIFAR100, and STL10 datasets to verify the performance of the proposed active learning technique on multiple image classification tasks. Our experiments show consistent performance gains for each dataset and budget scenario.
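To make the selection step concrete, the following is a minimal NumPy sketch of how an augmentation-based consistency score could be used to rank an unlabeled pool. The toy linear classifier, the image shapes, and the KL-divergence-based score are illustrative assumptions for this sketch, not the paper's exact formulation of the CO/CM metrics.

```python
# Hedged sketch: Cutout-based consistency estimation for unlabeled-data
# selection. Assumptions: a toy linear-softmax classifier, 32x32 grayscale
# images, and mean KL divergence as the consistency score.
import numpy as np

rng = np.random.default_rng(0)

def cutout(img, size=8):
    """Zero out a random square patch (the Cutout augmentation)."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = 0.0
    return out

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def toy_model(img, W):
    """Stand-in classifier: linear map on flattened pixels + softmax."""
    return softmax(W @ img.ravel())

def consistency_score(img, W, n_aug=4):
    """Mean KL divergence between predictions on the original image and
    several Cutout views; a higher score means less consistent, i.e. a
    more informative sample to query for labeling."""
    p = toy_model(img, W)
    kls = []
    for _ in range(n_aug):
        q = toy_model(cutout(img), W)
        kls.append(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
    return float(np.mean(kls))

# Rank an unlabeled pool by the score and pick the top-k samples to label.
pool = [rng.normal(size=(32, 32)) for _ in range(16)]
W = rng.normal(scale=0.1, size=(10, 32 * 32))
scores = [consistency_score(img, W) for img in pool]
query = np.argsort(scores)[::-1][:4]  # 4 least-consistent samples
```

The same scaffold extends to CutMix by replacing the zeroed patch with a patch taken from a second pool image and comparing against the correspondingly mixed prediction target.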