A Comparative Survey of Deep Active Learning

Paper Title

A Comparative Survey of Deep Active Learning

Authors

Zhan, Xueying; Wang, Qingzhong; Huang, Kuan-hao; Xiong, Haoyi; Dou, Dejing; Chan, Antoni B.

Abstract

While deep learning (DL) is data-hungry and usually relies on extensive labeled data to deliver good performance, Active Learning (AL) reduces labeling costs by selecting a small proportion of samples from unlabeled data for labeling and training. Therefore, Deep Active Learning (DAL) has emerged in recent years as a feasible solution for maximizing model performance under a limited labeling cost/budget. Although abundant DAL methods have been developed and various literature reviews conducted, a performance evaluation of DAL methods under fair comparison settings is not yet available. Our work intends to fill this gap. In this work, we construct a DAL toolkit, DeepAL+, by re-implementing 19 highly cited DAL methods. We survey and categorize DAL-related works and construct comparative experiments across frequently used datasets and DAL algorithms. Additionally, we explore some factors (e.g., batch size, number of epochs in the training process) that influence the efficacy of DAL, which provides better references for researchers to design their DAL experiments or carry out DAL-related applications.
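As a rough illustration of the pool-based selection step the abstract describes, the sketch below ranks unlabeled samples by predictive entropy and queries the most uncertain ones in batches. Entropy sampling is only one classic uncertainty-based strategy, chosen here for brevity. This is not the DeepAL+ API: the names `entropy`, `select_batch`, `active_learning_loop`, and the `predict_proba` and `train` callbacks are hypothetical and introduced purely for illustration.

```python
import math
from typing import Callable, List, Sequence, Set


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(pool_probs: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return positions of the `batch_size` highest-entropy pool samples."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def active_learning_loop(
    pool_size: int,
    predict_proba: Callable[[List[int]], List[Sequence[float]]],  # hypothetical model hook
    train: Callable[[List[int]], None],  # hypothetical training hook
    batch_size: int,
    rounds: int,
) -> Set[int]:
    """Repeatedly query the most uncertain samples, then retrain on all labeled ones."""
    labeled: Set[int] = set()
    for _ in range(rounds):
        unlabeled = [i for i in range(pool_size) if i not in labeled]
        if not unlabeled:
            break  # labeling budget exceeds the pool size
        probs = predict_proba(unlabeled)
        picked = select_batch(probs, batch_size)
        # Map positions within the unlabeled list back to pool indices.
        labeled.update(unlabeled[j] for j in picked)
        train(sorted(labeled))
    return labeled
```

In this sketch, `batch_size` and `rounds` together fix the labeling budget, and the body of `train` is where a choice such as the number of epochs per round would live. These correspond to the kinds of factors the abstract says the survey examines.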
