Paper Title
Low-Budget Label Query through Domain Alignment Enforcement
Paper Authors
Abstract
The deep learning revolution happened thanks to the availability of massive amounts of labelled data, which contributed to the development of models with extraordinary inference capabilities. Despite the public availability of many datasets, addressing specific requirements often calls for generating a new set of labelled data. Producing labels is frequently costly and sometimes requires specific know-how. In this work, we tackle a new problem, named low-budget label query, that consists of suggesting to the user a small (low-budget) set of samples to be labelled, drawn from a completely unlabelled dataset, with the final goal of maximizing classification accuracy on that dataset. We first improve an Unsupervised Domain Adaptation (UDA) method to better align the source and target domains using consistency constraints, reaching the state of the art on a few UDA tasks. Finally, using the previously trained model as a reference, we propose a simple yet effective selection method based on uniform sampling of the prediction consistency distribution, which is deterministic and steadily outperforms other baselines as well as competing models on a large variety of publicly available datasets.
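To make the selection step concrete, the sketch below shows one way a deterministic, consistency-based label query could look in practice: score each unlabelled sample by how consistently the (UDA-trained) model predicts across two augmented views, then pick samples spread uniformly along that consistency distribution. The dot-product consistency score, the two-view setup, and the evenly-spaced-rank selection are illustrative assumptions, not the paper's actual implementation.

import numpy as np


def consistency_scores(probs_view1: np.ndarray, probs_view2: np.ndarray) -> np.ndarray:
    """Per-sample prediction consistency between two augmented views.

    Consistency is scored here as the dot product of the two softmax
    outputs (the probability that the two views agree); this particular
    score is an assumption made for illustration.
    """
    return np.sum(probs_view1 * probs_view2, axis=1)


def select_uniform_over_consistency(scores: np.ndarray, budget: int) -> np.ndarray:
    """Deterministically pick `budget` samples spread uniformly along the
    sorted consistency distribution (evenly spaced ranks, no randomness)."""
    order = np.argsort(scores)
    positions = np.linspace(0, len(scores) - 1, budget).round().astype(int)
    return order[positions]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_classes = 1000, 10
    # Stand-in softmax outputs of a model evaluated on two augmented
    # views of each unlabelled target sample (hypothetical data).
    p1 = rng.dirichlet(np.ones(n_classes), size=n_samples)
    p2 = rng.dirichlet(np.ones(n_classes), size=n_samples)
    scores = consistency_scores(p1, p2)
    to_label = select_uniform_over_consistency(scores, budget=50)
    print("Indices suggested for annotation:", to_label[:10])

Because the selection depends only on the sorted consistency scores, the same unlabelled dataset and model always yield the same low-budget query set, matching the deterministic behaviour claimed in the abstract.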