Paper Title
Exploratory Machine Learning with Unknown Unknowns
Paper Authors
Paper Abstract
In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model classifies unseen instances into those known labels. This paper studies a new problem setting in which the training data contain unknown classes that are misperceived as other labels, so their existence is not apparent from the given supervision. We attribute these unknown unknowns to the fact that the training dataset is badly advised by an incompletely perceived label space, which in turn results from insufficient feature information. To this end, we propose exploratory machine learning, which examines and investigates the training data by actively augmenting the feature space to discover potentially hidden classes. Our method consists of three ingredients: a rejection model, feature exploration, and a model cascade. We provide theoretical analysis to justify its superiority and validate its effectiveness on both synthetic and real datasets.
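As a rough illustration of how the three ingredients named in the abstract might fit together, the sketch below chains a reject option, the acquisition of one extra feature, and a second-stage model. It is not the paper's algorithm: the abstract gives no implementation details, so the confidence-threshold rejection rule, the toy models, and every name here (`predict_with_reject`, `cascade_predict`, `acquire_feature`, `base_model`, `augmented_model`) are assumptions made purely for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def predict_with_reject(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable class index, or None (reject) if it is not confident enough.

    Thresholding the top probability is an assumed stand-in for a rejection model.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def cascade_predict(
    x_base: Sequence[float],
    acquire_feature: Callable[[Sequence[float]], float],
    base_model: Callable[[Sequence[float]], List[float]],
    augmented_model: Callable[[Sequence[float]], List[float]],
    threshold: float,
) -> Tuple[Optional[int], bool]:
    """Two-stage cascade: fall back to an augmented feature space only on rejection.

    Returns (label, explored), where `explored` says whether an extra feature was acquired.
    A label of None after exploration marks the instance as still unresolved.
    """
    label = predict_with_reject(base_model(x_base), threshold)
    if label is not None:
        return label, False
    # Feature exploration (hypothetical): query one additional feature for this instance.
    x_aug = list(x_base) + [acquire_feature(x_base)]
    return predict_with_reject(augmented_model(x_aug), threshold), True


# Toy models, assumed for illustration only.
def base_model(x: Sequence[float]) -> List[float]:
    # Two known classes; confidence grows with the magnitude of the single feature.
    p1 = min(max(0.5 + x[0], 0.0), 1.0)
    return [1.0 - p1, p1]


def augmented_model(x: Sequence[float]) -> List[float]:
    # Three classes; a positive extra feature points to the hidden class (index 2).
    return [0.05, 0.05, 0.9] if x[-1] > 0 else [0.45, 0.45, 0.1]


confident = cascade_predict([0.4], lambda x: 1.0, base_model, augmented_model, 0.8)
explored = cascade_predict([0.1], lambda x: 1.0, base_model, augmented_model, 0.8)
unresolved = cascade_predict([0.1], lambda x: -1.0, base_model, augmented_model, 0.8)
```

In this toy setup, an instance the base model handles confidently never triggers feature acquisition, while an ambiguous one is re-examined in the augmented space, where a class absent from the original label set can surface.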