Paper title
Active learning with binary models for real time data labelling
Authors
Abstract
Machine learning (ML) and deep learning (DL) tasks depend primarily on data. Most ML and DL applications involve supervised learning, which requires labelled data. In the early days of ML, lack of data used to be a problem; now we are in a new era of big data. Supervised ML algorithms require data that is labelled and of good quality. Labelling requires a large investment of money and time. It needs either a skilled person who will charge a high fee for the task, as in the medical field, or, when the data is in bulk, many people assigned to label it. The amount of data that is sufficient for training needs to be known, since money and time cannot be wasted on labelling the whole data set. This paper mainly aims to propose a strategy that helps in labelling the data alongside the oracle in real time. With balancing on, the model's contribution to labelling is 89 and 81.1 for the furniture type and Intel scene image data sets, respectively. With balancing kept off, the model's contribution is found to be 83.47 and 78.71 for the furniture type and flower data sets, respectively.