Paper title
SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup
Authors
Abstract
Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve the label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by $2.27\%$--$3.75\%$ in terms of $F_1$ scores. The code and data for SeqMix can be found at https://github.com/rz-zhang/SeqMix
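The abstract describes mixing both the sequences and their token-level labels. The sketch below illustrates that idea with generic mixup (a convex combination using a coefficient drawn from a Beta distribution) applied token by token; it is not taken from the SeqMix repository. The function names, the nearest-neighbor mapping back to vocabulary tokens, the toy embeddings, and the value of `alpha` are all assumptions made for illustration. The discriminator that filters implausible sequences is omitted because the abstract does not specify how it works.

```python
import random


def sample_lambda(alpha=8.0, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


def mixup_tokens(emb_a, emb_b, labels_a, labels_b, lam):
    """Mix two equal-length sequences token by token.

    emb_*:    one embedding vector (list of floats) per token.
    labels_*: one one-hot label vector per token.
    lam:      mixing coefficient in [0, 1].
    """
    if not (len(emb_a) == len(emb_b) == len(labels_a) == len(labels_b)):
        raise ValueError("sequences and label lists must have the same length")

    def mix(u, v):
        # Convex combination of two vectors of equal dimension.
        return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]

    mixed_emb = [mix(u, v) for u, v in zip(emb_a, emb_b)]
    mixed_labels = [mix(u, v) for u, v in zip(labels_a, labels_b)]
    return mixed_emb, mixed_labels


def nearest_token(vector, vocab_embeddings):
    """Map a mixed embedding to the closest vocabulary token (squared L2 distance).

    This discretization step is an assumption of the sketch.
    """
    def dist(token):
        return sum((a - b) ** 2 for a, b in zip(vector, vocab_embeddings[token]))

    return min(vocab_embeddings, key=dist)


# Toy 2-d embeddings and a two-class label set, purely illustrative.
vocab = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
emb_a = [vocab["x"], vocab["y"]]
emb_b = [vocab["y"], vocab["x"]]
labels_a = [[1.0, 0.0], [0.0, 1.0]]
labels_b = [[0.0, 1.0], [0.0, 1.0]]

mixed_emb, mixed_labels = mixup_tokens(emb_a, emb_b, labels_a, labels_b, lam=0.75)
mixed_tokens = [nearest_token(v, vocab) for v in mixed_emb]
```

With `lam=0.75` the mixed sequence stays closest to the first parent, and the mixed labels become soft distributions wherever the two parents disagree. In an active learning loop, `sample_lambda()` would supply a fresh coefficient for each pair of queried samples.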