Paper Title
CalibreNet: Calibration Networks for Multilingual Sequence Labeling
Paper Authors
Paper Abstract
The lack of training data in low-resource languages poses a major challenge for sequence labeling tasks such as named entity recognition (NER) and machine reading comprehension (MRC). One major obstacle is errors in the boundaries of predicted answers. To tackle this problem, we propose CalibreNet, which predicts answers in two steps. In the first step, any existing sequence labeling method can be adopted as a base model to generate an initial answer. In the second step, CalibreNet refines the boundary of the initial answer. To address the scarcity of training data in low-resource languages, we develop a novel unsupervised phrase boundary recovery pre-training task that enhances the multilingual boundary detection capability of CalibreNet. Experiments on two cross-lingual benchmark datasets show that the proposed approach achieves state-of-the-art (SOTA) results on zero-shot cross-lingual NER and MRC tasks.
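
The two-step flow described in the abstract (a base model proposes a span, then a second stage adjusts its boundaries) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_base_model`, `toy_calibrator`, `predict`, and `perturb_span` are hypothetical names, the title-case heuristic merely stands in for a learned calibration network, and the abstract does not say how the phrase boundary recovery examples are built, so the boundary-shifting helper is only one plausible way to create (noisy span, original span) pairs without labels.

```python
import random
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def toy_base_model(tokens: List[str]) -> Span:
    """Hypothetical stand-in for step 1: returns a span whose left boundary is off."""
    return (3, 5)


def toy_calibrator(tokens: List[str], span: Span) -> Span:
    """Hypothetical stand-in for step 2: widen the span over adjacent title-cased tokens.

    A real calibration network would be a trained model; this heuristic only
    shows where boundary refinement sits in the pipeline.
    """
    start, end = span
    while start > 0 and tokens[start - 1].istitle():
        start -= 1
    while end < len(tokens) and tokens[end].istitle():
        end += 1
    return (start, end)


def predict(
    tokens: List[str],
    base_model: Callable[[List[str]], Span],
    calibrator: Callable[[List[str], Span], Span],
) -> Span:
    """Two-step prediction: initial answer from the base model, then boundary refinement."""
    initial = base_model(tokens)
    return calibrator(tokens, initial)


def perturb_span(span: Span, n_tokens: int, rng: random.Random, max_shift: int = 2) -> Span:
    """Assumed data-generation step: shift both boundaries by a small random offset.

    The noisy span would be the model input and the original span the target.
    The result always satisfies 0 <= start < end <= n_tokens.
    """
    start, end = span
    new_start = min(max(0, start + rng.randint(-max_shift, max_shift)), n_tokens - 1)
    new_end = max(new_start + 1, min(n_tokens, end + rng.randint(-max_shift, max_shift)))
    return (new_start, new_end)


if __name__ == "__main__":
    tokens = ["She", "visited", "New", "York", "City", "yesterday"]
    print(predict(tokens, toy_base_model, toy_calibrator))  # (2, 5)
    print(perturb_span((2, 5), len(tokens), random.Random(0)))
```

In this toy run the base model returns the tokens "York City" and the calibration step extends the span to "New York City". Because `predict` takes both stages as arguments, any base model can be swapped in, which mirrors the abstract's statement that any existing sequence labeling method can supply the initial answer.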