Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

Paper Title

Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

Authors

Siamak Shakeri, Noah Constant, Mihir Sanjay Kale, Linting Xue

Abstract

We propose a simple method to generate multilingual question and answer pairs on a large scale through the use of a single generative model. These synthetic samples can be used to improve the zero-shot performance of multilingual QA models on target languages. Our proposed multi-task training of the generative model only requires the labeled training samples in English, thus removing the need for such samples in the target languages, making it applicable to far more languages than those with labeled data. Human evaluations indicate the majority of such samples are grammatically correct and sensible. Experimental results show our proposed approach can achieve large gains on the XQuAD dataset, reducing the gap between zero-shot and supervised performance of smaller QA models on various languages.
