Paper Title

Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer

Paper Authors

Haoran Xu, Kenton Murray

Paper Abstract

The current state-of-the-art for few-shot cross-lingual transfer learning first trains on abundant labeled data in the source language and then fine-tunes with a few examples on the target language, termed target-adapting. Though this has been demonstrated to work on a variety of tasks, in this paper we show some deficiencies of this approach and propose a one-step mixed training method that trains on both source and target data with stochastic gradient surgery, a novel gradient-level optimization. Unlike the previous studies that focus on one language at a time when target-adapting, we use one model to handle all target languages simultaneously to avoid excessively language-specific models. Moreover, we discuss the unreality of utilizing large target development sets for model selection in previous literature. We further show that our method is both development-free for target languages, and is also able to escape from overfitting issues. We conduct a large-scale experiment on 4 diverse NLP tasks across up to 48 languages. Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin, especially for languages that are linguistically distant from the source language, e.g., 7.36% F1 absolute gain on average for the NER task, up to 17.60% on Punjabi.
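The abstract does not spell out the update rule behind "stochastic gradient surgery." As a rough illustration only, the sketch below shows a PCGrad-style projection between a source-language gradient and a target-language gradient within one mixed-training step, assuming PyTorch; the function names, the projection direction, and the way the two gradients are combined are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of PCGrad-style gradient surgery for mixed source/target
# training, assuming PyTorch. Names and the exact combination rule are
# illustrative assumptions, not the paper's exact method.
import torch


def project_conflicting(g_a: torch.Tensor, g_b: torch.Tensor) -> torch.Tensor:
    """If g_a conflicts with g_b (negative dot product), remove the
    conflicting component by projecting g_a onto the normal plane of g_b."""
    dot = torch.dot(g_a, g_b)
    if dot < 0:
        g_a = g_a - dot / (g_b.norm() ** 2 + 1e-12) * g_b
    return g_a


def mixed_training_step(model, loss_src, loss_tgt, optimizer):
    """One mixed-training step on a source-language loss and a
    target-language loss, with surgery applied before the update."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Per-loss gradients, flattened into single vectors.
    g_src = torch.autograd.grad(loss_src, params, retain_graph=True)
    g_tgt = torch.autograd.grad(loss_tgt, params)
    flat_src = torch.cat([g.reshape(-1) for g in g_src])
    flat_tgt = torch.cat([g.reshape(-1) for g in g_tgt])

    # Deconflict the source gradient against the target gradient
    # (an assumed direction; a symmetric or stochastic choice is also possible).
    flat_src = project_conflicting(flat_src, flat_tgt)
    combined = flat_src + flat_tgt

    # Write the combined gradient back and take a single optimizer step.
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = combined[offset:offset + n].view_as(p).clone()
        offset += n
    optimizer.step()
    optimizer.zero_grad()
```

In this sketch the projection is applied whenever the two gradients point in conflicting directions, so a single update can fit the target examples without being pulled away by the source-language objective; how the paper selects which gradient to project (the "stochastic" part) is not shown here.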
