Paper Title

Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification

Paper Authors

Xin Dong, Yaxin Zhu, Yupeng Zhang, Zuohui Fu, Dongkuan Xu, Sen Yang, Gerard de Melo

Paper Abstract

In cross-lingual text classification, one seeks to exploit labeled data from one language to train a text classification model that can then be applied to a completely different language. Recent multilingual representation models have made it much easier to achieve this. Still, there may be subtle differences between languages that are neglected when doing so. To address this, we present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations. The resulting model then serves as a teacher to induce labels for unlabeled target language samples that can be used during further adversarial training, allowing us to gradually adapt our model to the target language. Compared with a number of strong baselines, we observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
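The abstract combines two mechanisms: adversarial training against label-preserving input perturbations, and a self-learning step in which the current model pseudo-labels unlabeled target-language samples. Below is a minimal PyTorch sketch of both pieces. The `encoder` (embeddings to a pooled representation), `classifier` head, `epsilon`, and confidence `threshold` are hypothetical names introduced here for illustration; the norm-bounded gradient perturbation follows the common Miyato-style recipe for adversarial training on embeddings and is one plausible realization, not necessarily the authors' exact implementation.

```python
# Sketch only: adversarial training + self-learning pseudo-labeling,
# under the assumptions stated above (not the paper's released code).
import torch
import torch.nn.functional as F


def adversarial_loss(encoder, classifier, input_embeds, labels, epsilon=1.0):
    """Clean loss plus loss on a norm-bounded adversarial embedding perturbation."""
    input_embeds = input_embeds.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(classifier(encoder(input_embeds)), labels)
    # Gradient of the loss w.r.t. the input embeddings; retain the graph so
    # we can still backpropagate through clean_loss afterwards.
    grad, = torch.autograd.grad(clean_loss, input_embeds, retain_graph=True)
    # Perturb in the direction that maximizes the loss, with a bounded norm
    # so the perturbation plausibly preserves the label.
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv_loss = F.cross_entropy(
        classifier(encoder(input_embeds + delta.detach())), labels
    )
    return clean_loss + adv_loss


def pseudo_label(encoder, classifier, input_embeds, threshold=0.9):
    """Teacher step: keep only confident predictions on unlabeled target data."""
    with torch.no_grad():
        probs = F.softmax(classifier(encoder(input_embeds)), dim=-1)
        confidence, labels = probs.max(dim=-1)
    mask = confidence >= threshold
    return input_embeds[mask], labels[mask]
```

In a training loop, one would presumably alternate supervised adversarial steps on labeled source-language batches with adversarial steps on confidently pseudo-labeled target-language batches, gradually adapting the model to the target language as the abstract describes.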
