Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation

Paper Title

Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation

Authors

Liping Tang, Zhen Li, Zhiquan Luo, Helen Meng

Abstract

This paper investigates an unsupervised approach to deriving a universal cross-lingual word embedding space, in which words with similar semantics from different languages lie close to one another. Previous adversarial approaches have shown promising results in inducing cross-lingual word embeddings without parallel data. However, their training is unstable for distant language pairs. Instead of mapping the source language space directly to the target language space, we propose to bridge the two smoothly through a sequence of intermediate spaces. Each intermediate space can be viewed as a pseudo-language space and is introduced via simple linear interpolation. This approach is modeled after domain flow in computer vision, but uses a modified objective function. Experiments on the intrinsic Bilingual Dictionary Induction task show that the proposed approach improves the robustness of adversarial models while achieving comparable or even better precision. Further experiments on the downstream task of Cross-Lingual Natural Language Inference show that, compared with state-of-the-art adversarial and non-adversarial models, the proposed model significantly improves performance for distant language pairs.
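The abstract does not state the exact interpolation formula. As a minimal sketch, assuming each intermediate pseudo-language point is the convex combination (1 - t)·x + t·Wx of a source word vector x and its image under a linear source-to-target mapping W, with t running from 0 (source space) to 1 (mapped space), the sequence of intermediate spaces could be generated as below. The functions `interpolate` and `domain_flow` and the parameter `t` are hypothetical illustrations, not the authors' code, and the modified adversarial objective is not modeled here.

```python
from typing import List, Sequence

Vector = List[float]


def mat_vec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply a d x d matrix (given as a list of rows) by a length-d vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def interpolate(x: Sequence[float], w: Sequence[Sequence[float]], t: float) -> Vector:
    """Hypothetical pseudo-language point at step t in [0, 1] (assumed linear form).

    t = 0 returns the source vector x; t = 1 returns the fully mapped vector W x.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    wx = mat_vec(w, x)
    return [(1.0 - t) * a + t * b for a, b in zip(x, wx)]


def domain_flow(x: Sequence[float], w: Sequence[Sequence[float]], steps: int) -> List[Vector]:
    """Points on the straight path from x to W x at `steps` evenly spaced values of t."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    return [interpolate(x, w, k / (steps - 1)) for k in range(steps)]


if __name__ == "__main__":
    # Toy 2-D example: W is a 90-degree rotation, so W [1, 0] = [0, 1].
    rotation = [[0.0, -1.0], [1.0, 0.0]]
    print(domain_flow([1.0, 0.0], rotation, 3))
    # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

Under this assumption, each intermediate point is only a fraction t of the way toward the target space, so a model trained on successive steps faces a smaller gap at each stage than a direct source-to-target mapping would.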
