Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations

Paper Title

Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations

Authors

Wei Zhao, Steffen Eger

Abstract

Multilingual representations pre-trained with monolingual data exhibit considerably unequal task performances across languages. Previous studies address this challenge with resource-intensive contextualized alignment, which assumes the availability of large parallel data, thereby leaving under-represented language communities behind. In this work, we attribute the data hungriness of previous alignment techniques to two limitations: (i) the inability to sufficiently leverage data and (ii) these techniques are not trained properly. To address these issues, we introduce supervised and unsupervised density-based approaches named Real-NVP and GAN-Real-NVP, driven by Normalizing Flow, to perform alignment, both dissecting the alignment of multilingual subspaces into density matching and density modeling. We complement these approaches with our validation criteria in order to guide the training process. Our experiments encompass 16 alignments, including our approaches, evaluated across 6 language pairs, synthetic data and 5 NLP tasks. We demonstrate the effectiveness of our approaches in the scenarios of limited and no parallel data. First, our supervised approach trained on 20k parallel data (sentences) mostly surpasses Joint-Align and InfoXLM trained on over 100k parallel sentences. Second, parallel data can be removed without sacrificing performance when integrating our unsupervised approach in our bootstrapping procedure, which is theoretically motivated to enforce equality of multilingual subspaces. Moreover, we demonstrate the advantages of validation criteria over validation data for guiding supervised training.
