A Computational Approach to Measuring the Semantic Divergence of Cognates

Paper Title

A Computational Approach to Measuring the Semantic Divergence of Cognates

Authors

Ana-Sabina Uban, Alina-Maria Ciobanu, Liviu P. Dinu

Abstract

Meaning is the foundation stone of intercultural communication. Languages change continuously, and words shift their meanings for various reasons. Semantic divergence in related languages is a key concern of historical linguistics. In this paper we investigate semantic divergence across languages by measuring the semantic similarity of cognate sets in multiple languages. The method we propose is based on cross-lingual word embeddings. We implement and evaluate it on English and five Romance languages, but it extends easily to any language pair, requiring only large monolingual corpora for the languages involved and a small bilingual dictionary for the pair. This language-agnostic method facilitates a quantitative analysis of cognate divergence, by computing degrees of semantic similarity between cognate pairs, and provides insights for identifying false friends. As a second contribution, we formulate a straightforward method for detecting false friends and introduce the notions of "soft false friend" and "hard false friend", as well as a measure of the degree of "falseness" of a false-friend pair. Additionally, we propose an algorithm that outputs suggestions for correcting false friends, which could result in a very helpful tool for language learning or translation.
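
The abstract does not state which similarity measure or decision rule the authors use, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that both languages' word vectors already live in one shared cross-lingual space, that similarity is measured with cosine similarity, and that a pair is flagged as a candidate false friend when its similarity falls below a hypothetical `threshold`. The function names, the threshold value, and the three-dimensional toy vectors are all invented for illustration; real embeddings would be learned from large monolingual corpora and aligned with a bilingual dictionary.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def score_cognate_pairs(pairs, emb_a, emb_b, threshold=0.5):
    """Score each cognate pair and flag low-similarity pairs.

    pairs     -- list of (word_in_language_a, word_in_language_b)
    emb_a/b   -- dicts mapping words to vectors in a SHARED space (assumed)
    threshold -- hypothetical cut-off, not taken from the paper
    Returns a list of (word_a, word_b, similarity, is_candidate_false_friend).
    """
    results = []
    for word_a, word_b in pairs:
        sim = cosine_similarity(emb_a[word_a], emb_b[word_b])
        results.append((word_a, word_b, sim, sim < threshold))
    return results


# Toy vectors, invented for illustration only.
emb_en = {"library": [0.9, 0.1, 0.0], "animal": [0.0, 0.2, 0.9]}
emb_fr = {"librairie": [0.1, 0.9, 0.1], "animal": [0.0, 0.3, 0.9]}
cognate_pairs = [("library", "librairie"), ("animal", "animal")]

results = score_cognate_pairs(cognate_pairs, emb_en, emb_fr)
for word_en, word_fr, sim, flagged in results:
    print(f"{word_en} / {word_fr}: similarity={sim:.3f}, candidate false friend={flagged}")
```

With these toy vectors the English "library" and French "librairie" (which means "bookshop") score low and are flagged, while "animal" in both languages scores high. A graded notion of "falseness", as mentioned in the abstract, could in principle be derived from the similarity score itself, but the abstract gives no formula, so none is assumed here.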

Paper Title