Paper Title

Discovering Lexical Similarity Through Articulatory Feature-based Phonetic Edit Distance

Authors

Ahmed, Tafseer; Nizami, Muhammad Suffian; Khan, Muhammad Yaseen

Abstract

Lexical similarity (LS) between two languages reveals many interesting linguistic insights, such as genetic relationship, mutual intelligibility, and the use of one language's vocabulary in the other. LS can be evaluated by various methods. This paper presents Phonetic Edit Distance (PED), a method that performs a soft comparison of letters using their associated articulatory features. The system converts words into their International Phonetic Alphabet (IPA) representation and then converts each IPA symbol into its set of articulatory features. The resulting lists of feature sets are then compared using the proposed method. For example, PED gives an edit distance of 0.82 for the German word vater and the Persian word pidar, and 0.93 for the Hebrew word shalom and the Arabic word salaam; by comparison, their IPA-based edit distances are 4 and 2, respectively. Experiments are performed with six languages (Arabic, Hindi, Marathi, Persian, Sanskrit, and Urdu). We extracted part-of-speech-wise word lists from the Universal Dependencies corpora and evaluated LS for every pair of languages. With the proposed approach, we find genetic affinity, similarity, and borrowings/loanwords despite script differences and sound variation phenomena among these languages.
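
The pipeline described in the abstract (symbols mapped to sets of articulatory features, then compared softly inside an edit distance) might look roughly like the sketch below. This is not the authors' implementation: the `FEATURES` table is a tiny hypothetical inventory invented for illustration, the substitution cost is assumed to be a Jaccard distance between feature sets, and insertions and deletions are assumed to cost 1. The paper's actual feature set, cost function, and normalization may differ, so the sketch should not be expected to reproduce the 0.82 or 0.93 figures.

```python
from typing import Dict, FrozenSet, Sequence

# Hypothetical, heavily simplified articulatory feature table (assumption,
# not the paper's inventory). Keys are IPA symbols.
FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonant", "bilabial", "plosive", "voiceless"}),
    "b": frozenset({"consonant", "bilabial", "plosive", "voiced"}),
    "f": frozenset({"consonant", "labiodental", "fricative", "voiceless"}),
    "t": frozenset({"consonant", "alveolar", "plosive", "voiceless"}),
    "d": frozenset({"consonant", "alveolar", "plosive", "voiced"}),
    "s": frozenset({"consonant", "alveolar", "fricative", "voiceless"}),
    "ʃ": frozenset({"consonant", "postalveolar", "fricative", "voiceless"}),
    "m": frozenset({"consonant", "bilabial", "nasal", "voiced"}),
    "a": frozenset({"vowel", "open", "front", "unrounded"}),
    "e": frozenset({"vowel", "close-mid", "front", "unrounded"}),
    "i": frozenset({"vowel", "close", "front", "unrounded"}),
    "o": frozenset({"vowel", "close-mid", "back", "rounded"}),
}


def substitution_cost(a: str, b: str) -> float:
    """Soft cost in [0, 1]: Jaccard distance between feature sets (assumed)."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0  # unknown symbol: fall back to a hard mismatch
    return 1.0 - len(fa & fb) / len(fa | fb)


def soft_edit_distance(s: Sequence[str], t: Sequence[str]) -> float:
    """Levenshtein-style dynamic program with soft substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete a
                curr[j - 1] + 1.0,                      # insert b
                prev[j - 1] + substitution_cost(a, b),  # soft substitute
            ))
        prev = curr
    return prev[-1]
```

Under this toy table, `soft_edit_distance(list("pat"), list("bat"))` is 0.4 rather than the 1 a plain edit distance would give, because p and b differ only in voicing. Inputs are sequences of IPA symbols, so multi-character symbols such as long vowels would need to be tokenized and added to the table first.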
