Paper Title

Unsupervised Morphological Paradigm Completion

Authors

Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya D. McCarthy, Katharina Kann

Abstract

We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language processing (NLP) perspective, this is a challenging unsupervised task, and high-performing systems have the potential to improve tools for low-resource languages or to assist linguistic annotators. From a cognitive science perspective, this can shed light on how children acquire morphological knowledge. We further introduce a system for the task, which generates morphological paradigms via the following steps: (i) EDIT TREE retrieval, (ii) additional lemma retrieval, (iii) paradigm size discovery, and (iv) inflection generation. We perform an evaluation on 14 typologically diverse languages. Our system outperforms trivial baselines with ease and, for some languages, even obtains a higher accuracy than minimally supervised systems.
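The abstract names edit trees as the basis of step (i) but does not define them. As a rough illustration only, the sketch below follows the common definition: align a lemma and an inflected form on their longest common substring, recurse on the unmatched prefix and suffix, and fall back to a plain substitution when nothing matches. The function names `build_edit_tree` and `apply_edit_tree` and the tuple encoding are assumptions made for this example, not the authors' implementation.

```python
from difflib import SequenceMatcher


def build_edit_tree(source, target):
    """Build an edit tree turning `source` (a lemma) into `target` (a form).

    Hypothetical encoding, chosen for this sketch:
      ("sub", old, new)                      leaf: replace old with new
      ("split", prefix_len, suffix_len, left, right)
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    m = matcher.find_longest_match(0, len(source), 0, len(target))
    if m.size == 0:
        # No shared substring: the whole span is substituted.
        return ("sub", source, target)
    left = build_edit_tree(source[:m.a], target[:m.b])
    right = build_edit_tree(source[m.a + m.size:], target[m.b + m.size:])
    # Store how many source characters lie before and after the match.
    return ("split", m.a, len(source) - (m.a + m.size), left, right)


def apply_edit_tree(tree, word):
    """Apply an edit tree to a new word; return None if it does not fit."""
    if tree[0] == "sub":
        _, old, new = tree
        return new if word == old else None
    _, prefix_len, suffix_len, left, right = tree
    if prefix_len + suffix_len > len(word):
        return None
    cut = len(word) - suffix_len
    head = apply_edit_tree(left, word[:prefix_len])
    tail = apply_edit_tree(right, word[cut:])
    if head is None or tail is None:
        return None
    # The middle of the word is copied unchanged.
    return head + word[prefix_len:cut] + tail


if __name__ == "__main__":
    past = build_edit_tree("walk", "walked")
    print(apply_edit_tree(past, "talk"))
```

Because the tree records positions rather than the matched characters, a tree learned from one lemma and form pair can be reused on other lemmas of the same shape, which is what makes such trees usable as candidate inflection rules.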
