Paper Title

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

Authors

Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Abstract

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language's morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models bridged by two newly proposed models we devise perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by a number of magnitudes.
