Paper Title

Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History

Authors

Aurélien Max, Guillaume Wisniewski

Abstract

Naturally occurring instances of linguistic phenomena are important both for training and for evaluating automatic processes on text. When available in large quantities, they also provide interesting material for linguistic studies. In this article, we present a new resource built from Wikipedia's revision history, called WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), which contains numerous edits made by human contributors, including various corrections and rewritings. We discuss the main motivations for building such a resource, describe how it was built, and present initial applications on French.
