Cross-lingual Extended Named Entity Classification of Wikipedia Articles

Paper Title

Cross-lingual Extended Named Entity Classification of Wikipedia Articles

Authors

Bui, The Viet; Le-Hong, Phuong

Abstract

The FPT.AI team participated in the SHINRA2020-ML subtask of the NTCIR-15 SHINRA task. This paper describes our approach to solving the problem and discusses the official results. Our method focuses on learning cross-lingual representations at both the word level and the document level for page classification. We propose a three-stage approach consisting of multilingual model pre-training, monolingual model fine-tuning, and cross-lingual voting. Our system achieves the best scores for 25 out of 30 languages; for the other five languages, its accuracy gap to the best-performing system is relatively small.
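
The abstract names cross-lingual voting as the third stage but does not say how votes are combined. The following is a minimal sketch of one plausible reading, assuming a simple majority vote over the labels predicted for the same entity in its different language editions. The function name `cross_lingual_vote`, the label strings, and the alphabetical tie-break are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def cross_lingual_vote(predictions):
    """Pick one label for an entity from per-language predictions.

    `predictions` maps a language code to the label predicted for that
    language's article about the same entity. This is a hypothetical
    majority-vote scheme; the paper's actual voting rule may differ.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions.values())
    top = max(counts.values())
    # Assumed tie-break: alphabetical order, so the result is deterministic.
    winners = sorted(label for label, n in counts.items() if n == top)
    return winners[0]


# Hypothetical labels for one entity seen in three language editions.
votes = {"en": "Person", "vi": "Person", "ja": "City"}
result = cross_lingual_vote(votes)
```

A weighted variant (for example, weighting each language by its validation accuracy) would be an equally reasonable assumption; the unweighted form is shown only because it is the simplest.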
