Paper Title


GECToR -- Grammatical Error Correction: Tag, Not Rewrite

Authors

Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi

Abstract


In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an $F_{0.5}$ of 65.3/66.5 on CoNLL-2014 (test) and $F_{0.5}$ of 72.4/73.6 on BEA-2019 (test). Inference is up to 10 times as fast as a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
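The core idea of "tag, not rewrite" is that instead of generating a corrected sentence token by token, the model predicts one edit tag per input token (e.g. keep, delete, append a word, replace a word), and the correction is obtained by applying those tags deterministically. The sketch below is a minimal illustration of that decoding step; the tag names and the helper `apply_edit_tags` are illustrative (the full GECToR tag set also includes grammar-specific transformations such as verb-form changes), not the authors' implementation.

```python
def apply_edit_tags(tokens, tags):
    """Apply per-token edit tags to produce the corrected token sequence.

    Supported illustrative tags:
      $KEEP          - copy the token unchanged
      $DELETE        - drop the token
      $APPEND_w      - copy the token, then insert word w after it
      $REPLACE_w     - substitute the token with word w
    """
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag == "$DELETE":
            continue  # token is removed from the output
        elif tag.startswith("$APPEND_"):
            out.append(token)
            out.append(tag[len("$APPEND_"):])
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])
        else:
            # Unknown tag: fall back to keeping the token
            out.append(token)
    return out


# Example: correct subject-verb agreement via a single $REPLACE tag
tokens = ["She", "go", "to", "school"]
tags = ["$KEEP", "$REPLACE_goes", "$KEEP", "$KEEP"]
print(apply_edit_tags(tokens, tags))  # ['She', 'goes', 'to', 'school']
```

Because each tag applies independently to one token, decoding is a single non-autoregressive pass over the sequence, which is what makes this formulation much faster at inference than autoregressive seq2seq rewriting.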
