Paper Title
Type-Driven Multi-Turn Corrections for Grammatical Error Correction
Paper Authors
Paper Abstract
Grammatical Error Correction (GEC) aims to automatically detect and correct grammatical errors. In this aspect, dominant models are trained by one-iteration learning while performing multiple iterations of corrections during inference. Previous studies mainly focus on data augmentation approaches to combat exposure bias, which suffer from two drawbacks. First, they simply mix additionally-constructed training instances with the original ones to train models, which fails to make models explicitly aware of the procedure of gradual correction. Second, they ignore the interdependence between different types of corrections. In this paper, we propose a Type-Driven Multi-Turn Corrections approach for GEC. With this approach, from each training instance we additionally construct multiple training instances, each of which involves the correction of a specific type of error. Then, we use these additionally-constructed training instances and the original one to train the model in turn. Experimental results and in-depth analysis show that our approach significantly benefits model training. In particular, our enhanced model achieves state-of-the-art single-model performance on English GEC benchmarks. We release our code on GitHub.
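The abstract describes constructing, from one (source, target) pair, a chain of training instances in which each turn corrects only one type of error. The following is a minimal hypothetical sketch of that idea; the edit representation `(start, end, replacement)`, the error-type labels, and the correction order are illustrative assumptions, not the paper's actual implementation.

```python
def apply_edits(tokens, edits):
    """Apply (start, end, replacement) span edits to a token list.

    Edits are applied right to left so earlier spans keep valid indices.
    """
    out = list(tokens)
    for start, end, repl in sorted(edits, key=lambda e: e[0], reverse=True):
        out[start:end] = repl
    return out


def build_multi_turn_instances(source, typed_edits, type_order):
    """Return one (input, output) training pair per correction turn.

    typed_edits: {error_type: [(start, end, replacement_tokens), ...]}
    type_order:  order in which error types are corrected (an assumption here;
                 the paper's actual ordering may differ).
    """
    instances = []
    current = list(source)
    for etype in type_order:
        edits = typed_edits.get(etype)
        if not edits:
            continue
        corrected = apply_edits(current, edits)
        instances.append((current, corrected))
        current = corrected  # next turn starts from the partially corrected text
    return instances


# Toy example: correct a verb-form error first, then a preposition error.
source = "He go to school in yesterday .".split()
typed_edits = {
    "VERB": [(1, 2, ["went"])],  # go -> went
    "PREP": [(4, 5, [])],        # drop spurious "in"
}
turns = build_multi_turn_instances(source, typed_edits, ["VERB", "PREP"])
```

Each pair in `turns` feeds the model one gradual-correction step, so training mirrors the multi-turn inference procedure instead of jumping straight from the errorful source to the fully corrected target.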