CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error Correction with a Tailored GEC-Oriented Parser

Paper title

CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error Correction with a Tailored GEC-Oriented Parser

Authors

Zhang, Yue; Li, Zhenghua

Abstract

Recently, Zhang et al. (2022) proposed a syntax-aware grammatical error correction (GEC) approach, named SynGEC, showing that incorporating tailored dependency-based syntax of the input sentence is quite beneficial to GEC. This work considers another mainstream syntax formalism, i.e., constituent-based syntax. Drawing on the successful experience of SynGEC, we first propose an extended constituent-based syntax scheme to accommodate errors in ungrammatical sentences. Then, using parallel GEC data as a pivot, we automatically obtain constituency trees of ungrammatical sentences and use them to train a GEC-oriented constituency parser. For syntax encoding, we employ the graph convolutional network (GCN). Experimental results show that our method, named CSynGEC, yields substantial improvements over strong baselines. Moreover, we investigate the integration of constituent-based and dependency-based syntax for GEC in two ways: 1) intra-model combination, which means using separate GCNs to encode both kinds of syntax for decoding in a single model; 2) inter-model combination, which means gathering and selecting edits predicted by different models to achieve final corrections. We find that the former method improves recall over using one standalone syntax formalism while the latter improves precision, and both lead to better F0.5 values.
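The abstract reports that one combination strategy mainly raises recall and the other mainly raises precision, yet both raise F0.5. The short sketch below illustrates how that can happen under the standard F-beta definition, where beta = 0.5 weights precision more heavily than recall. The function name `f_beta` and all precision/recall numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision above recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical numbers for illustration only (not taken from the paper).
base = f_beta(0.70, 0.50)            # a baseline system
more_recall = f_beta(0.70, 0.55)     # recall up by 5 points
more_precision = f_beta(0.75, 0.50)  # precision up by 5 points

print(f"baseline:        {base:.4f}")
print(f"higher recall:   {more_recall:.4f}")
print(f"higher precision:{more_precision:.4f}")
```

With these assumed inputs, both variants score above the baseline, and the precision gain moves F0.5 further than an equal recall gain, which is why GEC evaluation with F0.5 tends to reward conservative, accurate edits.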
