Paper Title

Ensemble Distillation Approaches for Grammatical Error Correction

Authors

Yassir Fathullah, Mark Gales, Andrey Malinin

Abstract

Ensemble approaches are commonly used techniques for improving a system by combining multiple model predictions. Additionally, these schemes allow the uncertainty, as well as the source of the uncertainty, to be derived for the prediction. Unfortunately, these benefits come at a computational and memory cost. To address this problem, ensemble distillation (EnD) and, more recently, ensemble distribution distillation (EnDD) have been proposed; these compress the ensemble into a single model, representing either the ensemble average prediction or the prediction distribution respectively. This paper examines the application of both distillation approaches to a sequence prediction task, grammatical error correction (GEC). This is an important application area for language learning tasks, as it can yield highly useful feedback to the learner. It is, however, more challenging than the standard tasks investigated for distillation, as the prediction of any grammatical correction to a word will be highly dependent on both the input sequence and the generated output history for the word. The performance of both EnD and EnDD is evaluated on publicly available GEC tasks as well as on a spoken language task.
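To make the EnD objective from the abstract concrete, below is a minimal sketch of token-level ensemble distillation for a sequence model. It assumes PyTorch; the function name and tensor shapes are illustrative, not taken from the paper. The student is trained toward the per-token average of the ensemble members' predictive distributions, each conditioned on the same input sequence and output history. EnDD differs in that the student instead parameterises a distribution over the ensemble's categorical predictions (e.g. a Dirichlet), which is not shown here.

```python
# Minimal sketch of token-level ensemble distillation (EnD), assuming PyTorch.
# All names here are illustrative.
import torch
import torch.nn.functional as F


def end_loss(student_logits: torch.Tensor,
             teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(ensemble average || student), averaged over all tokens.

    student_logits: [batch, seq_len, vocab]
    teacher_logits: [n_models, batch, seq_len, vocab], one slice per
                    ensemble member, all conditioned on the same input
                    sequence and generated output history.
    """
    # Average the members' predictive distributions in probability space.
    teacher_probs = F.softmax(teacher_logits, dim=-1).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # Per-token KL divergence from the ensemble average to the student;
    # clamp avoids log(0) for near-zero averaged probabilities.
    kl = (teacher_probs
          * (teacher_probs.clamp_min(1e-12).log() - student_log_probs)).sum(dim=-1)
    return kl.mean()


# Usage with random tensors (4-member ensemble, batch 2, length 5, vocab 100):
if __name__ == "__main__":
    student = torch.randn(2, 5, 100, requires_grad=True)
    teachers = torch.randn(4, 2, 5, 100)
    loss = end_loss(student, teachers)
    loss.backward()
    print(loss.item())
```

Averaging in probability space (rather than averaging logits) matches the usual definition of the ensemble average prediction; in a real GEC training loop the per-token KL would additionally be masked to ignore padding positions.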
