Paper Title

BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward

Paper Authors

Florian Schmidt, Thomas Hofmann

Paper Abstract

Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model architectures, metrics that scale independently of the number of references are still based on n-gram estimates. We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings. An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies while maintaining the necessary scalability through appropriate pruning and smoothing techniques. We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.
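
Below is a minimal sketch of the core idea described in the abstract: replacing n-gram counting with embedding comparison, scoring a candidate sequence by matching each of its contextual token embeddings against those of a reference set. This is not the authors' implementation; the model choice (bert-base-uncased), the greedy cosine-matching scheme, and the function names are illustrative assumptions, and the pruning and smoothing techniques the paper relies on for scalability are omitted here.

```python
# Sketch: an embedding-based sequence-level reward, assuming a
# HuggingFace BERT model. Not the paper's exact method.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

@torch.no_grad()
def embed(sentence: str) -> torch.Tensor:
    """Return L2-normalized contextual embeddings, one row per token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state.squeeze(0)  # (tokens, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)

def embedding_reward(candidate: str, references: list[str]) -> float:
    """Match each candidate token embedding to its most similar
    reference token embedding (cosine similarity) and average --
    an embedding analogue of comparing n-gram counts."""
    cand = embed(candidate)
    refs = torch.cat([embed(r) for r in references], dim=0)
    sims = cand @ refs.T  # pairwise cosine similarities
    return sims.max(dim=1).values.mean().item()

print(embedding_reward("the cat sat on the mat",
                       ["a cat was sitting on the mat"]))
```

Note that the reference embeddings depend only on the reference set, so they can be precomputed once; the per-candidate cost then scales with candidate length rather than with the number of references, which is the scalability property the abstract emphasizes.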
