Hierarchical Context Tagging for Utterance Rewriting

Paper Title

Hierarchical Context Tagging for Utterance Rewriting

Authors

Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea

Abstract

Utterance rewriting aims to recover coreferences and omitted information from the latest turn of a multi-turn dialogue. Recently, methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. This is due to a tagger's smaller search space as it can only copy tokens from the dialogue context. However, these methods may suffer from low coverage when phrases that must be added to a source utterance cannot be covered by a single context span. This can occur in languages like English that introduce tokens such as prepositions into the rewrite for grammaticality. We propose a hierarchical context tagger (HCT) that mitigates this issue by predicting slotted rules (e.g., "besides_") whose slots are later filled with context spans. HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context. This rule tagging allows HCT to add out-of-context tokens and multiple spans at once; we further cluster the rules to truncate the long tail of the rule distribution. Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by ~2 BLEU points.
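
The two steps described in the abstract (tag the source with edit actions and slotted rules, then fill the rule slots with context spans) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the KEEP/DELETE action labels, the whitespace-separated "besides _" rule format, the choice to insert the filled rule before the tagged token, and the example dialogue are all assumptions made for illustration. In HCT the tags and spans would be predicted by a model; here they are written by hand.

```python
from typing import List, Optional, Tuple

# A span is (start, end) into the context tokens, end exclusive (assumed format).
Span = Tuple[int, int]


def fill_rule(rule: str, spans: List[Span], context: List[str]) -> List[str]:
    """Replace each "_" slot in a slotted rule with its context span, in order."""
    filled: List[str] = []
    remaining = iter(spans)
    for piece in rule.split():
        if piece == "_":
            start, end = next(remaining)
            filled.extend(context[start:end])
        else:
            # Tokens written in the rule itself need not appear in the context.
            filled.append(piece)
    return filled


def apply_tags(
    source: List[str],
    actions: List[str],
    rules: List[Optional[str]],
    slot_spans: List[List[Span]],
    context: List[str],
) -> List[str]:
    """Build the rewrite from per-token actions and optional slotted rules."""
    rewrite: List[str] = []
    for token, action, rule, spans in zip(source, actions, rules, slot_spans):
        if rule is not None:
            # Assumption: the filled rule is inserted before the tagged token.
            rewrite.extend(fill_rule(rule, spans, context))
        if action == "KEEP":
            rewrite.append(token)
        # Any other action (here "DELETE") drops the source token.
    return rewrite


# Hypothetical dialogue: the context is the previous turn, the source is the latest turn.
context = ["i", "really", "like", "jazz", "music"]
source = ["what", "else", "do", "you", "like", "?"]
actions = ["KEEP"] * len(source)
rules = ["besides _", None, None, None, None, None]
slot_spans = [[(3, 5)], [], [], [], [], []]

rewrite = apply_tags(source, actions, rules, slot_spans, context)
print(" ".join(rewrite))
```

In this example the rule contributes "besides", a token that does not occur in the context, while its slot copies the span "jazz music". A tagger that could only copy a single context span per position would not be able to produce the phrase "besides jazz music".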
