Paper title
Hierarchical Context Tagging for Utterance Rewriting
Paper authors
Paper abstract
Utterance rewriting aims to recover coreferences and omitted information from the latest turn of a multi-turn dialogue. Recently, methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. This is due to a tagger's smaller search space as it can only copy tokens from the dialogue context. However, these methods may suffer from low coverage when phrases that must be added to a source utterance cannot be covered by a single context span. This can occur in languages like English that introduce tokens such as prepositions into the rewrite for grammaticality. We propose a hierarchical context tagger (HCT) that mitigates this issue by predicting slotted rules (e.g., "besides_") whose slots are later filled with context spans. HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context. This rule tagging allows HCT to add out-of-context tokens and multiple spans at once; we further cluster the rules to truncate the long tail of the rule distribution. Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by ~2 BLEU points.
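The two-step procedure described in the abstract (tag the source with edit actions and slotted rules, then fill the slots with context spans) can be illustrated with a minimal sketch. This is not the authors' implementation: the `TokenTag` structure, the `apply_tags` function, the convention that a rule is inserted before the tagged token, the half-open span indices, and the example dialogue are all assumptions made for illustration. In a real system the tags and spans would be predicted by a model; here they are written by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TokenTag:
    """Hypothetical per-token tag: an edit action plus an optional slotted rule."""
    action: str                               # "KEEP" or "DELETE"
    rule: str = ""                            # e.g. "besides _"; "_" marks a slot
    spans: Tuple[Tuple[int, int], ...] = ()   # one (start, end) context span per slot


def apply_tags(source: List[str], context: List[str], tags: List[TokenTag]) -> List[str]:
    """Apply token-level actions and fill each rule slot with a context span."""
    if len(source) != len(tags):
        raise ValueError("expected exactly one tag per source token")
    out: List[str] = []
    for token, tag in zip(source, tags):
        if tag.rule:
            pieces = tag.rule.split()
            if pieces.count("_") != len(tag.spans):
                raise ValueError("number of slots and spans must match")
            spans = iter(tag.spans)
            for piece in pieces:
                if piece == "_":
                    start, end = next(spans)
                    out.extend(context[start:end])   # copy tokens from the context
                else:
                    out.append(piece)                # out-of-context token from the rule
        if tag.action == "KEEP":
            out.append(token)
    return out


# Invented example dialogue (not taken from any benchmark).
context = ["do", "you", "like", "rock", "music", "?"]
source = ["what", "else", "do", "you", "like", "?"]
tags = [TokenTag("KEEP") for _ in source]
# Insert "besides <rock music>" before the final "?".
tags[-1] = TokenTag("KEEP", rule="besides _", spans=((3, 5),))

rewrite = apply_tags(source, context, tags)
print(" ".join(rewrite))
```

The sketch shows why a slotted rule helps coverage: "besides" never appears in the context, so a tagger that can only copy a single context span could not produce this rewrite, whereas one rule with one slot supplies both the extra token and the copied span.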