Paper Title


Continuous Decomposition of Granularity for Neural Paraphrase Generation

Authors

Xiaodong Gu, Zhaowei Zhang, Sang-Woo Lee, Kang Min Yoo, Jung-Woo Ha

Abstract


While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information. Prior work has shown that decomposing the levels of granularity~(e.g., word, phrase, or sentence) for input tokens has produced substantial improvements, suggesting the possibility of enhancing Transformers via more fine-grained modeling of granularity. In this work, we propose a continuous decomposition of granularity for neural paraphrase generation (C-DNPG). In order to efficiently incorporate granularity into sentence encoding, C-DNPG introduces a granularity-aware attention (GA-Attention) mechanism which extends multi-head self-attention with: 1) a granularity head that automatically infers the hierarchical structure of a sentence by neurally estimating the granularity level of each input token; and 2) two novel attention masks, namely, granularity resonance and granularity scope, to efficiently encode granularity into attention. Experiments on two benchmarks, Quora question pairs and Twitter URLs, show that C-DNPG outperforms baseline models by a remarkable margin and achieves state-of-the-art results on many metrics. Qualitative analysis reveals that C-DNPG effectively captures fine-grained levels of granularity.
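To make the described mechanism more concrete, below is a minimal PyTorch-style sketch of how a granularity head and a granularity-derived attention bias could be wired into multi-head self-attention. This is an illustration under stated assumptions, not the paper's implementation: the class name GranularityAwareSelfAttention, the single-linear granularity head, and the simplified "resonance" term g_i*g_j + (1-g_i)(1-g_j) are hypothetical choices, and the paper's exact granularity resonance and granularity scope masks are not reproduced here.

# Illustrative sketch only: a granularity head predicts a continuous level
# g_i in [0, 1] per token, and a granularity-based bias modulates attention.
# The resonance formula below is a simplified stand-in (tokens with similar
# granularity attend to each other more), not the paper's exact mask.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GranularityAwareSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Granularity head: one scalar per token (hypothetical layout).
        self.granularity_head = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

        # Standard scaled dot-product attention scores: (b, h, n, n).
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5

        # Continuous granularity level per token, g in [0, 1].
        g = torch.sigmoid(self.granularity_head(x)).squeeze(-1)  # (b, n)

        # Simplified "resonance": high when query and key tokens have similar
        # granularity levels (an assumption used only to convey the idea).
        gi = g.unsqueeze(-1)                         # (b, n, 1)
        gj = g.unsqueeze(-2)                         # (b, 1, n)
        resonance = gi * gj + (1 - gi) * (1 - gj)    # (b, n, n)
        scores = scores + torch.log(resonance.unsqueeze(1) + 1e-9)

        attn = F.softmax(scores, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out(ctx)


# Usage sketch:
# layer = GranularityAwareSelfAttention(d_model=512, n_heads=8)
# y = layer(torch.randn(2, 16, 512))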
