Paper Title
HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization
Paper Authors
Paper Abstract
Document structure is critical for efficient information consumption. However, it is challenging to encode it efficiently into the modern Transformer architecture. In this work, we present HIBRIDS, which injects Hierarchical Biases foR Incorporating Document Structure into the calculation of attention scores. We further present a new task, hierarchical question-summary generation, for summarizing salient content in the source document into a hierarchy of questions and summaries, where each follow-up question inquires about the content of its parent question-summary pair. We also annotate a new dataset with 6,153 question-summary hierarchies labeled on long government reports. Experiment results show that our model produces better question-summary hierarchies than comparisons on both hierarchy quality and content coverage, a finding also echoed by human judges. Additionally, our model improves the generation of long-form summaries from lengthy government reports and Wikipedia articles, as measured by ROUGE scores.
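The core idea of adding a structure-dependent bias to attention scores can be illustrated with a small sketch. The abstract does not say how the biases are indexed, so this example assumes one scalar bias per (path length, level difference) relation between the sections that contain the query and key tokens, with the section tree given as a parent map. All names here (`section_relation`, `biased_attention_weights`, `bias_table`) are hypothetical, and the bias values are hand-set rather than learned. It is a pure-Python illustration of the mechanism, not the authors' implementation.

```python
import math


def path_to_root(parent, node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def section_relation(parent, a, b):
    """Return (path_length, level_difference) between sections a and b.

    Assumption: these two quantities are what index the bias table.
    """
    up_a = path_to_root(parent, a)
    up_b = path_to_root(parent, b)
    depth_a, depth_b = len(up_a) - 1, len(up_b) - 1
    ancestors_b = set(up_b)
    # Lowest common ancestor: first node on a's upward path that b also has.
    lca = next(n for n in up_a if n in ancestors_b)
    depth_lca = len(path_to_root(parent, lca)) - 1
    path_length = (depth_a - depth_lca) + (depth_b - depth_lca)
    return path_length, depth_b - depth_a


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(queries, keys, token_section, parent, bias_table):
    """Scaled dot-product attention weights with an additive structural bias.

    The bias for (query i, key j) is looked up by the relation between the
    sections of the two tokens and added to the score before the softmax.
    Relations missing from `bias_table` get a bias of 0.0.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            score = sum(qx * kx for qx, kx in zip(q, k)) / math.sqrt(d)
            relation = section_relation(parent, token_section[i], token_section[j])
            row.append(score + bias_table.get(relation, 0.0))
        weights.append(softmax(row))
    return weights


# Toy document: root section 0 with two sibling subsections 1 and 2.
parent = {0: None, 1: 0, 2: 0}
token_section = [1, 1, 2]          # tokens 0 and 1 share a section
vectors = [[0.0, 0.0]] * 3         # identical content, so only the bias matters
bias_table = {(0, 0): 1.0, (2, 0): -1.0}  # hand-set values; learned in practice

weights = biased_attention_weights(vectors, vectors, token_section, parent, bias_table)
```

With identical token vectors, the content scores are all equal, so the attention weights differ only through the bias: token 0 attends more to token 1 (same section) than to token 2 (sibling section). In a trained model the table entries would be learned parameters, typically with a separate table per attention head.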