Paper Title

TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation

Authors

Sajad Sotudeh, Nazli Goharian

Abstract

Many scientific papers, such as those in the arXiv and PubMed data collections, have abstracts ranging from 50 to 1000 words, with an average length of approximately 200 words; longer abstracts typically convey more information about the source paper. Until recently, scientific summarization research has typically focused on generating short, abstract-like summaries, following the existing datasets used for scientific summarization. In domains where the source text is relatively long, such as scientific documents, such summaries cannot go beyond a general, coarse overview to provide salient information from the source document. Recent interest in tackling this problem motivated the curation of two scientific datasets, arXiv-Long and PubMed-Long, which contain human-written summaries of 400-600 words and thus provide a venue for research on generating long (extended) summaries. Extended summaries facilitate a faster read while providing details beyond coarse information. In this paper, we propose TSTR, an extractive summarizer that uses the introductory information of documents as pointers to their salient information. Evaluations on two existing large-scale extended summarization datasets indicate statistically significant improvements in ROUGE and average ROUGE (F1) scores (except in one case) compared to strong baselines and the state of the art. Comprehensive human evaluations favor our generated extended summaries in terms of cohesion and completeness.
