Paper title
SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
Paper authors
Paper abstract
The Scholarly Document Processing (SDP) workshop aims to encourage more work on natural language understanding for scientific tasks. It comprises three shared tasks, and we participate in the LongSumm shared task. In this paper, we describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020), an unsupervised multi-document summarization system for the news domain. SciSummPip includes a transformer-based language model, SciBERT (Beltagy et al., 2019), for contextual sentence representation; content selection with PageRank (Page et al., 1999); sentence graph construction with both deep and linguistic information; sentence graph clustering; and within-graph summary generation. Our work differs from previous methods in that content selection and a summary length constraint are applied to adapt to the scientific domain. Experimental results on both the training dataset and the blind test dataset show the effectiveness of our method, and we empirically verify the robustness of the modules used in SciSummPip with BERTScore (Zhang et al., 2019a).
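The content selection step named in the abstract (PageRank over a sentence graph, followed by a summary length constraint) can be illustrated with a minimal sketch. This is not the authors' implementation: bag-of-words cosine similarity stands in for SciBERT sentence representations, the greedy word budget is only one assumed way to apply a length constraint, and the function names (`bow_vector`, `pagerank`, `select_sentences`) and the `max_words` parameter are hypothetical.

```python
import math
from collections import Counter


def bow_vector(sentence):
    """Bag-of-words counts; an assumed stand-in for SciBERT embeddings."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power iteration on a weighted sentence graph (adjacency matrix)."""
    n = len(weights)
    if n == 0:
        return []
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def select_sentences(sentences, max_words):
    """Rank sentences with PageRank, then greedily fill a word budget."""
    n = len(sentences)
    vectors = [bow_vector(s) for s in sentences]
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `select_sentences(sentences, max_words=100)` on a list of sentence strings returns the highest-ranked sentences that fit within 100 words, in document order. In a fuller pipeline the similarity function would be replaced by one computed over contextual embeddings, and the selected sentences would feed the graph construction and clustering stages.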