Paper Title

An Unsupervised Semantic Sentence Ranking Scheme for Text Documents

Authors

Zhang, Hao; Wang, Jie

Abstract

This paper presents Semantic SentenceRank (SSR), an unsupervised scheme for automatically ranking sentences in a single document according to their relative importance. In particular, SSR extracts essential words and phrases from a text document, and uses semantic measures to construct, respectively, a semantic phrase graph over phrases and words, and a semantic sentence graph over sentences. It applies two variants of article-structure-biased PageRank to score phrases and words on the first graph and sentences on the second graph. It then combines these scores to generate the final score for each sentence. Finally, SSR solves a multi-objective optimization problem for ranking sentences based on their final scores and topic diversity through semantic subtopic clustering. An implementation of SSR that runs in quadratic time is presented, and it outperforms, on the SummBank benchmarks, each individual judge's ranking and compares favorably with the combined ranking of all judges.
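To make the "article-structure-biased PageRank" step more concrete, the following is a minimal sketch of PageRank with a non-uniform teleport vector over a weighted sentence graph. It is not the authors' implementation: the function names `biased_pagerank` and `position_bias`, the `1 / (1 + index)` bias that favors earlier sentences, the damping value, and the toy similarity matrix are all assumptions made for illustration. The abstract does not specify how SSR defines its bias or its semantic similarity measures.

```python
from typing import List


def biased_pagerank(weights: List[List[float]], bias: List[float],
                    damping: float = 0.85, iters: int = 100,
                    tol: float = 1e-9) -> List[float]:
    """Power iteration for PageRank with a non-uniform teleport vector.

    weights[i][j] is the (assumed) semantic similarity used as the edge
    weight from node i to node j; bias[i] is the teleport preference.
    """
    n = len(weights)
    total_bias = sum(bias)
    teleport = [b / total_bias for b in bias]      # normalize to sum to 1
    out_sum = [sum(row) for row in weights]        # total outgoing weight
    scores = teleport[:]
    for _ in range(iters):
        # Teleport term: mass (1 - damping) is spread according to the bias.
        new = [(1.0 - damping) * teleport[j] for j in range(n)]
        for i in range(n):
            if out_sum[i] == 0.0:
                # Dangling node: redistribute its mass by the teleport vector.
                for j in range(n):
                    new[j] += damping * scores[i] * teleport[j]
            else:
                for j in range(n):
                    if weights[i][j] > 0.0:
                        new[j] += damping * scores[i] * weights[i][j] / out_sum[i]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def position_bias(n: int) -> List[float]:
    """Hypothetical structure bias: earlier sentences get more teleport mass."""
    return [1.0 / (1 + i) for i in range(n)]


if __name__ == "__main__":
    # Toy 3-sentence graph with made-up symmetric similarities.
    similarity = [
        [0.0, 0.6, 0.2],
        [0.6, 0.0, 0.4],
        [0.2, 0.4, 0.0],
    ]
    sentence_scores = biased_pagerank(similarity, position_bias(3))
    ranking = sorted(range(3), key=lambda i: sentence_scores[i], reverse=True)
    print(sentence_scores, ranking)
```

In a fuller pipeline along the lines the abstract describes, a second run of the same routine over a phrase-and-word graph would yield phrase scores, which would then be combined with the sentence scores before the diversity-aware ranking step. How those scores are combined is not stated in the abstract, so it is left out here.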
