Paper title
Zero-shot topic generation
Paper authors
Paper abstract
We present an approach to generating topics using a model trained only for document title generation, with zero examples of topics given during training. We leverage features that capture the relevance of a candidate span in a document for the generation of a title for that document. The output is a weighted collection of the phrases that are most relevant for describing the document and distinguishing it within a corpus, without requiring access to the rest of the corpus. We conducted a double-blind trial in which human annotators scored the quality of our machine-generated topics alongside the original human-written topics associated with news articles from The Guardian and The Huffington Post. The results show that our zero-shot model generates topic labels for news documents that are, on average, of equal or higher quality than those written by humans, as judged by human annotators.