Paper Title

Predicting Long-Term Citations from Short-Term Linguistic Influence

Authors

Sandeep Soni, David Bamman, Jacob Eisenstein

Abstract

A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. There are two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of future citations: the estimate of linguistic influence from the two years after a paper's publication is correlated with and predictive of its citation count in the following three years. This is demonstrated using an online evaluation with incremental temporal training/test splits, in comparison with a strong baseline that includes predictors for initial citation counts, topics, and lexical features.
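
The abstract does not spell out how the Hawkes process is parameterized. As an illustration only, the sketch below shows a generic multivariate Hawkes process whose excitation matrix is constrained to low rank, A = U V^T. The exponential kernel, the decay rate beta, the clipping at zero, and the use of row sums of A as per-source influence scores are all assumptions made for this example, not the authors' formulation. All function names are hypothetical.

```python
import math
import random

def low_rank_excitation(U, V):
    """Build A = U V^T (n x n) from two n x r factor matrices, clipped at zero
    so that every excitation, and hence every intensity, stays non-negative."""
    return [[max(0.0, sum(u * v for u, v in zip(U[j], V[i])))
             for i in range(len(V))]
            for j in range(len(U))]

def intensity(t, events, mu, A, beta):
    """Conditional intensity of every dimension at time t.

    events: list of (time, dimension) pairs; only events strictly before t count.
    A[j][i]: how strongly an event in dimension j excites dimension i.
    Assumed kernel: exponential, beta * exp(-beta * (t - t_k)).
    """
    lam = list(mu)
    for t_k, j in events:
        if t_k < t:
            decay = beta * math.exp(-beta * (t - t_k))
            for i in range(len(lam)):
                lam[i] += A[j][i] * decay
    return lam

def log_likelihood(events, mu, A, beta, T):
    """Log-likelihood of a multivariate Hawkes process observed on [0, T]."""
    ll = sum(math.log(intensity(t_k, events, mu, A, beta)[i]) for t_k, i in events)
    # Compensator: the integral of all intensities over [0, T], which has a
    # closed form for the exponential kernel.
    comp = sum(mu) * T
    for t_k, j in events:
        comp += sum(A[j]) * (1.0 - math.exp(-beta * (T - t_k)))
    return ll - comp

def influence_scores(A):
    """Row sums of A: expected number of events directly triggered by one
    event in each dimension (a simple, assumed per-source influence summary)."""
    return [sum(row) for row in A]

# Toy usage: random non-negative rank-2 factors for 4 dimensions.
random.seed(0)
n_dims, rank = 4, 2
U = [[random.random() for _ in range(rank)] for _ in range(n_dims)]
V = [[random.random() for _ in range(rank)] for _ in range(n_dims)]
A = low_rank_excitation(U, V)
mu = [0.1] * n_dims
events = [(0.5, 0), (1.2, 1), (1.3, 0), (2.0, 3)]
print(log_likelihood(events, mu, A, beta=1.0, T=3.0))
print(influence_scores(A))
```

The low-rank factorization means an n-dimensional process needs only 2nr excitation parameters instead of n^2, which is what makes a high-dimensional Hawkes process tractable to estimate. Fitting U, V, and mu (for example, by maximizing log_likelihood with a gradient-based optimizer) is omitted here.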