Paper Title

Co-citation and Co-authorship Networks of Statisticians

Authors

Ji, Pengsheng; Jin, Jiashun; Ke, Zheng Tracy; Li, Wanshan

Abstract

We collected and cleaned a large data set on publications in statistics. The data set consists of the coauthor relationships and citation relationships of 83,331 papers published in 36 representative journals in statistics, probability, and machine learning, spanning 41 years. The data set allows us to construct many different networks, and motivates a number of research problems about the research patterns and trends, research impacts, and network topology of the statistics community. In this paper we focus on (i) using the citation relationships to estimate the research interests of authors, and (ii) using the coauthor relationships to study the network topology. Using the co-citation networks we constructed, we discover a "statistics triangle", reminiscent of the statistical philosophy triangle (Efron, 1998). We propose new approaches to constructing the "research map" of statisticians, as well as the "research trajectory" of a given author, which visualizes how his/her research interests evolve. Using the co-authorship networks we constructed, we discover a multi-layer community tree and produce a Sankey diagram to visualize author migrations across different sub-areas. We also propose several new metrics for the research diversity of individual authors. We find that "Bayes", "Biostatistics", and "Nonparametric" are three primary areas in statistics. We also identify 15 sub-areas, each of which can be viewed as a weighted average of the primary areas, and identify several underlying reasons for the formation of co-authorship communities. We also find that the research interests of statisticians have evolved significantly over the 41-year time window we studied: some areas (e.g., biostatistics and high-dimensional data analysis) have become increasingly popular.
