Paper Title

Two Approaches to Survival Analysis of Open Source Python Projects

Authors

Derek Robinson, Keanelek Enns, Neha Koulecar, Manish Sihag

Abstract

A recent study applied frequentist survival analysis methods to a subset of the Software Heritage Graph and determined which attributes of an OSS project contribute to its health. This paper serves as an exact replication of that study. In addition, Bayesian survival analysis methods were applied to the same dataset, and an additional project attribute was studied to serve as a conceptual replication. Both analyses focus on the effects of certain attributes on the survival of open-source software projects as measured by their revision activity. Methods such as the Kaplan-Meier estimator, the Cox Proportional-Hazards model, and the visualization of posterior survival functions were used for each of the project attributes. The results show that projects that publish major releases, have repositories on multiple hosting services, possess a large team of developers, and make frequent revisions have a higher likelihood of survival in the long run. The findings were similar to those of the original study; however, a closer look revealed quantitative inconsistencies.
