Title
Two Approaches to Survival Analysis of Open Source Python Projects
Authors
Abstract
A recent study applied frequentist survival analysis methods to a subset of the Software Heritage Graph and determined which attributes of an OSS project contribute to its health. This paper serves as an exact replication of that study. In addition, Bayesian survival analysis methods were applied to the same dataset, and an additional project attribute was studied to serve as a conceptual replication. Both analyses focus on the effects of certain attributes on the survival of open-source software projects as measured by their revision activity. Methods such as the Kaplan-Meier estimator, Cox Proportional-Hazards model, and the visualization of posterior survival functions were used for each of the project attributes. The results show that projects which publish major releases, have repositories on multiple hosting services, possess a large team of developers, and make frequent revisions have a higher likelihood of survival in the long run. The findings were similar to the original study; however, a deeper look revealed quantitative inconsistencies.
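As an illustration of the Kaplan-Meier estimator named above, the following is a minimal sketch in plain Python. It is not the implementation used in the study. The function name `kaplan_meier`, the toy durations, and the reading of an "event" as a project becoming inactive are assumptions made for this example only.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: how long each project was followed (hypothetical units).
    observed:  True if the project was seen to become inactive (event),
               False if it was still active when observation ended (censored).
    """
    deaths = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data, not taken from the study.
durations = [6, 12, 12, 20, 30]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored projects reduce the number at risk without lowering the survival estimate, which is why the curve only steps down at times where an event was observed.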