Paper Title
More Effective Software Repository Mining
Paper Authors
Abstract
Background: Mining and analyzing data from public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether results obtained on such "convenience samples" generalize. Aims: This paper aims to elucidate the difficulties faced by researchers who would like to ascertain the generalizability of their findings, and to introduce an interface that addresses the problem of obtaining representative samples. Results: To that end, we explore how to exploit the World of Code system to make software repository sampling and analysis much more accessible. Specifically, we present a resource for Mining Software Repositories researchers that is intended to simplify the data sampling and retrieval workflow and, through that, increase the validity and completeness of the data. Conclusions: This system has the potential to provide researchers with a resource that greatly eases data retrieval and addresses many of the outstanding issues with data sampling.