Paper Title

Should I Get Involved? On the Privacy Perils of Mining Software Repositories for Research Participants

Authors

Melina Vidoni, Nicolás E. Díaz Ferreyra

Abstract

Mining Software Repositories (MSRs) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques, as they allow researchers to unveil issues and flaws in software development and to analyse the different factors contributing to them. Hence, fine-grained information about the repositories and sources being mined (e.g., server names and contributors' identities) is essential for the reproducibility and transparency of MSR studies. However, this can also introduce threats to participants' privacy, as their identities may be linked to flawed or sub-optimal programming practices (e.g., code smells, improper documentation), or vice versa. Moreover, this can extend to close collaborators and community members, resulting in "guilt by association". This position paper aims to start a discussion about indirect participation in MSR investigations, the dichotomy of "privacy vs. utility" regarding sharing non-aggregated data, and its effects on privacy restrictions and ethical considerations for participant involvement.
