Paper title
Efficient Prior Publication Identification for Open Source Code
Paper authors
Paper abstract
Free/Open Source Software (FOSS) enables large-scale reuse of preexisting software components. The main drawback is increased complexity in software supply chain management. A common approach to tame such complexity is automated open source compliance, which consists in automating the verification of adherence to various open source management best practices about license obligation fulfillment, vulnerability tracking, software composition analysis, and related concerns. We consider the problem of auditing a source code base to determine which of its parts have been published before, which is an important building block of automated open source compliance toolchains. Indeed, if source code allegedly developed in house is recognized as having been previously published elsewhere, alerts should be raised to investigate where it comes from and whether this entails additional obligations that must be fulfilled before product shipment. We propose an efficient approach for prior publication identification that relies on a knowledge base of known source code artifacts linked together in a global Merkle directed acyclic graph and a dedicated discovery protocol. We introduce swh-scanner, a source code scanner that realizes the proposed approach in practice using Software Heritage, the largest public archive of source code artifacts, as its knowledge base. We experimentally validate the proposed approach, showing its efficiency in both abstract terms (number of queries) and concrete terms (wall-clock time), by running benchmarks on 16 845 real-world public code bases of various sizes, from small to very large.
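To illustrate why a Merkle directed acyclic graph can reduce the number of queries, the following is a minimal sketch and not the swh-scanner implementation. It assumes a simplified identifier scheme (the `cnt:`/`dir:` prefixes and the hashing layout are illustrative and are not Software Heritage identifiers), a hypothetical `is_known` callback standing in for the knowledge base, and one plausible top-down discovery strategy. The abstract does not specify the protocol, so the traversal order here is an assumption. The point it shows is that a directory identifier commits to everything below it, so a known directory needs no further queries for its descendants.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple, Union


@dataclass
class File:
    data: bytes


@dataclass
class Directory:
    entries: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[File, Directory]


def merkle_id(node: Node) -> str:
    """Illustrative Merkle identifier: a file is hashed by content, a
    directory by the sorted (name, child identifier) pairs it contains."""
    if isinstance(node, File):
        return "cnt:" + hashlib.sha1(node.data).hexdigest()
    h = hashlib.sha1()
    for name in sorted(node.entries):
        child_id = merkle_id(node.entries[name])
        h.update(name.encode() + b"\0" + child_id.encode() + b"\n")
    return "dir:" + h.hexdigest()


def discover(root: Node, is_known: Callable[[str], bool]) -> Tuple[Dict[str, bool], int]:
    """Assumed top-down strategy: query a node, and descend into a
    directory only when the knowledge base does not know it."""
    status: Dict[str, bool] = {}
    queries = 0
    stack = [("", root)]
    while stack:
        path, node = stack.pop()
        queries += 1
        hit = is_known(merkle_id(node))
        status[path or "."] = hit
        if not hit and isinstance(node, Directory):
            for name, child in node.entries.items():
                stack.append((f"{path}/{name}".lstrip("/"), child))
    return status, queries


# Hypothetical code base: a vendored library, in-house code, a license file.
vendor = Directory({"a.c": File(b"int a;"), "b.c": File(b"int b;")})
license_file = File(b"MIT License")
root = Directory({
    "vendor": vendor,
    "src": Directory({"main.c": File(b"int main(void) { return 0; }")}),
    "LICENSE": license_file,
})

# Stand-in knowledge base: only the vendored directory and the license are known.
knowledge_base = {merkle_id(vendor), merkle_id(license_file)}
status, queries = discover(root, knowledge_base.__contains__)
print(status, queries)
```

In this toy tree, the files under `vendor` are never queried individually because their parent directory is already known. A real scanner would also batch queries and cache identifiers rather than recomputing them on each visit.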