Paper Title
ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems
Authors
Abstract
Large-scale machine learning (ML) based software systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any of these domains to compromise ML assets (models and datasets). Practitioners therefore need information about how and by whom ML assets were developed in order to assess quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct this historical information (ML provenance), because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about the provenance of circulated ML assets, without relying on a third party that would be vulnerable to insider threats and present a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine, which leverages blockchain transactions and smart contracts to manage ML provenance information, and we introduce a user-driven provenance capturing mechanism that integrates existing scripts and tools into ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assess ProML's security against a threat model of a distributed ML workflow.
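The abstract names the Artefact-as-a-State-Machine approach without giving its details, so the following is only an illustrative sketch of the general idea and not ProML's actual design. It assumes an ML asset can be modelled as a state machine whose transitions are triggered by transaction-like actions, with each transition appended to a hash-linked log that stands in for a blockchain. The state names, the `TRANSITIONS` table, the `Artefact` class, and its methods are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical lifecycle states and allowed transitions (not taken from the paper).
TRANSITIONS = {
    ("registered", "train"): "trained",
    ("trained", "evaluate"): "evaluated",
    ("evaluated", "publish"): "published",
}


def _digest(body: dict) -> str:
    """Deterministic SHA-256 digest of a record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Artefact:
    """An ML asset whose provenance is the ordered log of its state transitions."""
    artefact_id: str
    state: str = "registered"
    log: list = field(default_factory=list)

    def apply(self, action: str, actor: str) -> dict:
        """Apply an action, loosely analogous to a transaction sent to a smart contract."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"action {action!r} not allowed in state {self.state!r}")
        record = {
            "artefact": self.artefact_id,
            "actor": actor,
            "action": action,
            "from": self.state,
            "to": TRANSITIONS[key],
            # Link to the previous record so later tampering is detectable.
            "prev": self.log[-1]["hash"] if self.log else None,
        }
        record["hash"] = _digest(record)
        self.log.append(record)
        self.state = record["to"]
        return record

    def verify(self) -> bool:
        """Recompute every digest and check that the hash links are intact."""
        prev = None
        for record in self.log:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev"] != prev or record["hash"] != _digest(body):
                return False
            prev = record["hash"]
        return True
```

In this sketch, calling `apply("train", "team-a")` on a freshly registered artefact moves it to the `trained` state and records who performed the step, while `verify()` returns `False` once any earlier record has been altered. A real deployment would rely on the blockchain's own consensus and contract storage rather than an in-memory list.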