Paper Title

FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

Authors

Terrail, Jean Ogier du, Ayed, Samy-Safwan, Cyffers, Edwige, Grimberg, Felix, He, Chaoyang, Loeb, Regis, Mangold, Paul, Marchand, Tanguy, Marfoq, Othmane, Mushtaq, Erum, Muzellec, Boris, Philippenko, Constantin, Silva, Santiago, Teleńczuk, Maria, Albarqouni, Shadi, Avestimehr, Salman, Bellet, Aurélien, Dieuleveut, Aymeric, Jaggi, Martin, Karimireddy, Sai Praneeth, Lorenzi, Marco, Neglia, Giovanni, Tommasi, Marc, Andreux, Mathieu

Abstract

Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied with baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.
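
The abstract describes clients that jointly train one model without pooling their data, and it mentions benchmarking standard FL algorithms. The sketch below is for illustration only and implements Federated Averaging (FedAvg), one widely used FL strategy, on a toy problem. It is not FLamby's API or its benchmark code. The helper names `local_update`, `fedavg` and `make_silo` are hypothetical, as are the synthetic 1-D linear-regression silos and all hyperparameters.

```python
import random

# Hypothetical toy setup (not FLamby): each "silo" holds its own (x, y)
# samples for 1-D linear regression y = w * x + b. Raw data never leaves
# a silo; only the model parameters (w, b) are exchanged with the server.


def local_update(params, data, lr=0.05, epochs=5):
    """Run a few epochs of full-batch gradient descent on one silo's data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Mean-squared-error gradients over the silo's local dataset.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fedavg(silos, rounds=50, lr=0.05, epochs=5):
    """FedAvg: average locally trained parameters, weighted by silo size."""
    params = (0.0, 0.0)
    total = sum(len(d) for d in silos)
    for _ in range(rounds):
        # Server broadcasts params; each silo trains locally and reports back.
        updates = [local_update(params, d, lr, epochs) for d in silos]
        w = sum(len(d) * u[0] for d, u in zip(silos, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(silos, updates)) / total
        params = (w, b)
    return params


def make_silo(n, x_shift, rng):
    """Synthetic silo: shared true model y = 2x + 1, shifted input range."""
    xs = [rng.uniform(-1, 1) + x_shift for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three silos of unequal size, mimicking a small cross-silo federation.
    silos = [make_silo(n, s, rng) for n, s in [(200, -0.5), (50, 0.0), (120, 0.5)]]
    w, b = fedavg(silos)
    print(f"w={w:.2f}, b={b:.2f}")  # should end up close to w=2, b=1
```

Only the pair (w, b) crosses silo boundaries. Weighting the average by each silo's sample count means larger silos are not under-represented relative to smaller ones.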
