Paper Title

Towards overcoming data scarcity in materials science: unifying models and datasets with a mixture of experts framework

Authors

Rees Chang, Yu-Xiong Wang, Elif Ertekin

Abstract

While machine learning has emerged in recent years as a useful tool for rapid prediction of materials properties, generating sufficient data to reliably train models without overfitting is still impractical for many applications. Towards overcoming this limitation, we present a general framework for leveraging complementary information across different models and datasets for accurate prediction of data scarce materials properties. Our approach, based on a machine learning paradigm called mixture of experts, outperforms pairwise transfer learning on 16 of 19 materials property regression tasks, performing comparably on the remaining three. Unlike pairwise transfer learning, our framework automatically learns to combine information from multiple source tasks in a single training run, alleviating the need for brute-force experiments to determine which source task to transfer from. The approach also provides an interpretable, model-agnostic, and scalable mechanism to transfer information from an arbitrary number of models and datasets to any downstream property prediction task. We anticipate the performance of our framework will further improve as better model architectures, new pre-training tasks, and larger materials datasets are developed by the community.
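To make the mixture-of-experts idea concrete, here is a minimal NumPy sketch (not the authors' implementation): several frozen "experts" stand in for models pre-trained on different source tasks, and a learned gate produces softmax mixing weights per sample, yielding the interpretable attention over source tasks described above. The expert functions and gate dimensions below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "experts": stand-ins for models pre-trained on
# different source tasks. Each maps a feature vector to a scalar property.
def expert_a(x):  # e.g. pre-trained on a large formation-energy dataset
    return x @ np.array([1.0, -0.5, 0.2])

def expert_b(x):  # e.g. pre-trained on a band-gap dataset
    return x @ np.array([0.3, 0.8, -0.1])

experts = [expert_a, expert_b]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_predict(x, gate_weights):
    """Weighted combination of expert outputs for the downstream task.

    gate_weights: (n_features, n_experts) parameters of the learned gate.
    The softmax over the gate's logits gives per-sample mixing weights,
    which expose how much each source task contributes.
    """
    logits = x @ gate_weights                          # (n_samples, n_experts)
    attn = softmax(logits)                             # mixing coefficients
    preds = np.stack([e(x) for e in experts], axis=-1) # frozen expert outputs
    return (attn * preds).sum(axis=-1), attn

x = rng.normal(size=(4, 3))            # 4 samples, 3 input features
gate = rng.normal(size=(3, 2))         # untrained gate, for illustration only
y_hat, attn = moe_predict(x, gate)
assert np.allclose(attn.sum(axis=-1), 1.0)  # mixing weights sum to 1
```

In a real setting only the gate (and any task head) would be trained on the scarce downstream data, while the experts stay frozen; inspecting `attn` then shows which source tasks the model transfers from, without brute-force pairwise experiments.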
