Paper Title
Polymer Informatics with Multi-Task Learning
Paper Authors
Abstract
Modern data-driven tools are transforming application-specific polymer development cycles. Surrogate models that can be trained to predict the properties of new polymers are becoming commonplace. Nevertheless, these models do not utilize the full breadth of the knowledge available in datasets, which are often sparse; inherent correlations between different property datasets are disregarded. Here, we demonstrate the potency of multi-task learning approaches that exploit such inherent correlations effectively, particularly when some property datasets are small. Data pertaining to 36 different properties of over $13,000$ polymers (corresponding to over $23,000$ data points) are coalesced and supplied to deep-learning multi-task architectures. Compared to conventional single-task learning models (which are trained on individual property datasets independently), the multi-task approach is accurate, efficient, scalable, and amenable to transfer learning as more data on the same or different properties become available. Moreover, these models are interpretable. Chemical rules that explain how certain features control trends in specific property values emerge from the present work, paving the way for the rational design of application-specific polymers that meet desired property or performance objectives.