Title
Data-Driven Materials Discovery and Synthesis using Machine Learning Methods
Authors
Abstract
Experimentally [1-38] and computationally [39-50] validated machine learning (ML) articles are sorted by training-set size (1-100, 101-10,000, and 10,000+) in a comprehensive set summarizing legacy and recent advances in the field. The review emphasizes the interrelated fields of synthesis, characterization, and prediction. The 1-100 size range consists mostly of Bayesian optimization (BO) articles, whereas the 101-10,000 range consists mostly of support vector machine (SVM) articles. The articles often combine ML, feature selection (FS), adaptive design (AD), high-throughput (HiTp) techniques, and domain knowledge to enhance predictive performance and/or model interpretability. Grouping cross-validation (G-CV) techniques curb overly optimistic estimates of extrapolative predictive performance. Smaller datasets relying on AD are typically able to identify new materials with desired properties, but only within a constrained design space. In larger datasets, the low-hanging fruit of materials optimization has typically already been discovered, and the models are generally less successful at extrapolating to new materials, especially when the training data favor a particular type of material. The large increase in ML materials science articles that experimentally or computationally validate their predictions demonstrates the interpenetration of materials informatics with the materials science discipline and accelerating materials discovery for real-world applications.
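The point about grouping cross-validation can be illustrated with a minimal, self-contained sketch. It is not taken from the reviewed articles. The toy dataset, the "family" labels, the single feature standing in for a composition descriptor, the 1-nearest-neighbour regressor, and all function names (make_dataset, cv_mae, random_kfold, group_kfold) are hypothetical choices made for illustration. Leave-one-group-out is used here as one simple form of grouped CV.

```python
import random
import statistics
from typing import Dict, List, Tuple

# (feature, target, group label); hypothetical: the single feature stands in
# for a composition descriptor, the group label for a material family
Sample = Tuple[float, float, str]


def make_dataset(n_groups: int = 6, per_group: int = 10, seed: int = 0) -> List[Sample]:
    """Hypothetical toy data: each material 'family' sits in its own region of
    feature space and has its own property level (0, 30 or 60)."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for g in range(n_groups):
        centre = 10.0 * g           # families are well separated in feature space
        level = 30.0 * (g % 3)      # neighbouring families differ by 30 or 60 units
        for _ in range(per_group):
            x = centre + rng.gauss(0.0, 0.5)   # small within-family feature spread
            y = level + rng.gauss(0.0, 1.0)    # measurement-like noise on the target
            data.append((x, y, f"family-{g}"))
    return data


def nearest_neighbour_predict(train: List[Sample], x: float) -> float:
    """1-NN regression: return the target of the closest training sample."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


def cv_mae(data: List[Sample], folds: List[List[int]]) -> float:
    """Mean absolute error of 1-NN, training on everything outside each test fold."""
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [s for i, s in enumerate(data) if i not in held_out]
        for i in test_idx:
            x, y, _ = data[i]
            errors.append(abs(nearest_neighbour_predict(train, x) - y))
    return statistics.mean(errors)


def random_kfold(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Ordinary k-fold: shuffle indices, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def group_kfold(data: List[Sample]) -> List[List[int]]:
    """Leave-one-group-out: each fold holds out one entire family."""
    groups: Dict[str, List[int]] = {}
    for i, (_, _, g) in enumerate(data):
        groups.setdefault(g, []).append(i)
    return list(groups.values())


data = make_dataset()
random_mae = cv_mae(data, random_kfold(len(data), k=5))
grouped_mae = cv_mae(data, group_kfold(data))
print(f"random 5-fold MAE:       {random_mae:.2f}")
print(f"leave-one-group-out MAE: {grouped_mae:.2f}")
```

Each family lies far from the others in feature space and has its own property level. Random folds therefore almost always leave same-family neighbours in the training set, so the random-split error mostly reflects target noise. Holding out whole families forces the model to extrapolate to an unseen family and exposes a much larger error, which is the over-optimism that G-CV is meant to reveal. In practice, library implementations such as scikit-learn's GroupKFold or LeaveOneGroupOut would normally replace the hand-rolled splits used here.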