Title
Searching for chromate replacements using natural language processing and machine learning algorithms
Authors
Abstract
The past few years have seen machine learning applied to the exploration of new materials. As in many fields of research, the vast majority of knowledge is published as text, which poses challenges for consolidated or statistical analysis across studies and reports. Such challenges include the inability to extract quantitative information and to access the breadth of non-numerical information. To address this issue, several studies to date have explored the application of natural language processing (NLP). In NLP, assigning high-dimensional vectors, known as embeddings, to passages of text preserves the syntactic and semantic relationships between words. Embeddings rely on machine learning algorithms. In the present work, we employed the Word2Vec model, previously explored by others, and the BERT model, applying them to a unique challenge in materials engineering: the search for chromate replacements in the field of corrosion protection. From a database of over 80 million records, a down-selected set of 5990 papers focused on corrosion protection was examined using NLP. This study demonstrates that it is possible to extract knowledge through automated interpretation of the scientific literature and to achieve expert human-level insights.
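The embedding idea described in the abstract can be illustrated with a minimal sketch. Once words have been mapped to vectors (for example by a trained Word2Vec model), related terms can be retrieved by ranking vocabulary entries by cosine similarity to a query word such as "chromate". Everything below is an assumption for illustration: the 4-dimensional vectors, the placeholder tokens `candidate_a`, `candidate_b` and `unrelated_term`, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical. They are not data, results, or code from this study, and the authors' exact querying procedure is not described in the abstract.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query: str, embeddings: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k tokens (excluding the query) closest to the query's embedding."""
    q = embeddings[query]
    scored = [
        (token, cosine_similarity(q, vec))
        for token, vec in embeddings.items()
        if token != query
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, invented for illustration only.
# Real Word2Vec or BERT vectors are learned from a corpus and have hundreds of dimensions.
toy_embeddings: Dict[str, Vector] = {
    "chromate": [0.9, 0.8, 0.1, 0.0],
    "candidate_a": [0.8, 0.9, 0.2, 0.1],
    "candidate_b": [0.7, 0.5, 0.4, 0.0],
    "unrelated_term": [0.0, 0.1, 0.9, 0.8],
}

if __name__ == "__main__":
    for token, score in rank_by_similarity("chromate", toy_embeddings):
        print(f"{token:15s} {score:.3f}")
```

Running the script prints the three non-query tokens in descending order of similarity to "chromate". With real trained Word2Vec vectors, the gensim library offers a comparable nearest-neighbour query through `model.wv.most_similar`, so a hand-written ranking like this is only needed to show the underlying idea.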