Paper Title

Automatic extraction of materials and properties from superconductors scientific literature

Authors

Foppiano, Luca; de Castro, Pedro Baptista; Suarez, Pedro Ortiz; Terashima, Kensei; Takano, Yoshihiko; Ishii, Masashi

Abstract

The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and their respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that accepts input as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the superconducting critical temperature (Tc) and, when available, the applied pressure and the Tc measurement method.
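
The record layout described in the abstract can be pictured as a small data structure. The following is a minimal sketch only: the class name, field names, units, and the completeness rule are assumptions made for illustration and are not the actual SuperCon2 schema or any Grobid-superconductors API. The sample values are likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MaterialRecord:
    """Hypothetical record; field names are illustrative, not the SuperCon2 schema."""

    # Material (or sample) identification
    name: Optional[str] = None
    formula: Optional[str] = None
    material_class: Optional[str] = None

    # Adjoined information characterizing the sample
    shape: Optional[str] = None
    doping: Optional[str] = None
    substitution_variables: Dict[str, List[float]] = field(default_factory=dict)
    substrate: Optional[str] = None

    # Properties (units are an assumption of this sketch)
    critical_temperature_k: Optional[float] = None
    applied_pressure_gpa: Optional[float] = None
    measurement_method: Optional[str] = None

    def is_complete(self) -> bool:
        """Assumed rule: a usable record names a material and carries a Tc value."""
        has_material = self.name is not None or self.formula is not None
        return has_material and self.critical_temperature_k is not None


# Illustrative values only, not taken from the database.
record = MaterialRecord(
    formula="MgB2",
    shape="thin film",
    critical_temperature_k=39.0,
    measurement_method="resistivity",
)
```

Keeping pressure and measurement method optional mirrors the abstract's "when available" wording: a record can still be useful when only the material and its Tc were found in the text.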
