Classification of Astrophysics Journal Articles with Machine Learning to Identify Data for NED

Paper Title

Classification of Astrophysics Journal Articles with Machine Learning to Identify Data for NED

Authors

Chen, Tracy X., Ebert, Rick, Mazzarella, Joseph M., Frayer, Cren, Terek, Scott, Chan, Ben H. P., Cook, David, Lo, Tak, Schmitz, Marion, Wu, Xiuqin

Abstract

The NASA/IPAC Extragalactic Database (NED) is a comprehensive online service that combines fundamental multi-wavelength information for known objects beyond the Milky Way and provides value-added, derived quantities and tools to search and access the data. The contents and relationships between measurements in the database are continuously augmented and revised to stay current with astrophysics literature and new sky surveys. The conventional process of distilling and extracting data from the literature relies on human experts to review journal articles and determine whether an article is extragalactic in nature and, if so, what types of data it contains. This is both labor-intensive and unsustainable, especially given the ever-increasing number of publications each year. We present here a machine learning (ML) approach, developed and integrated into the NED production pipeline, that helps automate the classification of journal article topics and their data content for inclusion in NED. We show that this ML application can reproduce the classifications of a human expert with an accuracy of over 90% in a fraction of the time a human needs, allowing us to focus human expertise on tasks that are more difficult to automate.
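The abstract does not specify the model, so the following is only a rough sketch of one generic way to frame the task: train a text classifier on expert-labelled abstracts, then measure accuracy as the fraction of articles where its label agrees with the expert's. It uses a from-scratch multinomial naive Bayes classifier, a stand-in chosen for brevity and not necessarily the technique used in the NED pipeline. The class and function names, the toy abstracts, and the `extragalactic`/`other` labels are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_doc_counts: Counter[str] = Counter()  # number of documents per label
        self.word_counts: dict[str, Counter[str]] = defaultdict(Counter)  # token counts per label
        self.vocab: set[str] = set()  # every token seen during training

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayesTopicClassifier:
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        if not self.class_doc_counts:
            raise ValueError("classifier has not been trained")
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of Laplace-smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predicted: list[str], expert: list[str]) -> float:
    """Fraction of articles where the model's label matches the human expert's."""
    if not expert or len(predicted) != len(expert):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == e for p, e in zip(predicted, expert)) / len(expert)


# Hypothetical toy training set standing in for expert-labelled abstracts.
TRAIN_TEXTS = [
    "redshift survey of distant galaxies and quasars",
    "photometry of nearby galaxy clusters at high redshift",
    "spectra of active galactic nuclei in the local universe",
    "solar flare observations in the corona",
    "exoplanet transit timing around a nearby star",
    "stellar wind from the sun and coronal mass ejection",
]
TRAIN_LABELS = ["extragalactic"] * 3 + ["other"] * 3

if __name__ == "__main__":
    clf = NaiveBayesTopicClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    held_out = ["quasars at high redshift in galaxy clusters", "solar corona and stellar wind"]
    expert_labels = ["extragalactic", "other"]
    predictions = [clf.predict(t) for t in held_out]
    print(predictions, accuracy(predictions, expert_labels))
```

Here `accuracy` is plain agreement with the expert's labels, which is how the abstract frames its figure of over 90%; the toy data are far too small to say anything about real-world performance.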