Paper title
Searching for Carriers of the Diffuse Interstellar Bands Across Disciplines, using Natural Language Processing
Paper authors
Paper abstract
The explosion of scientific publications overloads researchers with information. The problem is even more acute for interdisciplinary studies, where several fields need to be explored. A tool to help researchers overcome this is Natural Language Processing (NLP): a machine-learning (ML) technique that allows scientists to automatically synthesize information from many articles. As a practical example, we have used NLP to conduct an interdisciplinary search for compounds that could be carriers of the Diffuse Interstellar Bands (DIBs), a long-standing open question in astrophysics. We have trained an NLP model on a corpus of 1.5 million open-access, cross-domain articles, and fine-tuned this model with a corpus of astrophysical publications about DIBs. Our analysis points us toward several molecules, studied primarily in biology, that have transitions at the wavelengths of several DIBs and are composed of abundant interstellar atoms. Several of these molecules contain chromophores, small molecular groups responsible for the molecule's colour, that could be promising candidate carriers. Identifying viable carriers demonstrates the value of using NLP to tackle open scientific questions in an interdisciplinary manner.