Paper Title
Translation between Molecules and Natural Language
Paper Authors
Paper Abstract
We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the chemistry domain shortcoming of data scarcity. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, which in many cases are high quality.
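MolT5 pretrains a T5-style encoder–decoder on unlabeled molecule strings and natural language with a denoising objective, where spans of the input are replaced by sentinel tokens and the model must reconstruct them. As a simplified illustration (a single contiguous masked span rather than T5's multi-span sampling, and a hypothetical `span_corrupt` helper not from the paper's code), the input/target construction for a SMILES string might look like:

```python
import random

def span_corrupt(tokens, noise_density=0.15, seed=0):
    """T5-style span corruption (simplified): mask one contiguous span of
    tokens with a sentinel and emit an (input, target) pair for denoising
    pretraining. Real T5 samples several spans per sequence."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * noise_density))
    start = rng.randrange(len(tokens) - n_mask + 1)
    # Input: original sequence with the span replaced by a sentinel token.
    inp = tokens[:start] + ["<extra_id_0>"] + tokens[start + n_mask:]
    # Target: the sentinel followed by the masked span, then an end sentinel.
    tgt = ["<extra_id_0>"] + tokens[start:start + n_mask] + ["<extra_id_1>"]
    return inp, tgt

# Character-level tokenization of aspirin's SMILES, for illustration only;
# an actual tokenizer would operate on subword or atom-level units.
inp, tgt = span_corrupt(list("CC(=O)OC1=CC=CC=C1C(=O)O"))
```

Because the objective needs only raw strings, the same procedure applies unchanged to both modalities, which is how single-modal pretraining sidesteps the scarcity of paired molecule–text data.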