Paper Title
Translation between Molecules and Natural Language
Paper Authors
Paper Abstract
We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the chemistry domain shortcoming of data scarcity. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, which in many cases are high quality.
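MolT5 pretrains a T5-style encoder–decoder on unlabeled molecule strings and natural language with a denoising objective, where spans of the input are replaced by sentinel tokens and the model must reconstruct them. As a simplified illustration (a single contiguous masked span rather than T5's multi-span sampling, and a hypothetical `span_corrupt` helper not from the paper's code), the input/target construction for a SMILES string might look like:

```python
import random

def span_corrupt(tokens, noise_density=0.15, seed=0):
    """T5-style span corruption (simplified): mask one contiguous span of
    tokens with a sentinel and emit an (input, target) pair for denoising
    pretraining. Real T5 samples several spans per sequence."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * noise_density))
    start = rng.randrange(len(tokens) - n_mask + 1)
    # Input: original sequence with the span replaced by a sentinel token.
    inp = tokens[:start] + ["<extra_id_0>"] + tokens[start + n_mask:]
    # Target: the sentinel followed by the masked span, then an end sentinel.
    tgt = ["<extra_id_0>"] + tokens[start:start + n_mask] + ["<extra_id_1>"]
    return inp, tgt

# Character-level tokenization of aspirin's SMILES, for illustration only;
# an actual tokenizer would operate on subword or atom-level units.
inp, tgt = span_corrupt(list("CC(=O)OC1=CC=CC=C1C(=O)O"))
```

Because the objective needs only raw strings, the same procedure applies unchanged to both modalities, which is how single-modal pretraining sidesteps the scarcity of paired molecule–text data.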