Paper title
ImmunoLingo: Linguistics-based formalization of the antibody language
Paper authors
Paper abstract
Apparent parallels between natural language and biological sequences have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components such as the lexicon (i.e., the discrete units of the language) and the grammar (i.e., the rules that link sequence well-formedness, structure, and meaning), has led to largely domain-unspecific applications of LMs that do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-function relationship rules, such as the ones underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish a foundation not only for the application of linguistic tools in adaptive immune receptor analysis but also for systematic immunolinguistic studies of immune receptor specificity in general.