DeepProphet2 -- A Deep Learning Gene Recommendation Engine

Paper title

DeepProphet2 -- A Deep Learning Gene Recommendation Engine

Authors

Brambilla, Daniele; Giacomini, Davide Maria; Muscarnera, Luca; Mazzoleni, Andrea

Abstract

New powerful tools for tackling life science problems have been created by recent advances in machine learning. The purpose of the paper is to discuss the potential advantages of gene recommendation performed by artificial intelligence (AI). Indeed, gene recommendation engines try to solve this problem: if the user is interested in a set of genes, which other genes are likely to be related to the starting set and should be investigated? This task was solved with a custom deep learning recommendation engine, DeepProphet2 (DP2), which is freely available to researchers worldwide via https://www.generecommender.com?utm_source=DeepProphet2_paper&utm_medium=pdf. Hereafter, insights behind the algorithm and its practical applications are illustrated. The gene recommendation problem can be addressed by mapping the genes to a metric space where a distance can be defined to represent the real semantic distance between them. To achieve this objective, a transformer-based model has been trained on a well-curated, freely available paper corpus, PubMed. The paper describes multiple optimization procedures that were employed to obtain the best bias-variance trade-off, focusing on embedding size and network depth. In this context, the model's ability to discover sets of genes implicated in diseases and pathways was assessed through cross-validation. A simple assumption guided the procedure: the network had no direct knowledge of pathways and diseases but learned genes' similarities and the interactions among them. Moreover, to further investigate the space where the neural network represents genes, the dimensionality of the embedding was reduced, and the results were projected onto a human-comprehensible space. In conclusion, a set of use cases illustrates the algorithm's potential applications in a real-world setting.
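
The idea of recommending genes by distance in an embedding space can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional vectors are hand-made toy values, the `recommend` function is hypothetical, and averaging the query set into a centroid with Euclidean distance is one simple choice of metric, not necessarily what DP2 does. The actual DP2 embeddings are learned by a transformer trained on PubMed and are not reproduced here.

```python
import math

# Toy, hand-made embeddings (hypothetical values, NOT learned by DP2).
EMBEDDINGS = {
    "BRCA1": [0.9, 0.1, 0.0],
    "BRCA2": [0.8, 0.2, 0.1],
    "TP53": [0.6, 0.5, 0.1],
    "INS": [0.0, 0.1, 0.9],
    "GCG": [0.1, 0.0, 0.8],
}


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(components) / len(vectors) for components in zip(*vectors)]


def recommend(query_genes, embeddings, top_k=3):
    """Rank genes outside the query set by Euclidean distance to the
    centroid of the query set (an assumed, simplified scoring rule)."""
    query = set(query_genes)
    center = centroid([embeddings[gene] for gene in query])
    scored = [
        (gene, math.dist(vector, center))
        for gene, vector in embeddings.items()
        if gene not in query
    ]
    # Smaller distance means "more related" under this toy metric.
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


if __name__ == "__main__":
    for gene, distance in recommend(["BRCA1", "BRCA2"], EMBEDDINGS):
        print(f"{gene}\t{distance:.3f}")
```

With these toy vectors, a query of BRCA1 and BRCA2 ranks TP53 closest, because its vector was deliberately placed near theirs. In a learned embedding, the same ranking step would reflect whatever similarity the model extracted from the literature.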
