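Detecting signal from science: The structure of research communities and prior knowledge improves prediction of genetic regulatory experiments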

Paper title

Detecting signal from science: The structure of research communities and prior knowledge improves prediction of genetic regulatory experiments

Authors

Alexander V. Belikov, Andrey Rzhetsky, James Evans

Abstract

The explosive growth in the number of scientists, scientific journals, articles and findings in recent years exponentially increases the difficulty scientists face in navigating prior knowledge. This challenge is exacerbated by uncertainty about the reproducibility of published findings. The availability of massive digital archives and machine reading and extraction tools on the one hand, and of automated high-throughput experiments on the other, allows us to evaluate these challenges at scale and to identify novel opportunities for accelerating scientific advance. Here we demonstrate a Bayesian calculus that enables the positive prediction of robust, replicable scientific claims from findings automatically extracted from the published literature on gene interactions. We matched these findings, filtered by science, with unfiltered gene interactions measured by the massive LINCS L1000 high-throughput experiment to identify and counteract sources of bias. Our calculus is built on easily extracted publication metadata regarding the position of a scientific claim within the web of prior knowledge and its breadth of support across institutions, authors and communities, revealing that scientifically focused but socially and institutionally independent research activity is most likely to replicate. These findings recommend policies that go against the common practice of channeling biomedical research funding into centralized research consortia and institutes rather than dispersing it more broadly. Our results demonstrate that robust scientific findings hinge upon a delicate balance of shared focus and independence, and that this complex pattern can be computationally exploited to decode bias and predict the replicability of published findings. These insights provide guidance for scientists navigating the research literature and for science funders seeking to improve it.
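The abstract does not spell out the form of the Bayesian calculus. As a rough illustration only, the sketch below shows the generic mechanics of Bayes' rule in log-odds form: a prior probability that a claim replicates is updated by one likelihood ratio per metadata feature, assuming the features are conditionally independent given the outcome (a naive-Bayes simplification). This is not the authors' model; the function name, the feature names and the likelihood-ratio values are hypothetical placeholders, not estimates from the paper.

```python
import math


def posterior_replication_probability(prior: float, likelihood_ratios: dict[str, float]) -> float:
    """Update a prior replication probability with per-feature likelihood ratios.

    Hypothetical illustration: each likelihood ratio is
    P(feature | claim replicates) / P(feature | claim does not replicate),
    and features are assumed conditionally independent (naive Bayes).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    # Convert the prior probability to log-odds so that evidence adds up.
    log_odds = math.log(prior / (1.0 - prior))
    for name, lr in likelihood_ratios.items():
        if lr <= 0.0:
            raise ValueError(f"likelihood ratio for {name!r} must be positive")
        log_odds += math.log(lr)
    # Convert the posterior log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical features and values, chosen only to show the arithmetic.
    features = {
        "support_from_independent_institutions": 3.0,  # assumed to favour replication
        "all_support_from_one_consortium": 0.5,        # assumed to weigh against it
    }
    print(posterior_replication_probability(0.5, features))
```

With a prior of 0.5 the prior odds are 1, so the posterior odds are simply the product of the ratios (3.0 × 0.5 = 1.5), giving a probability of 0.6. Working in log-odds keeps the update a running sum and avoids numerical underflow when many features are combined.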
