Paper Title

Toward a Framework for Integrative, FAIR, and Reproducible Management of Data on the Dynamic Balance of Microbial Communities

Authors

Gadelha, Luiz; Hohmuth, Martin; Zulfiqar, Mahnoor; Schöne, David; Samuel, Sheeba; Sorokina, Maria; Steinbeck, Christoph; König-Ries, Birgitta

Abstract

The increasing volumes of data produced by high-throughput instruments, coupled with advanced computational infrastructures for scientific computing, have enabled what is often called a "Fourth Paradigm" for scientific research, based on the exploration of large datasets. Current scientific research is often interdisciplinary, making data integration a critical technique for combining data from different scientific domains. Research data management, which proposes and develops methods, techniques, and practices for managing scientific data throughout their life cycle, is a critical part of this paradigm. Research on microbial communities follows the same pattern, producing large amounts of data obtained, for instance, from sequencing organisms present in environmental samples. Data on microbial communities can come from a multitude of sources and can be stored in different formats. For example, data from metagenomics, metatranscriptomics, metabolomics, and biological imaging are often combined in studies. In this article, we describe the design and current state of implementation of an integrative research data management framework for the Cluster of Excellence Balance of the Microverse, which aims to make data on microbial communities easier to discover, access, combine, and reuse. This framework is based on research data repositories and on best practices for managing workflows used in the analysis of microbial communities, including the recording of provenance information to track data derivation.
