Paper title
ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks
Authors
Abstract
Computational notebooks have gained widespread adoption among researchers in academia and industry because they support reproducible science. These notebooks allow users to combine code, text, and visualizations for easy sharing of experiments and results. They are widely shared on GitHub, which currently hosts more than 100 million repositories, making it the largest host of source code in the world. Recent reproducibility studies have indicated that there are good and bad practices in writing these notebooks that can affect their overall reproducibility. We present ReproduceMeGit, a visualization tool for analyzing the reproducibility of Jupyter Notebooks. The tool helps repository users and owners reproduce, directly analyze, and assess the reproducibility of any GitHub repository containing Jupyter Notebooks. It provides information on the number of notebooks that were successfully reproduced, those that resulted in exceptions, those whose results differ from the original notebooks, and so on. Through its integration with the ProvBook tool, each notebook in the repository, along with the provenance information of its execution, can also be exported in RDF.
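The abstract does not describe how ReproduceMeGit decides which category a notebook falls into. The following is only a minimal sketch of the general idea, not the tool's implementation. It assumes that the original notebook and a re-executed copy are both available as nbformat v4 JSON (the copy could be produced, for instance, with `jupyter nbconvert --to notebook --execute`). The functions `load_notebook`, `classify`, and `summarize`, and the labels `same`, `different`, and `exception`, are hypothetical names chosen for illustration. Only textual outputs are compared; images and other rich outputs are ignored.

```python
import json
from collections import Counter


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON, into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _comparable_outputs(nb: dict) -> list:
    """Collect the textual outputs of every code cell.

    Keeps stream text and the text/plain form of execute_result and
    display_data outputs; execution counts and metadata are ignored.
    """
    result = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        texts = []
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":
                text = out.get("text", "")
            elif kind in ("execute_result", "display_data"):
                text = out.get("data", {}).get("text/plain", "")
            else:
                continue  # e.g. error outputs are handled separately
            # nbformat may store multiline strings as a list of lines
            if isinstance(text, list):
                text = "".join(text)
            texts.append(text)
        result.append(texts)
    return result


def classify(original: dict, reexecuted: dict) -> str:
    """Hypothetical labels: 'exception', 'same', or 'different'."""
    for cell in reexecuted.get("cells", []):
        for out in cell.get("outputs", []):
            if out.get("output_type") == "error":
                return "exception"
    if _comparable_outputs(original) == _comparable_outputs(reexecuted):
        return "same"
    return "different"


def summarize(pairs) -> Counter:
    """Tally outcomes over (original, re-executed) notebook pairs."""
    return Counter(classify(orig, rerun) for orig, rerun in pairs)
```

Counting the labels over all notebooks in a repository yields the kind of per-repository summary the abstract mentions. A real tool would also need to handle missing dependencies, timeouts, and non-deterministic outputs such as timestamps or random seeds, none of which this sketch attempts.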