Paper Title

Eliciting Best Practices for Collaboration with Computational Notebooks

Authors

Luigi Quaranta, Fabio Calefato, Filippo Lanubile

Abstract

Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with professional data scientists to assess their awareness of these best practices. Finally, we assess the adoption of best practices through the analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform. Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work. Nonetheless, they do not consistently follow all the recommendations as, depending on specific contexts, some are deemed unfeasible or counterproductive due to the lack of proper tool support. As such, we envision the design of notebook solutions that allow data scientists not to have to prioritize exploration and rapid prototyping over writing code of quality.
