Error Identification Strategies for Python Jupyter Notebooks

Paper Title

Error Identification Strategies for Python Jupyter Notebooks

Authors

Derek Robinson, Neil A. Ernst, Enrique Larios Vargas, Margaret-Anne D. Storey

Abstract

Computational notebooks -- such as Jupyter or Colab -- combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors in these different forms might be different. In this paper, we present an exploratory, observational study on how Python Jupyter notebook users find and understand potential errors in notebooks. Through a conceptual replication of a study design investigating the error identification strategies of R notebook users, we presented users with Python Jupyter notebooks pre-populated with common notebook errors -- errors rooted in either the statistical data analysis, the knowledge of domain concepts, or the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes that help data scientists discover errors more easily in the notebooks they write.
