Paper Title

An Empirical Study on How the Developers Discussed about Pandas Topics

Authors

Joy, Sajib Kumar Saha; Ahmed, Farzad; Mahamud, Al Hasib; Mandal, Nibir Chandra

Abstract

Pandas is a software library for data analysis in the Python programming language. Because pandas is a fast, easy-to-use, open-source data analysis tool, it has been rapidly adopted in many kinds of software engineering projects, including software development, machine learning, computer vision, natural language processing, and robotics. As a result, software developers have shown great interest in pandas, and discussions about it have become prominent in online developer forums such as Stack Overflow (SO). Such discussions can help us understand the popularity of the pandas library as well as the importance, prevalence, and difficulty of pandas topics. The main aim of this paper is to determine the popularity and difficulty of pandas topics. To this end, we collected SO posts related to pandas and applied topic modeling to their textual content. We found 26 topics, which we further grouped into 5 broad categories. We observed that developers discuss a variety of pandas topics on SO related to error and exception handling, visualization, external support, dataframes, and optimization. In addition, we generated a trend chart of topic discussions over a predefined time series. The findings of this paper can help developers, educators, and learners. For example, beginner developers can learn the most important pandas topics, which are essential for developing any model. Educators can identify the topics that learners find hard and build tutorials that make those pandas topics easier to understand. Overall, this empirical study shows that developers' preferences among pandas topics can be understood by analyzing their SO posts.
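The abstract says only that a trend chart was generated from topic discussions over a predefined time series; it does not say how the trend was measured. One simple way to build such a trend, assumed here rather than taken from the paper, is to compute for each time period the share of posts whose dominant topic is each topic. The Python sketch below assumes every post has already been assigned a single dominant topic label by the topic model and tagged with a year; the function name `topic_share_by_period`, the (year, topic) input layout, and the toy data are hypothetical and are not the authors' pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def topic_share_by_period(posts: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, float]]:
    """Hypothetical helper: fraction of posts per dominant topic in each period.

    Assumption: `posts` yields (period, topic) pairs, where the topic is the
    single dominant label a topic model assigned to that post.
    """
    # Count posts per topic within each period.
    counts: Dict[int, Counter] = defaultdict(Counter)
    for period, topic in posts:
        counts[period][topic] += 1

    # Normalize counts to shares so each period sums to 1.
    shares: Dict[int, Dict[str, float]] = {}
    for period in sorted(counts):  # chronological order for plotting
        total = sum(counts[period].values())
        shares[period] = {
            topic: n / total for topic, n in sorted(counts[period].items())
        }
    return shares


if __name__ == "__main__":
    # Hypothetical toy data: (year, dominant topic) for five posts.
    toy_posts = [
        (2019, "visualization"),
        (2019, "dataframe"),
        (2019, "dataframe"),
        (2020, "optimization"),
        (2020, "dataframe"),
    ]
    for year, share in topic_share_by_period(toy_posts).items():
        print(year, share)
```

Because each period's shares sum to 1, the result can be plotted directly as a line or stacked area chart of relative topic popularity; raw per-period counts could be used instead if absolute discussion volume matters more than relative share.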