Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports

Paper Title

Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports

Authors

Nadeem, Anas; Sarwar, Muhammad Usman; Malik, Muhammad Zubair

Abstract

Issue tracking systems are used in the software industry to facilitate maintenance activities that keep software robust and up to date with ever-changing industry requirements. Users typically report issues that fall under different labels, such as bug reports, enhancement requests, and questions about the software. Most issue tracking systems make labelling optional for the issue submitter, which leads to a large number of unlabeled issue reports. In this paper, we present a state-of-the-art method to classify issue reports into their respective categories, i.e., bug, enhancement, and question. This is a challenging task because informal language is common in issue reports. Existing studies use traditional natural language processing approaches with keyword-based features, which fail to incorporate the contextual relationships between words and therefore produce a high rate of false positives and false negatives. Moreover, previous works use a uni-label approach to classify issue reports; in reality, however, an issue submitter can tag one issue report with more than one label at a time. This paper presents our approach to classifying issue reports in a multi-label setting. We take an off-the-shelf neural network called RoBERTa and fine-tune it to classify issue reports. We validate our approach on issue reports from numerous industrial projects on GitHub. We achieve promising F1 scores of 81%, 74%, and 80% for bug reports, enhancements, and questions, respectively. We also develop an industry tool called Automatic Issue Classifier (AIC), which automatically assigns labels to newly reported issues on GitHub repositories with high accuracy.
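
The abstract reports one F1 score per label in a multi-label setting, where a single issue may carry several labels at once. The following is a minimal sketch of that evaluation scheme, not the authors' code: the fine-tuned RoBERTa model is replaced by hand-written scores, and the label order, the 0.5 threshold, the helper names, and the toy data are all assumptions made for illustration.

```python
import math

# Assumed label order; the abstract names these three categories.
LABELS = ("bug", "enhancement", "question")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Map one raw score per label to a set of labels.

    Each label is decided independently, so zero, one, or several
    labels may be assigned to the same issue (multi-label).
    """
    return {label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold}


def per_label_f1(true_sets, pred_sets):
    """Compute F1 separately for each label over a list of issues."""
    scores = {}
    for label in LABELS:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical model outputs (one score per label) and gold labels.
toy_logits = [
    (2.0, -1.0, -3.0),
    (1.5, 0.7, -2.0),
    (-2.0, -0.5, 1.2),
    (0.3, -1.0, 0.9),
]
toy_truth = [{"bug"}, {"bug", "enhancement"}, {"question"}, {"question"}]

toy_pred = [predict_labels(z) for z in toy_logits]
toy_scores = per_label_f1(toy_truth, toy_pred)
print(toy_scores)
```

In this toy data the fourth issue is wrongly tagged as a bug in addition to question, which lowers the bug F1 to 0.8 while the other two labels stay at 1.0. Scoring each label independently is what lets such a partial error show up only in the affected category.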
