Paper Title

DeepRelease: Language-agnostic Release Notes Generation from Pull Requests of Open-source Software

Authors

Jiang, Huaxi; Zhu, Jie; Yang, Li; Liang, Geng; Zuo, Chun

Abstract

The release note is an essential artifact of open-source software that documents crucial information about changes, such as new features and bug fixes. With the help of release notes, both developers and users can gain a general understanding of the latest version without browsing the source code. However, producing release notes is a daunting and time-consuming job for developers. Although prior studies have provided some automatic approaches, they generate release notes mainly by extracting information from code changes. As a result, these approaches are language-specific and not general enough to be widely applicable. Helping developers produce release notes effectively therefore remains an unsolved challenge. To address the problem, we first conduct a manual study on the release notes of 900 GitHub projects, which reveals that more than 54% of projects produce their release notes with pull requests. Based on this empirical finding, we propose a deep learning based approach named DeepRelease (Deep learning based Release notes generator) to generate release notes from pull requests. The release note generation process in DeepRelease consists of change entry generation and change category (i.e., new features or bug fixes) generation, which are formulated as a text summarization task and a multi-class classification problem, respectively. Since DeepRelease relies entirely on text information from pull requests to summarize changes and identify the change category, it is language-agnostic and can be used for projects in any programming language. We build a dataset with over 46K release notes and evaluate DeepRelease on it. The experimental results indicate that DeepRelease outperforms four baselines and can generate release notes similar to manually written ones in a fraction of the time.
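The two-step structure described in the abstract (one change entry per pull request, plus a change category for each entry) can be illustrated with a minimal Python sketch. This is not the DeepRelease model: the summarization step is replaced by simply reusing the pull request title, and the classifier is a keyword lookup. The `PullRequest` type, the `CATEGORY_KEYWORDS` table, the "Other" fallback, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical container for the text fields of a pull request."""
    title: str
    body: str


# Assumed keyword table standing in for a learned multi-class classifier.
CATEGORY_KEYWORDS = {
    "Bug Fixes": {"fix", "bug", "crash"},
    "New Features": {"add", "support", "introduce"},
}


def summarize(pr: PullRequest) -> str:
    """Placeholder for the text summarization step: reuse the PR title."""
    return pr.title.strip().rstrip(".")


def classify(pr: PullRequest) -> str:
    """Placeholder for the classification step: first matching keyword set."""
    words = set(re.findall(r"[a-z]+", f"{pr.title} {pr.body}".lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "Other"


def generate_release_notes(prs: list[PullRequest]) -> str:
    """Group one change entry per pull request under its category."""
    grouped = defaultdict(list)
    for pr in prs:
        grouped[classify(pr)].append(summarize(pr))
    lines = []
    for category in sorted(grouped):
        lines.append(category)
        lines.extend(f"- {entry}" for entry in grouped[category])
    return "\n".join(lines)
```

Because the sketch reads only the title and body text of each pull request, it never inspects source code, which is the property the abstract calls language-agnostic. In a real system, the two placeholder functions would be replaced by trained models.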
