Paper title
Looking for related discussions on GitHub Discussions
Paper authors
Abstract
Software teams increasingly adopt a variety of tools and communication channels to support collaborative development and coordinate tasks. Among these resources, Programming Community-based Question Answering (PCQA) forums have become widely used by developers, who rely on them to obtain and share technical information. To support the development and management of Open Source Software (OSS) projects, GitHub announced GitHub Discussions, a native forum that facilitates collaborative discussions between users and members of communities hosted on the platform. Because GitHub Discussions resembles PCQA forums, it faces similar challenges, including the occurrence of related discussions (duplicate or near-duplicate posts). Duplicate posts have the same content and may be exact copies, whereas near-duplicates share similar topics and information. Both can introduce noise to the platform and compromise project knowledge sharing. In this paper, we address the problem of detecting related posts in GitHub Discussions. To do so, we propose RD-Detector, an approach based on a pre-trained Sentence-BERT model. We evaluated RD-Detector using data from different OSS communities. OSS maintainers and Software Engineering (SE) researchers manually evaluated the RD-Detector results, which achieved 75% to 100% precision. In addition, maintainers pointed out practical applications of the approach, such as merging discussion threads and converting discussions into comments on one another. OSS maintainers can benefit from RD-Detector to reduce the labor-intensive work of manually detecting related discussions and answering the same question multiple times.
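The abstract does not spell out how RD-Detector works internally, so the following is only a minimal sketch of the general idea it names: embed each discussion as a vector, score every pair by cosine similarity, and flag pairs above a cutoff as related. The function names, the 0.8 threshold, and the bag-of-words `toy_encode` stand-in are all assumptions made for illustration and are not taken from the paper. In practice, one would presumably pass a Sentence-BERT encoder (for example, the `encode` method of a `SentenceTransformer` object from the sentence-transformers package) in place of the toy encoder.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def toy_encode(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in encoder: word counts over the batch vocabulary.

    It only keeps the sketch dependency-free; it is NOT Sentence-BERT.
    """
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok in toks:
            vec[index[tok]] += 1.0
        vectors.append(vec)
    return vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_related(
    posts: List[str],
    encode: Encoder = toy_encode,
    threshold: float = 0.8,  # assumed cutoff, not a value from the paper
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for post pairs scoring at or above the threshold,
    most similar first."""
    vectors = encode(posts)
    pairs = []
    for i, j in combinations(range(len(posts)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def precision(confirmed_related: int, flagged: int) -> float:
    """Share of flagged pairs that human evaluators confirmed as related."""
    return confirmed_related / flagged if flagged else 0.0
```

Calling `find_related` on a list of discussion texts yields candidate pairs that a maintainer could then review. The `precision` helper mirrors the manual evaluation described above: if evaluators confirm 3 of 4 flagged pairs, the precision is 0.75.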