Paper Title
Assessment of Off-the-Shelf SE-specific Sentiment Analysis Tools: An Extended Replication Study
Authors
Abstract
Sentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Because general-purpose sentiment analysis tools fit poorly with the information exchanged by software developers, new tools specific to software engineering (SE) have been developed. We investigate to what extent SE-specific sentiment analysis tools mitigate the threats to conclusion validity of empirical software engineering studies highlighted by previous research. First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grained level when used 'off-the-shelf'. Instead, platform-specific tuning or retraining might be needed to account for differences in platform conventions, jargon, or document length.
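To make the agreement assessment concrete, the sketch below shows one way that agreement between a tool and a manually annotated gold standard, or between two tools, could be quantified. It assumes Cohen's kappa as the metric, which is a common choice but is not named in the abstract. The function name `cohen_kappa` and all labels are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of documents with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical polarity labels for six documents (not from the study).
gold = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
tool_a = ["positive", "negative", "neutral", "positive", "positive", "neutral"]
tool_b = ["neutral", "negative", "neutral", "neutral", "positive", "negative"]

print("tool_a vs gold:", cohen_kappa(tool_a, gold))
print("tool_b vs gold:", cohen_kappa(tool_b, gold))
print("tool_a vs tool_b:", cohen_kappa(tool_a, tool_b))
```

Unlike raw percentage agreement, kappa corrects for the agreement two raters would reach by chance, which matters when one polarity class (often neutral) dominates the documents.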