Paper title
Active Learning of Discriminative Subgraph Patterns for API Misuse Detection
Paper authors
Paper abstract
A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and although techniques have been proposed to detect them, studies have shown that these techniques fail to detect misuses reliably while reporting many false positives. One limitation of prior work is its inability to reliably identify correct usage patterns. Many approaches mistake a usage pattern's frequency for its correctness. Because many alternative usage patterns are uncommon but correct, anomaly detection-based techniques have had limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), which reformulates API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While still incorporating frequency information, ALP uses limited human supervision to reduce reliance on the assumption that frequency implies correctness. The principles of active learning are incorporated to shift human attention away from the most frequent patterns. Instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API misuse benchmark, and a new dataset that we constructed from real-world software projects.