Paper Title

Topical: Learning Repository Embeddings from Source Code using Attention

Authors

Lherondelle, Agathe; Babbar, Varun; Satsangi, Yash; Silavong, Fran; Eloul, Shaltiel; Moran, Sean

Abstract

This paper presents Topical, a novel deep neural network for repository-level embeddings. Existing methods rely on natural language documentation or naive aggregation techniques; Topical instead uses an attention mechanism and outperforms them. The mechanism generates repository-level representations from source code, full dependency graphs, and script-level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines on tasks such as repository auto-tagging, highlighting the efficacy of attention over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. To support further research, the accompanying tools, code, and training dataset are provided at: https://github.com/jpmorganchase/topical.
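
As a rough illustration of the contrast the abstract draws between naive aggregation and attention, the sketch below pools per-script embedding vectors into a single repository vector in two ways: a plain mean, and a softmax-weighted sum scored against a query vector. This is not Topical's actual architecture. The function names, the single fixed query vector (which would be learned in a trained model), and the toy numbers are all assumptions made for illustration.

```python
import math


def mean_pool(script_embeddings):
    """Naive aggregation: average all script vectors with equal weight."""
    n = len(script_embeddings)
    dim = len(script_embeddings[0])
    return [sum(vec[i] for vec in script_embeddings) / n for i in range(dim)]


def attention_pool(script_embeddings, query):
    """Attention aggregation: weight each script vector by
    softmax(query . vec / sqrt(dim)) and return the weighted sum."""
    dim = len(query)
    scores = [
        sum(q * v for q, v in zip(query, vec)) / math.sqrt(dim)
        for vec in script_embeddings
    ]
    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, script_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


# Hypothetical toy data: three script embeddings of dimension 2.
scripts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical query vector; a real model would learn this.
query = [2.0, 0.0]

repo_mean = mean_pool(scripts)
repo_attn, weights = attention_pool(scripts, query)
```

With this query, the first and third scripts receive equal weight and the second receives less, so the attention-pooled vector leans toward the scripts that align with the query instead of treating all scripts equally as the mean does.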
