Paper Title

A Systematic Evaluation of Large Language Models of Code

Paper Authors

Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

Paper Abstract

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models do achieve close results in some programming languages, although targeted mainly for natural language modeling. We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models including Codex. Our trained models are open-source and publicly available at https://github.com/VHellendoorn/Code-LMs, which enables future research and application in this area.
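
Since the released PolyCoder checkpoints follow the GPT-2 (decoder-only) architecture, they can be used like any causal language model for left-to-right code completion. The snippet below is a minimal sketch, not code from the paper, using the Hugging Face transformers library; the checkpoint path is a placeholder, and the exact loading procedure depends on how the weights in the linked Code-LMs repository are packaged.

# Minimal sketch (assumption: a PolyCoder checkpoint converted to the
# Hugging Face format is available locally). The path below is a placeholder;
# the actual weights are distributed via the repository linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/polycoder-2.7b"  # placeholder, not an official model ID

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Complete a C function body, the setting where PolyCoder performs best.
prompt = "int binary_search(int *arr, int n, int target) {\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,                     # length of the generated completion
    do_sample=True,                        # sample instead of greedy decoding
    temperature=0.2,                       # low temperature favors likely code
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))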
