Paper Title
Competition-Level Code Generation with AlphaCode
Authors
Abstract
Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.
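The third component, large-scale sampling followed by filtering on program behavior, can be pictured as follows. This is a minimal illustrative sketch, not AlphaCode's actual implementation: the candidates here are hypothetical stand-ins for model-sampled programs, and real filtering would execute untrusted code in a sandbox against the problem's example tests.

```python
# Sketch of sample-then-filter: keep only candidate programs whose
# behavior matches the example tests, capped at a submission budget.

def filter_candidates(candidates, example_tests, max_submissions=10):
    """Return up to max_submissions candidates passing all example tests."""
    passing = []
    for program in candidates:
        if all(program(inp) == expected for inp, expected in example_tests):
            passing.append(program)
        if len(passing) == max_submissions:
            break
    return passing

# Hypothetical sampled programs for a toy problem ("double the input"):
candidates = [
    lambda x: x * 2,   # matches the examples
    lambda x: x + 2,   # wrong in general
    lambda x: x ** 2,  # wrong in general
]
example_tests = [(3, 6), (5, 10)]
survivors = filter_candidates(candidates, example_tests)
```

In the real setting the "candidates" number in the millions and filtering by example-test behavior removes the vast majority, after which a small, behaviorally diverse set is chosen for submission.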