Paper Title

Less is More: Summary of Long Instructions is Better for Program Synthesis

Paper Authors

Kirby Kuznia, Swaroop Mishra, Mihir Parmar, Chitta Baral

Paper Abstract

Despite the success of large pre-trained language models (LMs) such as Codex, they show below-par performance on larger and more complicated programming-related questions. We show that LMs benefit from summarized versions of complicated questions. Our findings show that superfluous information often present in problem descriptions, such as human characters, background stories, and names (included to help humans understand a task), does not help models understand the task. To this end, we create a meta-dataset from the frequently used APPS dataset and the newly created CodeContests dataset for the program synthesis task. Our meta-dataset consists of human and synthesized summaries of the long and complicated programming questions. Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on the APPS dataset and by 11.88% on the CodeContests dataset on average in terms of strict accuracy. Our analysis shows that summaries significantly improve performance for introductory (9.86%) and interview (11.48%) programming questions. However, the improvement is small (~2%) for competitive programming questions, implying scope for future research in this direction.
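The core idea, replacing a long, story-heavy problem statement with a task-focused summary before prompting a code model, can be illustrated with a minimal sketch. The paper uses human-written and model-generated summaries; the keyword heuristic below (the `summarize_problem` function and `TASK_KEYWORDS` list are illustrative assumptions, not the authors' method) only mimics the effect of stripping narrative framing.

```python
import re

# Keywords that typically mark task-relevant sentences in a programming
# problem (input/output format, data types, constraints) rather than
# story text. This list is an illustrative assumption.
TASK_KEYWORDS = ("input", "output", "return", "print", "integer",
                 "string", "array", "constraint", "test case")

def summarize_problem(description: str) -> str:
    """Keep only sentences that look task-relevant, dropping narrative
    framing such as characters, back-stories, and names."""
    sentences = re.split(r"(?<=[.!?])\s+", description)
    kept = [s for s in sentences
            if any(k in s.lower() for k in TASK_KEYWORDS)]
    # Fall back to the full text if the heuristic removes everything.
    return " ".join(kept) if kept else description

long_problem = (
    "Alice loves collecting stamps. One day her friend Bob gave her a "
    "puzzle. Given an integer n on the first line of input, print the "
    "sum of all integers from 1 to n."
)

# The shorter prompt would be sent to the code model instead of the
# full, story-heavy description.
print(summarize_problem(long_problem))
```

In the paper's pipeline, the summarized prompt is then passed to the synthesis model (e.g., Codex) in place of the full description, which is where the reported strict-accuracy gains come from.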
