Novice Type Error Diagnosis with Natural Language Models

Title

Novice Type Error Diagnosis with Natural Language Models

Authors

Chuqin Geng, Haolin Ye, Yixuan Li, Tianyu Han, Brigitte Pientka, Xujie Si

Abstract

Strong static type systems help programmers eliminate many errors without much of the burden of supplying type annotations. However, this flexibility makes it highly non-trivial to diagnose ill-typed programs, especially for novice programmers. Compared to classic constraint-solving and optimization-based approaches, the data-driven approach has shown great promise in identifying the root causes of type errors with higher accuracy. Instead of relying on hand-engineered features, this work explores natural language models for type error localization, which can be trained in an end-to-end fashion without requiring any features. We demonstrate that, for novice type error diagnosis, the language model-based approach significantly outperforms the previous state-of-the-art data-driven approach. Specifically, our model predicts type errors correctly 62% of the time, outperforming Nate, the state-of-the-art data-driven model, by 11% under a more rigorous accuracy metric. Furthermore, we apply structural probes to explain the performance differences between language models.
