Paper title
GPT-Neo for commonsense reasoning -- a theoretical and practical lens
Paper authors
Paper abstract
Recent work has demonstrated substantial gains from pre-training large language models (LLMs) followed by supervised fine-tuning on downstream tasks. In this paper, we evaluate the performance of the GPT-Neo model on six commonsense reasoning benchmark tasks. We aim to examine how smaller models, represented by GPT-Neo, perform against several larger baselines such as GPT-3, Llama-2, MPT, and Falcon. When fine-tuned with an appropriate set of hyperparameters, our model achieves competitive accuracy on several tasks. We also investigate and substantiate our results using attention-head visualization to better understand model behavior. Finally, we conduct robustness tests with several methods to gauge model performance under a range of settings.
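To make the experimental setup described in the abstract concrete, the following is a minimal sketch (not the authors' code) of fine-tuning a GPT-Neo model on one commonsense reasoning benchmark using the Hugging Face `transformers` and `datasets` libraries. The benchmark (PIQA), the model size (125M), and the hyperparameter values are illustrative assumptions, not those reported in the paper.

```python
# Minimal sketch: fine-tune GPT-Neo for a two-choice commonsense reasoning task,
# framed as binary classification over (goal, option A, option B) prompts.
# Dataset, model size, and hyperparameters below are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-125M"      # smaller GPT-Neo variant (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # GPT-Neo has no dedicated pad token

dataset = load_dataset("piqa")              # two-choice physical commonsense benchmark

def preprocess(batch):
    # Present both options in one prompt; the label (0 or 1) selects the correct one.
    texts = [f"Goal: {g} Option A: {s1} Option B: {s2}"
             for g, s1, s2 in zip(batch["goal"], batch["sol1"], batch["sol2"])]
    enc = tokenizer(texts, truncation=True, padding="max_length", max_length=128)
    enc["labels"] = batch["label"]
    return enc

encoded = dataset.map(preprocess, batched=True,
                      remove_columns=dataset["train"].column_names)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

args = TrainingArguments(
    output_dir="gpt-neo-piqa",
    per_device_train_batch_size=8,
    learning_rate=2e-5,        # illustrative hyperparameters only
    num_train_epochs=3,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
```

The same pattern extends to the other benchmarks and to larger GPT-Neo checkpoints by swapping the dataset name and `model_name`; the paper's actual hyperparameter choices are discussed in the main text.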