Paper Title

A Primer in BERTology: What we know about how BERT works

Paper Authors

Anna Rogers, Olga Kovaleva, Anna Rumshisky

Paper Abstract

Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. We then outline directions for future research.
