Paper Title

What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models

Paper Authors

Wietse de Vries, Andreas van Cranenburgh, Malvina Nissim

Paper Abstract

Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers. To investigate to what extent these results also hold for a language other than English, we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, through a deeper analysis of part-of-speech tagging, we show that also within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems. Each layer has different specialisations, so that it may be more useful to combine information from different layers, instead of selecting a single one based on the best overall performance.
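
The layer-wise probing the abstract describes can be illustrated with a short sketch: extract the hidden states of every layer and fit a separate linear probe on each, then compare accuracies across layers. This is a minimal illustration assuming the Hugging Face transformers library and scikit-learn; the BERTje checkpoint name GroNLP/bert-base-dutch-cased, the example sentence, and the placeholder labels are assumptions for demonstration, not the paper's own experimental setup.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# BERTje, a Dutch BERT model; swapping in "bert-base-multilingual-cased"
# would probe multilingual BERT instead.
MODEL_NAME = "GroNLP/bert-base-dutch-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

inputs = tokenizer("De kat zit op de mat.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states
# hidden_states is a tuple: the embedding layer plus one tensor per
# transformer layer, each of shape (batch, seq_len, hidden_size).

n_tokens = inputs["input_ids"].shape[1] - 2    # drop [CLS] and [SEP]
# Placeholder per-token labels just to exercise the probe; a real
# experiment would use gold POS tags over a full corpus.
toy_labels = [i % 3 for i in range(n_tokens)]

# Fit one linear probe per layer; comparing scores across layers shows
# where in the network the task information is most accessible.
for layer, states in enumerate(hidden_states):
    feats = states[0, 1:-1].numpy()
    probe = LogisticRegression(max_iter=1000).fit(feats, toy_labels)
    print(f"layer {layer:2d}: train accuracy {probe.score(feats, toy_labels):.2f}")
```

In this framing, the paper's suggestion to combine information from different layers corresponds to feeding a weighted mixture of all entries in hidden_states to the probe, rather than selecting the single layer with the best overall score.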
