Paper Title
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT
Paper Authors
Paper Abstract
By introducing a small set of additional parameters, a probe learns to solve specific linguistic tasks (e.g., dependency parsing) in a supervised manner using feature representations (e.g., contextualized embeddings). The effectiveness of such probing tasks is taken as evidence that the pre-trained model encodes linguistic knowledge. However, this approach to evaluating a language model is undermined by uncertainty about how much of that knowledge is learned by the probe itself. Complementary to those works, we propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT). Our method requires neither direct supervision from the probing tasks nor additional parameters in the probing process. Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines. We further feed the empirically induced dependency structures into a downstream sentiment classification task and find the resulting improvement comparable to, or even better than, that of a human-designed dependency schema.
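To make the parameter-free idea concrete, below is a minimal sketch of the two-stage perturbed masking the paper builds on: the impact of token x_j on token x_i is estimated as the distance between BERT's representation of position i when x_i alone is masked and when both x_i and x_j are masked. The checkpoint name (bert-base-uncased), the Euclidean distance, and the `impact_matrix` helper are illustrative assumptions, not the paper's exact setup; in particular, subword-span handling is omitted here.

```python
# Minimal sketch of two-stage perturbed masking, assuming the
# Hugging Face `transformers` library and the bert-base-uncased checkpoint.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def impact_matrix(sentence: str) -> torch.Tensor:
    """Entry (i, j) estimates how strongly token j influences token i:
    the distance between BERT's representation of position i with x_i
    masked, and with both x_i and x_j masked."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    n = ids.size(0)
    mask_id = tokenizer.mask_token_id
    matrix = torch.zeros(n, n)
    with torch.no_grad():
        for i in range(1, n - 1):              # skip [CLS] and [SEP]
            one = ids.clone()
            one[i] = mask_id                   # stage 1: mask x_i
            h_i = model(one.unsqueeze(0)).last_hidden_state[0, i]
            for j in range(1, n - 1):
                if j == i:
                    continue
                two = one.clone()
                two[j] = mask_id               # stage 2: also mask x_j
                h_ij = model(two.unsqueeze(0)).last_hidden_state[0, i]
                matrix[i, j] = torch.dist(h_i, h_ij)  # Euclidean impact
    return matrix
```

Such an impact matrix is the input from which the paper recovers syntactic structure, using standard tree-decoding algorithms (e.g., Eisner's algorithm for projective dependency trees); no probe parameters are trained at any point.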