Paper Title
Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning
Paper Authors
Paper Abstract
Commonsense reasoning is an appealing topic in natural language processing (NLP) as it plays a fundamental role in supporting the human-like actions of NLP systems. With large-scale language models as the backbone, unsupervised pre-training on numerous corpora shows the potential to capture commonsense knowledge. Current pre-trained language model (PLM)-based reasoning follows the traditional practice of using the perplexity metric. However, commonsense reasoning goes beyond existing probability-based evaluation, which is biased by word frequency. This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC). In detail, it operates on PLMs pre-trained with the Replaced Token Detection (RTD) objective from ELECTRA, in which the corruption detection objective reflects confidence in contextual integrity, a signal more relevant to commonsense reasoning than existing probability metrics. Our proposed method boosts zero-shot performance on two commonsense reasoning benchmark datasets and on seven additional commonsense question-answering datasets. Our analysis shows that pre-endowed commonsense knowledge, especially for RTD-based PLMs, is essential for downstream reasoning.
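To make the idea concrete, the sketch below shows one way such a confidence score could be computed with an off-the-shelf RTD discriminator (ELECTRA via Hugging Face Transformers). It is not the authors' released implementation; it assumes, as a simplification, that the score aggregates the per-token probability of each token being "original" (not replaced), and the `nrc_score` helper and example sentences are hypothetical.

```python
# Minimal sketch: scoring candidate sentences with an RTD discriminator.
# Assumption: the confidence score is the sum of per-token log-probabilities
# that each token is "original" under ELECTRA's replaced-token-detection head.
import torch
from transformers import ElectraTokenizer, ElectraForPreTraining

tokenizer = ElectraTokenizer.from_pretrained("google/electra-large-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-large-discriminator")
model.eval()

def nrc_score(sentence: str) -> float:
    """Sum of log P(token is original) given by the RTD discriminator."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Logits are per-token; positive values indicate "replaced" tokens.
        logits = model(**inputs).logits.squeeze(0)
    # P(original) = 1 - sigmoid(logit) = sigmoid(-logit)
    log_p_original = torch.nn.functional.logsigmoid(-logits)
    return log_p_original.sum().item()

# Zero-shot multiple choice: pick the candidate the discriminator
# judges most contextually intact (hypothetical example sentences).
candidates = [
    "He put the ice cream in the freezer.",
    "He put the ice cream in the oven.",
]
print(max(candidates, key=nrc_score))
```

In contrast to perplexity from a generative language model, this kind of discriminator-based confidence does not directly reward high-frequency words, which is the frequency bias the abstract refers to.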