Paper Title
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Paper Authors
Paper Abstract
Classifiers in natural language processing (NLP) often have a large number of output classes. For example, neural language models (LMs) and machine translation (MT) models both predict tokens from a vocabulary of thousands. The Softmax output layer of these models typically receives as input a dense feature representation, which has much lower dimensionality than the output. In theory, the result is that some words may be impossible to predict via argmax, irrespective of the input features, and empirically there is evidence that this happens in small language models. In this paper we ask whether it can happen in practical large language models and translation models. To do so, we develop algorithms to detect such "unargmaxable" tokens in public models. We find that 13 out of 150 models do indeed have such tokens; however, they are very infrequent and unlikely to impact model quality. We release our code so that others can inspect their models.
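To make the idea concrete, the sketch below shows one way such a detection check could work, using a linear-program feasibility test: a class c can win the argmax only if some input feature vector x makes its logit strictly larger than every other class's logit. This is a minimal illustration under stated assumptions, not necessarily the paper's exact algorithm; the helper name `is_argmaxable`, the `box` bound on the search region, and the toy weights in the demo are all illustrative assumptions, not artifacts from the paper.

```python
# Minimal sketch: LP feasibility test for whether a softmax class is argmaxable.
# Assumes a softmax layer with weight matrix W (V x d, one row per class)
# and bias vector b (V,). The search is restricted to a bounded box on x,
# so this is an approximate check, not an exhaustive one.
import numpy as np
from scipy.optimize import linprog

def is_argmaxable(W: np.ndarray, b: np.ndarray, c: int, box: float = 100.0) -> bool:
    """Return True if some x with |x_i| <= box makes class c the strict argmax.

    We maximize a margin t subject to
        (w_c - w_j) . x + (b_c - b_j) >= t   for all j != c,
    and declare c argmaxable iff the optimal t is strictly positive.
    """
    V, d = W.shape
    others = [j for j in range(V) if j != c]
    # Variables z = [x_1 .. x_d, t]; constraints in A_ub @ z <= b_ub form:
    #   t - (w_c - w_j) . x <= b_c - b_j
    A_ub = np.hstack([-(W[c] - W[others]), np.ones((V - 1, 1))])
    b_ub = b[c] - b[others]
    cost = np.zeros(d + 1)
    cost[-1] = -1.0  # linprog minimizes, so minimize -t to maximize t
    bounds = [(-box, box)] * d + [(None, 1.0)]  # cap t so the LP stays bounded
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.status == 0 and res.x[-1] > 1e-9

# Toy demo (d=1 < V=3): the middle class's logit is always sandwiched
# between the other two, so it can never be the strict argmax.
W = np.array([[-1.0], [0.0], [1.0]])
b = np.zeros(3)
print([is_argmaxable(W, b, c) for c in range(3)])  # expected: [True, False, True]
```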