Paper Title

Contextual Temperature for Language Modeling

Paper Authors

Pei-Hsin Wang, Sheng-Iou Hsieh, Shih-Chieh Chang, Yu-Ting Chen, Jia-Yu Pan, Wei Wei, Da-Cheng Juan

Abstract

Temperature scaling has been widely used as an effective approach to control the smoothness of a distribution, which helps model performance across various tasks. Current practices for applying temperature scaling assume either a fixed or a manually crafted, dynamically changing schedule. However, our studies indicate that the individual optimal trajectory for each class can change with the context. To this end, we propose contextual temperature, a generalized approach that learns an optimal temperature trajectory for each vocabulary token over the context. Experimental results confirm that the proposed method significantly improves state-of-the-art language models, achieving perplexities of 55.31 and 62.89 on the test sets of Penn Treebank and WikiText-2, respectively. In-depth analyses show that the behaviour of the learned temperature schedules varies dramatically by vocabulary, and that the optimal schedules help in controlling the uncertainties. This evidence further justifies the need for the proposed method and its advantages over fixed temperature schedules.
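To make the core idea concrete, below is a minimal PyTorch sketch of temperature scaling with a context-dependent, per-vocabulary temperature: alongside the usual output projection, a second projection from the hidden state predicts one temperature per vocabulary entry, which divides the logits before the softmax. The class name ContextualTemperatureHead, the softplus parameterization, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualTemperatureHead(nn.Module):
    """Illustrative sketch (not the paper's exact formulation): predicts a
    per-vocabulary temperature from the current hidden state and uses it to
    scale the logits before the softmax."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.logit_proj = nn.Linear(hidden_dim, vocab_size)  # standard LM output layer
        self.temp_proj = nn.Linear(hidden_dim, vocab_size)   # one temperature per vocabulary entry

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        logits = self.logit_proj(hidden)
        # Softplus keeps temperatures positive; a small floor avoids dividing
        # by values near zero. This parameterization is an assumption.
        temperature = F.softplus(self.temp_proj(hidden)) + 1e-3
        # A fixed schedule would use a single scalar here; contextual
        # temperature lets each token's logit get its own context-dependent scale.
        return F.log_softmax(logits / temperature, dim=-1)

# Usage: hidden states from any LM backbone (e.g., an LSTM encoder).
head = ContextualTemperatureHead(hidden_dim=400, vocab_size=10000)
hidden = torch.randn(8, 35, 400)   # (batch, seq_len, hidden_dim)
log_probs = head(hidden)           # (batch, seq_len, vocab_size)
```

Since the temperature head is just an extra projection on the hidden state, it can be bolted onto an existing language model and trained end-to-end with the usual cross-entropy loss.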
