Paper Title
Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation
Paper Authors
Paper Abstract
Overconfidence has been shown to impair the generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to the loss function, preventing a model from producing a peaked distribution. Label smoothing smooths target labels with a pre-defined prior label distribution; as a result, a model is trained to maximize the likelihood of predicting the soft labels. Nonetheless, the amount of smoothing is the same across all samples and remains fixed throughout training. In other words, label smoothing does not reflect the change in the probability distribution mapped by a model over the course of training. To address this issue, we propose a regularization scheme that brings a dynamic nature to the smoothing parameter by taking the model's probability distribution into account, thereby varying the parameter per instance. A model in training self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work bridging label smoothing and knowledge distillation, our work utilizes self-knowledge as the prior label distribution for softening target labels, and presents theoretical support for the regularization effect of knowledge distillation and the dynamic smoothing parameter. Our regularizer is validated comprehensively, and the results illustrate marked improvements in model generalization and calibration, enhancing the robustness and trustworthiness of a model.
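The abstract does not spell out the exact formulation, but the core idea it describes (softening the one-hot target with the model's own distribution, with a smoothing weight that varies per instance) can be sketched as below. This is a minimal PyTorch-style sketch under stated assumptions: the confidence-based rule for the per-instance smoothing weight and the alpha_max cap are illustrative choices, not the paper's actual parameterization.

import torch
import torch.nn.functional as F

def adaptive_label_smoothing_loss(logits, targets, alpha_max=0.2):
    # Cross-entropy against a per-instance soft target that mixes the one-hot
    # label with the model's own (detached) distribution, i.e. self-knowledge.
    probs = F.softmax(logits, dim=-1).detach()           # self-knowledge; no gradient flows through the prior
    one_hot = F.one_hot(targets, logits.size(-1)).float()
    gold_conf = probs.gather(-1, targets.unsqueeze(-1))  # model confidence in the gold label
    alpha = alpha_max * gold_conf                        # assumed rule: smooth more when the model is overconfident
    soft_target = (1.0 - alpha) * one_hot + alpha * probs
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_target * log_probs).sum(dim=-1).mean()

# Usage: a batch of 4 target tokens over a vocabulary of 10
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([1, 3, 5, 7])
loss = adaptive_label_smoothing_loss(logits, targets)
loss.backward()

Because alpha is computed from the forward pass itself, the extent of smoothing is decided on the fly for each instance rather than being fixed in advance, which is the dynamic behavior the abstract describes.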