Title

Learning Energy Networks with Generalized Fenchel-Young Losses

Authors

Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist

Abstract

Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function, typically parametrized by a neural network. This allows one to capture potentially complex relationships between inputs and outputs. To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function. The key challenge for training energy networks lies in computing loss gradients, as this typically requires argmin/argmax differentiation. In this paper, building upon a generalized notion of conjugate function, which replaces the usual bilinear pairing with a general energy function, we propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks. Our losses enjoy many desirable properties and their gradients can be computed efficiently without argmin/argmax differentiation. We also prove the calibration of their excess risk in the case of linear-concave energies. We demonstrate our losses on multilabel classification and imitation learning tasks.
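For concreteness, the following LaTeX sketch spells out the construction the abstract summarizes. The notation is illustrative (Φ for the energy, Ω for the regularizer, v for the model output, p for the prediction; the paper's own symbols may differ), but the generalized conjugate, the loss, and the envelope-theorem gradient follow the abstract's description.

% Generalized conjugate: the bilinear pairing <v, p> in the usual convex
% conjugate is replaced by a general energy function \Phi(v, p).
\[
  \Omega^{\Phi}(v) := \max_{p \in \mathrm{dom}(\Omega)} \; \Phi(v, p) - \Omega(p)
\]
% Generalized Fenchel-Young loss between model output v and target p':
\[
  L_{\Phi,\Omega}(v; p') := \Omega^{\Phi}(v) + \Omega(p') - \Phi(v, p') \;\ge\; 0,
\]
% nonnegative by definition of the max, with equality iff p' attains it.
% By an envelope (Danskin-type) argument, if p^\star(v) attains the max, then
\[
  \nabla_v L_{\Phi,\Omega}(v; p') = \nabla_v \Phi(v, p^\star(v)) - \nabla_v \Phi(v, p'),
\]
% so the loss gradient only requires solving the inner problem and evaluating
% \nabla_v \Phi at its solution; no differentiation through the argmax is needed.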
