Paper Title
Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
Paper Authors
Paper Abstract
Explaining generalization and preventing over-confident predictions are central goals of studies on the loss landscape of neural networks. Flatness, defined as the invariability of the loss under perturbations of a pre-trained solution, is widely accepted as a predictor of generalization in this context. However, it has been pointed out that flatness and generalization bounds can be changed arbitrarily by rescaling the parameters, and previous studies addressed this problem only partially and under restrictions: counter-intuitively, their generalization bounds remained variant under function-preserving parameter scaling transformations, or applied only to impractical network structures. As a more fundamental solution, we propose new prior and posterior distributions that are invariant to scaling transformations by \textit{decomposing} the scale and connectivity of parameters, thereby allowing the resulting generalization bound to describe the generalizability of a broad class of networks under a more practical class of transformations, such as weight decay with batch normalization. We also show that the above issue adversely affects the uncertainty calibration of the Laplace approximation, and we propose a solution using our invariant posterior. We empirically demonstrate that our posterior provides effective flatness and calibration measures with low complexity in such practical parameter transformation cases, supporting its practical effectiveness in line with our rationale.
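To make the scale/connectivity decomposition mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual formulation: it assumes a parameter vector is split as theta = scale * connectivity with the connectivity on the unit sphere, so that any function-preserving rescaling of theta (e.g., induced by weight decay combined with batch normalization) changes only the scale factor while the connectivity, and hence any prior or posterior defined on it, is unchanged. The names decompose, scale, and connectivity are hypothetical, and the paper's decomposition may be applied per layer or per filter rather than globally.

    import numpy as np

    def decompose(theta, eps=1e-12):
        # Illustrative decomposition: scale = ||theta||, connectivity = theta / ||theta||.
        scale = np.linalg.norm(theta)
        connectivity = theta / (scale + eps)
        return scale, connectivity

    # A function-preserving rescaling of theta leaves the connectivity unchanged,
    # so a distribution placed on the connectivity is invariant to the rescaling.
    rng = np.random.default_rng(0)
    theta = rng.standard_normal(10)
    _, c_ref = decompose(theta)
    for factor in (0.1, 1.0, 10.0):
        _, c = decompose(factor * theta)
        print(factor, np.allclose(c, c_ref))  # True for every positive factor

Under this assumed decomposition, only the connectivity carries the information that determines flatness and calibration measures, which is the intuition behind the invariance claim in the abstract.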