Paper Title

On the Dynamics of Inference and Learning

Paper Authors

Berman, David S., Heckman, Jonathan J., Klinger, Marc

Paper Abstract

Statistical inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available, this probability distribution is updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first-order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes, and with inference of the coupling constant in the 1D Ising model. Finally, we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmark data sets such as MNIST and CIFAR10, and show that for networks exhibiting small final losses the simple power-law is also satisfied.
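
As a rough illustration of the $1/T$ scaling described in the abstract, the following toy sketch (not code from the paper; the conjugate-Gaussian setup and all variable names are assumptions chosen for illustration) performs sequential Bayesian updates on the mean of a Gaussian with known noise, one data point at a time, and prints how the posterior variance, which sets the effective step size of the inference flow, decays like $1/T$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed, not from the paper): infer the mean of a Gaussian
# with known noise sigma, absorbing one data point per update.
mu_true, sigma = 1.5, 1.0

# Conjugate Gaussian prior over the unknown mean.
mu_post, var_post = 0.0, 10.0

posterior_vars = []
for T in range(1, 2001):
    x = rng.normal(mu_true, sigma)  # one new observation
    # Bayes' theorem for a Gaussian likelihood with known variance:
    # precisions add, and the posterior mean is precision-weighted.
    precision = 1.0 / var_post + 1.0 / sigma**2
    mu_post = (mu_post / var_post + x / sigma**2) / precision
    var_post = 1.0 / precision
    posterior_vars.append(var_post)

# This conjugate model saturates the Cramér-Rao bound, so the posterior
# variance (the learning rate of the flow) should track sigma^2 / T.
for T in (10, 100, 1000):
    print(f"T={T:5d}  posterior var={posterior_vars[T-1]:.5f}  1/T law={sigma**2 / T:.5f}")
```

In this sketch the two printed columns converge as $T$ grows (the exact posterior variance is $1/(1/\mathrm{var}_0 + T/\sigma^2)$, which approaches $\sigma^2/T$), giving a concrete instance of the power-law learning rate the paper derives in the saturated Cramér-Rao regime.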
