Paper Title

On Connecting Deep Trigonometric Networks with Deep Gaussian Processes: Covariance, Expressivity, and Neural Tangent Kernel

Paper Authors

Chi-Ken Lu, Patrick Shafto

Paper Abstract

Deep Gaussian Process (DGP) as a model prior in Bayesian learning intuitively exploits the expressive power in function composition. DGPs also offer diverse modeling capabilities, but inference is challenging because marginalization in latent function space is not tractable. With Bochner's theorem, a DGP with squared exponential kernel can be viewed as a deep trigonometric network consisting of random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight space view yields the same effective covariance functions which were obtained previously in function space. Also, varying the prior distributions over network parameters is equivalent to employing different kernels. As such, DGPs can be translated into deep bottlenecked trig networks, with which the exact maximum a posteriori estimation can be obtained. Interestingly, the network representation enables the study of the DGP's neural tangent kernel, which may also reveal the mean of the intractable predictive distribution. Statistically, unlike shallow networks, deep networks of finite width have covariance deviating from the limiting kernel, and the inner and outer widths may play different roles in feature learning. Numerical simulations are presented to support our findings.
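The trigonometric-network view rests on Bochner's theorem: the squared exponential kernel is the expectation of cosines of random linear projections, so a layer of sine and cosine units with Gaussian-distributed frequencies gives a finite-dimensional approximation of the kernel. Below is a minimal NumPy sketch of one such random trig feature layer, the single-layer building block behind the correspondence described in the abstract; the function name trig_feature_layer and all parameter choices are illustrative assumptions, not the authors' code.

```python
import numpy as np

def trig_feature_layer(X, n_features, lengthscale=1.0, seed=0):
    """Random trigonometric features approximating the squared exponential kernel.

    By Bochner's theorem, k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2))
    equals E[cos(w^T (x - x'))] with w ~ N(0, lengthscale^{-2} I), so sine and
    cosine units with random frequencies yield phi(x)^T phi(x') ~= k(x, x').
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random frequencies drawn from the kernel's spectral density.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    proj = X @ W
    # Stack cosine and sine units; the 1/sqrt(n_features) scaling makes the
    # feature inner product a Monte Carlo estimate of the kernel.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)

# Sanity check: inner products of the features approach the exact SE kernel.
X = np.random.default_rng(1).normal(size=(5, 3))
Phi = trig_feature_layer(X, n_features=5000)
K_approx = Phi @ Phi.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists)
print(np.max(np.abs(K_approx - K_exact)))  # small when the feature layer is wide
```

As the feature layer gets wider, Phi @ Phi.T converges to the exact squared exponential kernel; stacking such layers with finite-width bottlenecks in between is, under the assumptions above, the deep bottlenecked trig network the abstract refers to.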
