Paper Title
Universal Approximation in Dropout Neural Networks
Paper Authors
Paper Abstract
We prove two universal approximation theorems for a range of dropout neural networks. These are feed-forward neural networks in which each edge is given a random $\{0,1\}$-valued filter, and they have two modes of operation: in the first, each edge output is multiplied by its random filter, resulting in a random output, while in the second, each edge output is multiplied by the expectation of its filter, leading to a deterministic output. It is common to use the random mode during training and the deterministic mode during testing and prediction. Both theorems are of the following form: given a function to approximate and a threshold $\varepsilon>0$, there exists a dropout network that is $\varepsilon$-close in probability and in $L^q$. The first theorem applies to dropout networks in the random mode. It makes few assumptions about the activation function, applies to a wide class of networks, and can even be applied to approximation schemes other than neural networks. Its core is an algebraic property showing that deterministic networks can be exactly matched in expectation by random networks. The second theorem makes stronger assumptions and gives a stronger result: given a function to approximate, it provides the existence of a network that approximates it in both modes simultaneously. The proof components are a recursive replacement of edges by independent copies and a special first-layer replacement that couples the resulting larger network to the input. The functions to be approximated are assumed to be elements of general normed spaces, and the approximations are measured in the corresponding norms. The networks are constructed explicitly. Because of the different methods of proof, the two results give independent insight into the approximation properties of random dropout networks. With this, we establish that dropout neural networks broadly satisfy a universal approximation property.
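
To make the two modes concrete, the following is a minimal sketch, not the paper's construction, of a feed-forward network with an independent Bernoulli filter on each edge: in the random mode every edge output is multiplied by a sampled $\{0,1\}$ filter, while in the deterministic mode it is multiplied by the filter's expectation $p$. The layer widths, weights, retain probability $p=0.8$, and ReLU activation below are illustrative assumptions.

```python
# Minimal sketch of a dropout feed-forward network with the two modes described
# above; the architecture, weights, retain probability and ReLU activation are
# illustrative assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dropout_layer(x, W, b, p, mode, activation=relu):
    """One layer: each edge (i, j) carries a weight W[i, j] and an independent
    Bernoulli(p) filter. In the random mode the edge output x[i] * W[i, j] is
    multiplied by a sampled {0,1} filter; in the deterministic mode it is
    multiplied by the filter's expectation p."""
    if mode == "random":
        F = rng.binomial(1, p, size=W.shape)  # sampled {0,1} filters, one per edge
    else:
        F = np.full(W.shape, p)               # E[filter] = p on every edge
    return activation(x @ (W * F) + b)

# Tiny 3 -> 5 -> 1 example evaluated in both modes.
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
x, p = np.array([0.2, -1.0, 0.7]), 0.8

y_random = dropout_layer(dropout_layer(x, W1, b1, p, "random"),
                         W2, b2, p, "random",
                         activation=lambda z: z)          # random output
y_deterministic = dropout_layer(dropout_layer(x, W1, b1, p, "deterministic"),
                                W2, b2, p, "deterministic",
                                activation=lambda z: z)   # deterministic output
```

Because the activation is nonlinear, the random-mode output is genuinely random rather than a rescaling of the deterministic one, which is why the abstract measures closeness in probability and in $L^q$ and treats the two modes separately.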