Paper Title
Are wider nets better given the same number of parameters?
Paper Authors
Paper Abstract
Empirical studies demonstrate that the performance of neural networks improves with increasing number of parameters. In most of these studies, the number of parameters is increased by increasing the network width. This begs the question: Is the observed improvement due to the larger number of parameters, or is it due to the larger width itself? We compare different ways of increasing model width while keeping the number of parameters constant. We show that for models initialized with a random, static sparsity pattern in the weight tensors, network width is the determining factor for good performance, while the number of weights is secondary, as long as trainability is ensured. As a step towards understanding this effect, we analyze these models in the framework of Gaussian Process kernels. We find that the distance between the sparse finite-width model kernel and the infinite-width kernel at initialization is indicative of model performance.
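To make the setup described in the abstract concrete, below is a minimal NumPy sketch (not the authors' code) of the two ingredients it mentions: widening a layer under a fixed weight budget by drawing a random, static sparsity pattern, and measuring how far the resulting one-hidden-layer ReLU kernel at initialization is from the infinite-width kernel, here approximated by a very wide dense layer. The function names, layer sizes, the 1/sqrt(density) variance rescaling of the nonzero weights, and the Frobenius-norm distance are all illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_weights(fan_in, width, n_weights, rng):
    """(fan_in, width) weight matrix with exactly n_weights nonzero entries,
    placed by a random, static sparsity mask fixed at initialization.
    Nonzero weights are rescaled (an assumption here) so the pre-activation
    variance matches a dense layer with the same fan-in."""
    density = n_weights / (fan_in * width)
    idx = rng.choice(fan_in * width, size=n_weights, replace=False)
    mask = np.zeros(fan_in * width, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(fan_in, width)
    w = rng.normal(0.0, 1.0, size=(fan_in, width)) / np.sqrt(fan_in * density)
    return w * mask

def hidden_kernel(x, w):
    """Empirical conjugate (NNGP-style) kernel of one ReLU layer for a given
    draw of the weights: inner products of the hidden features, normalized
    by the layer width."""
    h = np.maximum(x @ w, 0.0)
    return h @ h.T / w.shape[1]

# Toy inputs and a fixed weight budget (that of a dense width-64 layer).
x = rng.normal(size=(16, 32))
fan_in, budget = x.shape[1], 32 * 64

# A very wide dense layer stands in for the infinite-width kernel.
w_big = sparse_weights(fan_in, 8192, fan_in * 8192, rng)   # density 1.0
k_inf = hidden_kernel(x, w_big)

# Widen the sparse layer while keeping the number of weights constant, and
# track the distance of its kernel at initialization to the wide-dense kernel.
for width in (64, 128, 256, 512, 1024):
    dists = []
    for _ in range(20):
        k = hidden_kernel(x, sparse_weights(fan_in, width, budget, rng))
        dists.append(np.linalg.norm(k - k_inf))
    print(f"width={width:5d}  density={budget / (fan_in * width):.3f}  "
          f"mean kernel distance={np.mean(dists):.3f}")
```

The distance is averaged over 20 independent initializations only to reduce Monte-Carlo noise; the paper's analysis compares against the analytic infinite-width kernel, which the width-8192 dense layer merely approximates here.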