Paper Title

Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

Authors

Kaiqi Zhang, Yu-Xiang Wang

Abstract

We study the theory of neural networks (NNs) from the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function space and sample size. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard $\ell_2$ regularization is equivalent to promoting $\ell_p$-sparsity ($0<p<1$) in the coefficient vector of an end-to-end learned dictionary of basis functions. Using this equivalence, we further establish that by tuning only the regularization factor, such a parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
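As a rough scalar illustration of how $\ell_2$ regularization can induce $\ell_p$-sparsity (a simplified sketch, not the paper's exact statement; the precise exponent depends on the parallel-NN parametrization), suppose an end-to-end dictionary coefficient $c_j$ is realized as a product of $L$ per-layer weights $w_{j,1}\cdots w_{j,L}=c_j$. By the AM-GM inequality,
$$\min_{w_{j,1}\cdots w_{j,L}=c_j}\ \sum_{\ell=1}^{L} w_{j,\ell}^2 \;=\; L\,|c_j|^{2/L}, \qquad \text{attained at } |w_{j,\ell}|=|c_j|^{1/L},$$
so summing the squared weights over all such paths penalizes the coefficient vector by $L\,\|c\|_{2/L}^{2/L}$, an $\ell_p$ quasi-norm with $p=2/L<1$ that favors increasingly sparse dictionary coefficients as the depth $L$ grows.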
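For concreteness, below is a minimal PyTorch-style sketch (not the authors' code; the widths, depths, data, and hyperparameters are illustrative assumptions) of a parallel ReLU network, i.e., a sum of narrow deep subnetworks, trained with standard $\ell_2$ weight decay as the only regularizer being tuned.

```python
import torch
import torch.nn as nn

class ParallelReLUNet(nn.Module):
    """A sum of `n_subnets` narrow, deep ReLU subnetworks (a 'parallel NN' sketch)."""
    def __init__(self, in_dim=1, n_subnets=64, width=4, depth=4):
        super().__init__()
        def make_subnet():
            layers = [nn.Linear(in_dim, width), nn.ReLU()]
            for _ in range(depth - 1):
                layers += [nn.Linear(width, width), nn.ReLU()]
            layers.append(nn.Linear(width, 1))
            return nn.Sequential(*layers)
        self.subnets = nn.ModuleList(make_subnet() for _ in range(n_subnets))

    def forward(self, x):
        # The prediction is the sum of all parallel subnetwork outputs.
        return sum(net(x) for net in self.subnets)

# Standard L2 regularization ("weight decay") is the only regularizer being tuned.
model = ParallelReLUNet()
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)

# Toy 1-D regression data with a jump, i.e., heterogeneous smoothness.
x = torch.rand(256, 1)
y = torch.sign(x - 0.5) + 0.1 * torch.randn(256, 1)

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```

In this sketch the depth and width of the subnetworks are fixed; only the `weight_decay` factor would be tuned, mirroring the paper's claim that architecture retuning across function classes is unnecessary.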
