Title
Beyond the storage capacity: data driven satisfiability transition
Authors
Abstract
Data structure has a dramatic impact on the properties of neural networks, yet its significance in the established theoretical frameworks is poorly understood. Here we compute the Vapnik-Chervonenkis entropy of a kernel machine operating on data grouped into equally labelled subsets. At variance with the unstructured scenario, entropy is non-monotonic in the size of the training set, and displays an additional critical point besides the storage capacity. Remarkably, the same behavior occurs in margin classifiers even with randomly labelled data, as is elucidated by identifying the synaptic volume encoding the transition. These findings reveal aspects of expressivity lying beyond the condensed description provided by the storage capacity, and they indicate the path towards more realistic bounds for the generalization error of neural networks.
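As background for the storage capacity referenced in the abstract, a standard baseline for the unstructured, zero-margin case is Cover's function-counting result (Cover, 1965); the structured-data entropy computed in the paper generalizes this kind of dichotomy counting. For p points in general position in n dimensions, the number of linearly realizable dichotomies and the resulting separability probability are

$$
C(p,n) = 2 \sum_{k=0}^{n-1} \binom{p-1}{k},
\qquad
P_{\mathrm{sep}}(p,n) = \frac{C(p,n)}{2^{p}}
\;\longrightarrow\;
\begin{cases} 1, & \alpha < 2,\\ 0, & \alpha > 2,\end{cases}
$$

in the limit $p, n \to \infty$ at fixed load $\alpha = p/n$, so the storage capacity of the unstructured perceptron is $\alpha_c = 2$.

To make the satisfiability transition concrete, below is a minimal numerical sketch, not the paper's analytical computation: it estimates the probability that a linear classifier through the origin can realize random labels, for unstructured points versus points grouped into equally labelled subsets. Modelling a subset as noisy copies of a shared random center, and all parameter choices (n, sigma, trials, the grid of alphas), are illustrative assumptions rather than the paper's setup.

```python
# Illustrative sketch: probability of linear separability (satisfiability)
# as a function of the load alpha = p/n, for unstructured random points
# versus points grouped into equally labelled subsets of size k.
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """Feasibility LP: does a w exist with y_i <w, x_i> >= 1 for all i?
    By rescaling w, this is equivalent to strict linear separability."""
    p, n = X.shape
    A_ub = -(y[:, None] * X)                  # encodes y_i <w, x_i> >= 1
    res = linprog(np.zeros(n), A_ub=A_ub, b_ub=-np.ones(p),
                  bounds=[(None, None)] * n, method="highs")
    return res.status == 0                    # 0 = feasible, 2 = infeasible

def sat_probability(alpha, n=40, k=1, sigma=0.3, trials=50, seed=0):
    """Fraction of separable instances at load alpha = p/n, with points
    grouped into subsets of size k that share one random label.
    Subset members are noisy copies of a common center (an assumption)."""
    rng = np.random.default_rng(seed)
    p = int(alpha * n)
    m = -(-p // k)                            # number of subsets, ceil(p/k)
    hits = 0
    for _ in range(trials):
        centers = rng.standard_normal((m, n))
        noise = sigma * rng.standard_normal((m * k, n))
        X = (np.repeat(centers, k, axis=0) + noise)[:p]
        y = np.repeat(rng.choice([-1.0, 1.0], size=m), k)[:p]
        hits += separable(X, y)
    return hits / trials

# For k=1 the points are i.i.d. Gaussian, so the fraction should drop
# near Cover's storage capacity alpha_c = 2; correlated, equally
# labelled pairs (k=2) remain satisfiable up to larger loads.
for alpha in (1.0, 1.5, 2.0, 2.5, 3.0, 4.0):
    print(f"alpha={alpha:.1f}  k=1: {sat_probability(alpha):.2f}"
          f"  k=2: {sat_probability(alpha, k=2):.2f}")
```

In this toy model the transition for grouped data shifts to higher loads because, as the within-subset noise shrinks, each subset acts roughly like a single constraint; this only qualitatively echoes the structure-driven shift of the satisfiability point analyzed in the paper.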