Title
Learning curves for the multi-class teacher-student perceptron
Authors
Abstract
One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) have been extensively analysed in this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. In contrast, for a binary teacher, we show that a first-order phase transition arises in the Bayes-optimal performance.
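To make the setting concrete, here is a minimal finite-size sketch in Python of the model the abstract describes: a multi-class teacher labels each i.i.d. Gaussian input with the argmax of K Gaussian teacher outputs, and the student performs ERM via ridge-regularised cross-entropy minimisation. The dimensions d and K and the sample ratio alpha are illustrative choices, not values from the paper, and scikit-learn's LogisticRegression is used as a stand-in for the ERM solver rather than the authors' own procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sizes (assumptions, not from the paper):
# d input dimensions, K classes, n = alpha * d training samples.
d, K, alpha = 200, 3, 2.0
n = int(alpha * d)
rng = np.random.default_rng(0)

# Gaussian teacher weights and i.i.d. Gaussian inputs, as in the abstract.
W_teacher = rng.standard_normal((K, d)) / np.sqrt(d)
X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((10 * n, d))

# Multi-class teacher labels: argmax over the K teacher output channels.
y_train = np.argmax(X_train @ W_teacher.T, axis=1)
y_test = np.argmax(X_test @ W_teacher.T, axis=1)

# Ridge-regularised cross-entropy minimisation (ERM); in scikit-learn the
# default penalty is l2 (ridge) and C plays the role of 1 / lambda.
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

Sweeping alpha and plotting the resulting test error would trace out an empirical learning curve of the kind the paper characterises analytically in the high-dimensional limit.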