Paper Title
Provable Generalization of Overparameterized Meta-learning Trained with SGD
Paper Authors
Paper Abstract
Despite the remarkable empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited. This paper studies the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML), which aims to find a good initialization for fast adaptation to new tasks. Under a mixed linear regression model, we analyze the generalization properties of MAML trained with SGD in the overparameterized regime. We provide both upper and lower bounds for the excess risk of MAML, which capture how the SGD dynamics affect these generalization bounds. With such sharp characterizations, we further explore how various learning parameters impact the generalization capability of overparameterized MAML, including explicitly identifying typical data and task distributions that can achieve diminishing generalization error with overparameterization, and characterizing the impact of the adaptation learning rate on both the excess risk and the early stopping time. Our theoretical findings are further validated by experiments.
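To make the setting concrete, below is a minimal sketch of MAML trained with SGD on a mixed linear regression model: each task's regression vector is drawn around a shared center, the inner loop performs one gradient step from the meta-initialization with adaptation learning rate `alpha`, and the outer loop runs SGD on the initialization. All dimensions, learning rates, and the task distribution are illustrative assumptions, not the paper's exact setup; for a linear model the meta-gradient through the inner step can be written in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50           # parameter dimension (overparameterized: d > samples per task)
n_inner = 10     # samples per task
alpha = 0.05     # inner-loop (adaptation) learning rate -- assumed value
beta = 0.02      # outer-loop (meta) SGD learning rate -- assumed value
w_star = rng.normal(size=d) / np.sqrt(d)  # shared center of the task distribution

def sample_task():
    # mixed linear regression: task vector = shared center + small perturbation
    return w_star + 0.1 * rng.normal(size=d)

def loss_grad(w, X, y):
    # gradient of the squared loss (1/2n) * ||Xw - y||^2
    return X.T @ (X @ w - y) / len(y)

w0 = np.zeros(d)  # meta-initialization to be learned
for step in range(3000):
    w_t = sample_task()
    # support set: adapt from the initialization with one gradient step
    Xs = rng.normal(size=(n_inner, d)); ys = Xs @ w_t
    w_adapted = w0 - alpha * loss_grad(w0, Xs, ys)
    # query set: evaluate the adapted parameters
    Xq = rng.normal(size=(n_inner, d)); yq = Xq @ w_t
    g_q = loss_grad(w_adapted, Xq, yq)
    # exact meta-gradient for a linear model: (I - alpha * Xs^T Xs / n) @ g_q
    meta_grad = g_q - alpha * Xs.T @ (Xs @ g_q) / n_inner
    w0 -= beta * meta_grad
```

The outer update differentiates through the inner step, which for linear regression reduces to multiplying the query gradient by `I - alpha * Xs.T @ Xs / n_inner`; the learned `w0` should then adapt to fresh tasks with a lower query loss than a naive initialization.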