Paper Title


Meta-trained agents implement Bayes-optimal agents

Paper Authors

Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

Paper Abstract


Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models.
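To make the abstract's central object concrete: for the simplest prediction setting of the kind it mentions, the Bayes-optimal agent that a meta-trained memory agent would approximate has a closed form. The sketch below assumes a Beta-Bernoulli task distribution (each task is a coin with bias drawn from a Beta prior); this particular task family is an illustrative assumption, not a detail taken from the paper.

```python
import random

def bayes_optimal_prediction(observations, alpha=1.0, beta=1.0):
    """Posterior predictive P(x_{t+1} = 1 | x_1..x_t) for Bernoulli
    prediction tasks under a Beta(alpha, beta) prior over the bias.
    This is the Bayes-optimal predictor a meta-trained memory agent
    would be incentivised to approximate in this toy setting."""
    ones = sum(observations)
    t = len(observations)
    return (ones + alpha) / (t + alpha + beta)

# Simulate one task from the prior: draw a bias, then observe flips.
random.seed(0)
bias = random.betavariate(1.0, 1.0)
obs = [1 if random.random() < bias else 0 for _ in range(50)]
print(bayes_optimal_prediction(obs))
```

With a uniform prior (alpha = beta = 1) this reduces to Laplace's rule of succession; the paper's claim is that a recurrent agent meta-trained across such tasks not only matches this predictor's outputs but comes to implement a computationally similar solution in its memory state.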
